Creating a default tagger with Python NLTK

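I am trying to create a default tagger in Python using NLTK, but I keep getting an error. The corpus consists of Estonian words, and the goal is to tag each word with its part of speech.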

My code:

from nltk.corpus.reader import TaggedCorpusReader
mypath = "/Users/mmo/Downloads/"
EC = TaggedCorpusReader(mypath,"estonianSmall_copy.txt",
 encoding="latin-1")
sents = EC.tagged_sents()

from nltk import DefaultTagger
from nltk.probability import FreqDist
tags =[ [(word,tag)for word,tag in sent]
    for sent in EC.tagged_sents()]
tagF = FreqDist(tags)

Error:

tagF = FreqDist(tags)
Traceback (most recent call last):
   File "<ipython-input-26-c1ca76857fce>", line 1, in <module>
    tagF = FreqDist(tags)
  File "/Users/mmo/anaconda/lib/python2.7/site-packages/nltk/probability.py", line 106, in __init__
    Counter.__init__(self, samples)
  File "/Users/mmo/anaconda/lib/python2.7/collections.py", line 477, in __init__
    self.update(*args, **kwds)
  File "/Users/mmo/anaconda/lib/python2.7/collections.py", line 567, in update
    self[elem] = self_get(elem, 0) + 1
TypeError: unhashable type: 'list'

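Your problem is with FreqDist; you haven't created a default tagger yet. FreqDist counts hashable items, but your `tags` is a list of lists (one list per sentence), which is why you get `unhashable type: 'list'`. Since you are just trying to count tags, feed the tags themselves to FreqDist like this: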

tagF = FreqDist(tag for word, tag in EC.tagged_words())

(Note that tagged_words() returns a flat sequence, not a list of lists.) You can then continue with the NLTK tutorial to build the default tagger.
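To make the difference concrete, here is a minimal sketch that runs without the corpus file. It uses `collections.Counter` as a stand-in for FreqDist (the traceback shows FreqDist delegating to `Counter.__init__`), and the sentences, tags, and the `default_tag` helper are all made up for illustration; they are not part of NLTK or of your data.

```python
from collections import Counter

# Hypothetical tagged sentences standing in for EC.tagged_sents();
# the words and tags are invented for this example.
tagged_sents = [
    [("see", "P"), ("on", "V"), ("maja", "S")],
    [("maja", "S"), ("on", "V"), ("suur", "A")],
    [("koer", "S"), ("jookseb", "V")],
    [("kass", "S")],
]

# Counting the sentences themselves fails: each sentence is a list,
# and lists cannot be hashed.
try:
    Counter(tagged_sents)
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)

# Flatten to one tag per token, which is the shape tagged_words() provides.
tag_counts = Counter(tag for sent in tagged_sents for _word, tag in sent)
most_common_tag = tag_counts.most_common(1)[0][0]
print(tag_counts)
print(most_common_tag)


def default_tag(tokens, tag=most_common_tag):
    """Hypothetical helper: give every token the same tag, as a default tagger does."""
    return [(token, tag) for token in tokens]


print(default_tag(["uus", "lause"]))
```

With NLTK installed, the same idea should carry over by passing the most frequent tag to `DefaultTagger(most_common_tag)` and calling its `tag()` method on a list of tokens.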
