NP-chunker ValueError (Python NLTK)

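I'm building an NLP pipeline based on the Python NLTK book (Chapter 7). The first part of the code preprocesses the data correctly, but I can't run its output through my NP chunker: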

import nltk, re, pprint
#Import Data
data = 'This is a test sentence to check if preprocessing works' 
#Preprocessing
def preprocess(document):
    sentences = nltk.sent_tokenize(document)
    sentences = [nltk.word_tokenize(sent) for sent in sentences] 
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    return(sentences)
tagged = preprocess(data)
print(tagged)
#regular expression-based NP chunker
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar) #chunk parser
chunked = []
for s in tagged:
    chunked.append(cp.parse(tagged))
print(chunked)

Here is the traceback I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/u0084411/Box Sync/Procesmanager DH/Text Mining/Tools/NLP_pipeline.py", line 24, in <module>
    chunked.append(cp.parse(tagged))
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1202, in parse
    chunk_struct = parser.parse(chunk_struct, trace=trace)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1017, in parse
    chunkstr = ChunkString(chunk_struct)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in __init__
    tags = [self._tag(tok) for tok in self._pieces]
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in <listcomp>
    tags = [self._tag(tok) for tok in self._pieces]
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 105, in _tag
    raise ValueError('chunk structures must contain tagged '
ValueError: chunk structures must contain tagged tokens or trees
>>>

What's my mistake? "tagged" has already been tokenized and tagged, so why doesn't the program recognize that?

Thanks a lot! Tom

You'll slap your forehead when you see this. Instead of this:

for s in tagged:
    chunked.append(cp.parse(tagged))

it should be:

for s in tagged:
    chunked.append(cp.parse(s))

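You got the error because you passed cp.parse() the whole list of sentences instead of a single tagged sentence. preprocess() returns a list of sentences, each of which is a list of (word, tag) tuples, while cp.parse() expects just one such sentence. When it receives the outer list, its items are lists rather than (word, tag) tuples or trees, hence "chunk structures must contain tagged tokens or trees".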
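The difference between the two shapes can be reproduced with a small sketch. It assumes a hand-tagged version of the test sentence (the tags are written by hand, not produced by nltk.pos_tag) so that no tokenizer or tagger models need to be downloaded; np_phrases is a hypothetical helper added only to print the chunks, not part of NLTK.

```python
import nltk

# One POS-tagged sentence: a flat list of (word, tag) tuples.
# The tags are hand-written for illustration (an assumption), so no
# tokenizer/tagger data has to be downloaded.
sentence = [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("test", "NN"),
            ("sentence", "NN"), ("to", "TO"), ("check", "VB"),
            ("if", "IN"), ("preprocessing", "NN"), ("works", "VBZ")]

# preprocess() returns a list of such sentences: one extra level of nesting.
tagged = [sentence]

cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# Correct: parse each sentence on its own.
chunked = [cp.parse(s) for s in tagged]


def np_phrases(tree):
    """Hypothetical helper: return the words of every NP chunk in a parsed sentence."""
    return [" ".join(word for word, tag in chunk.leaves())
            for chunk in tree.subtrees(filter=lambda t: t.label() == "NP")]


for tree in chunked:
    print(np_phrases(tree))

# Wrong: passing the whole list. Its items are lists, not (word, tag)
# tuples or Trees, so RegexpParser raises the ValueError from the question.
try:
    cp.parse(tagged)
except ValueError as err:
    print("ValueError:", err)
```

Writing the loop as a list comprehension is only a stylistic choice; the corrected for loop above does the same thing.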
