How to print only the chunked strings with NLTK



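I am using NLTK and RegEx to analyze my text. The chunker correctly identifies the chunks I defined, but the printed result shows every tagged word along with "My_Chunk". How can I print only the chunked parts of the text (the "My_Chunk" subtrees)?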

Here is a sample of my code:

import re
import nltk

text = ['The absolutely kind professor asked students out whom he met in class']
for item in text:
    tokenized = nltk.word_tokenize(item)
    tagged = nltk.pos_tag(tokenized)
    chunk = r"""My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}"""
    chunkParser = nltk.RegexpParser(chunk)
    chunked = chunkParser.parse(tagged)
    print(chunked)
    chunked.draw()

The printed result is:

(S
  The/DT
  (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
  students/NNS
  out/RP
  whom/WP
  he/PRP
  (My_Chunk met/VBD)
  in/IN
  class/NN)

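This should do it: loop over the children of the parsed tree, keep only the subtrees labeled "My_Chunk", and join the words of their leaves.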

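for a in chunked:
    # Chunks are nltk Tree objects; unchunked tokens are plain (word, tag) tuples.
    if isinstance(a, nltk.tree.Tree):
        if a.label() == "My_Chunk":
            print(a)
            # Each leaf is a (word, tag) pair; keep only the word.
            print(" ".join([lf[0] for lf in a.leaves()]))
            print()
# (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
# absolutely kind professor asked
# (My_Chunk met/VBD)
# met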
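
As a possible variation on the loop above, the same filtering can be sketched with `Tree.subtrees(filter=...)`, which also reaches chunks nested deeper in the tree. This is only a sketch under a few assumptions: the tokens are tagged by hand to mirror the tagger output shown in the question (so it runs without downloading tokenizer or tagger models), and the helper name `chunk_strings` is hypothetical, not part of NLTK.

```python
import nltk

# Hand-tagged tokens, assumed to match the tagger output shown in the question.
# In real use these would come from nltk.pos_tag(nltk.word_tokenize(sentence)).
tagged = [
    ("The", "DT"), ("absolutely", "RB"), ("kind", "NN"), ("professor", "NN"),
    ("asked", "VBD"), ("students", "NNS"), ("out", "RP"), ("whom", "WP"),
    ("he", "PRP"), ("met", "VBD"), ("in", "IN"), ("class", "NN"),
]

chunk_parser = nltk.RegexpParser(r"My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}")
chunked = chunk_parser.parse(tagged)


def chunk_strings(tree, label="My_Chunk"):
    """Return the words of every subtree with the given label, joined by spaces.

    Hypothetical helper for illustration; not an NLTK function.
    """
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


for phrase in chunk_strings(chunked):
    print(phrase)
```

Returning a list rather than printing inside the loop makes it easier to reuse the chunk strings later, for example to collect them across several sentences.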
