Parsing NLTK tree output into a list of noun phrases



I have a sentence:

text  = '''If you're in construction or need to pass fire inspection, or just want fire resistant materials for peace of mind, this is the one to use. Check out 3rd party sellers as well Skylite'''

I applied NLTK chunking to it and got a tree as output.

import nltk

# Split into sentences, tokenize each one, then POS-tag the tokens
sentences = nltk.sent_tokenize(text)
sentences = [nltk.word_tokenize(sent) for sent in sentences]
sentences = [nltk.pos_tag(sent) for sent in sentences]
grammar = """NP: {<DT>?<JJ>*<NN.*>+}
       RELATION: {<V.*>}
                 {<DT>?<JJ>*<NN.*>+}
       ENTITY: {<NN.*>}"""
cp = nltk.RegexpParser(grammar)
for i in sentences:
    result = cp.parse(i)
    print(result)
    print(type(result))
    result.draw()

The output is as follows:

(S If/IN you/PRP (RELATION 're/VBP) in/IN (NP construction/NN) or/CC (NP need/NN) to/TO (RELATION pass/VB) (NP fire/NN inspection/NN) ,/, or/CC just/RB (RELATION want/VB) (NP fire/NN) (NP resistant/JJ materials/NNS) for/IN (NP peace/NN) of/IN (NP mind/NN) ,/, this/DT (RELATION is/VBZ) (NP the/DT one/NN) to/TO (RELATION use/VB) ./.)

How can I get the noun phrases as a list of strings, like this:

[construction, need, fire inspection, fire, resistant materials, peace, mind, the one]

Any suggestions?

Something like this:

noun_phrases_list = [[' '.join(leaf[0] for leaf in tree.leaves()) 
                      for tree in cp.parse(sent).subtrees() 
                      if tree.label()=='NP'] 
                      for sent in sentences]
#[['construction', 'need', 'fire inspection', 'fire', 'resistant materials', 
#  'peace', 'mind', 'the one'], 
# ['party sellers', 'Skylite']]

You can also pass a filter to subtrees(), as shown below:

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentences[1])
# subtrees() returns a generator; iterate over it or wrap it in list()
result.subtrees(filter=lambda t: t.label() == 'NP')
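
Since subtrees() only yields Tree objects, one more step is needed to get plain strings. Here is a minimal sketch of that step. The helper name noun_phrases is hypothetical, and the tagged tokens are copied by hand from the tree printed in the question, so the example runs without downloading any tokenizer or tagger models. It uses only the NP rule from the question's grammar.

```python
import nltk

# POS-tagged tokens copied from the tree printed in the question
# (an assumption for illustration; normally these come from nltk.pos_tag).
tagged = [("If", "IN"), ("you", "PRP"), ("'re", "VBP"), ("in", "IN"),
          ("construction", "NN"), ("or", "CC"), ("need", "NN"), ("to", "TO"),
          ("pass", "VB"), ("fire", "NN"), ("inspection", "NN"), (",", ","),
          ("or", "CC"), ("just", "RB"), ("want", "VB"), ("fire", "NN"),
          ("resistant", "JJ"), ("materials", "NNS"), ("for", "IN"),
          ("peace", "NN"), ("of", "IN"), ("mind", "NN"), (",", ","),
          ("this", "DT"), ("is", "VBZ"), ("the", "DT"), ("one", "NN"),
          ("to", "TO"), ("use", "VB"), (".", ".")]

# Only the NP rule from the question's grammar
cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = cp.parse(tagged)


def noun_phrases(tree):
    """Hypothetical helper: join the words of every NP subtree into one string."""
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]


print(noun_phrases(tree))
```

Each leaf of a chunk tree is a (word, tag) tuple, so joining the first element of every leaf rebuilds the phrase text. For several sentences, call the helper once per parsed sentence.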
