Unpacking the list_iterator from nltk.parse.stanford StanfordDependencyParser inside pandas



I am trying to work with StanfordDependencyParser inside a pandas DataFrame.

from nltk.parse import stanford
import pandas as pd
dep_parser=stanford.StanfordDependencyParser()
df = pd.DataFrame({'ID' : [0,1,2], 'sentence' : ['This is the first s.', 'This is the 2nd s.', 'This isn''t the third s.']})
df['parsed'] = df.sentence.apply(dep_parser.raw_parse)
print(df)
   ID                sentence                                        parsed
0   0    This is the first s.  <list_iterator object at 0x000000000E849C18>
1   1      This is the 2nd s.  <list_iterator object at 0x000000000E8691D0>
2   2  This isnt the third s.  <list_iterator object at 0x000000000E8696A0>

However, I would like the column to hold a text representation of the dependency graph rather than an iterator, for example:

    ID                sentence                                        parsed
0   0    This is the first s.  [[(('s.', 'NN'), 'nsubj', ('This', 'DT')),(('s.', 'NN'), 'cop', ('is', 'VBZ')), (('s.', 'NN'), 'det', ('the', 'DT')),(('s.', 'NN'), 'amod', ('first', 'JJ'))]]
                   ...

I tried to follow the NLTK documentation by working step by step in pandas, but that raises an AttributeError:

 df['dep'] = [list(parse.triples()) for parse in df.parsed]
 AttributeError: 'list_iterator' object has no attribute 'triples'

Is there a way to unpack the iterators that show up as values in the DataFrame? Any help is welcome.

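A list_iterator is a mechanism for producing the items of a list "on demand". It indeed has no triples() method of its own, but you can consume it to get the list it produces in your case: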

df['dep'] = [list(parse) for parse in df['parsed']]
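
If the objects yielded by each iterator are parse graphs that expose triples(), as the NLTK documentation idiom quoted in the question suggests, a nested comprehension may be needed to reach the triples themselves. The following is only a sketch under that assumption: StandInGraph, stand_in_raw_parse and unpack are hypothetical names invented for illustration so that the example runs without the Stanford parser installed, and the 'dep' and 'X' labels are placeholders rather than real parser output.

```python
class StandInGraph:
    """Hypothetical stand-in for a parse object that exposes triples()."""

    def __init__(self, triples):
        self._triples = triples

    def triples(self):
        # Yield (head, relation, dependent) tuples lazily.
        yield from self._triples


def stand_in_raw_parse(sentence):
    """Hypothetical stand-in for dep_parser.raw_parse: returns a list_iterator."""
    words = sentence.split()
    head = (words[-1], 'NN')
    graph = StandInGraph([(head, 'dep', (w, 'X')) for w in words[:-1]])
    return iter([graph])


def unpack(parse_iter):
    """Consume the iterator and expand every yielded object into its triples."""
    return [list(graph.triples()) for graph in parse_iter]


sentences = ['This is the first s.', 'This is the 2nd s.']
parsed = [stand_in_raw_parse(s) for s in sentences]

# One list of triples per yielded graph, one list of graphs per sentence.
dep = [unpack(p) for p in parsed]
print(dep[0])

# An iterator can be consumed only once: a second pass finds it empty.
second_pass = [unpack(p) for p in parsed]
print(second_pass)
```

Because the iterators are exhausted after the first pass, it is probably safer to unpack at parse time instead of storing the iterators, for example with something like df['dep'] = df.sentence.apply(lambda s: unpack(dep_parser.raw_parse(s))), again assuming the yielded objects provide triples().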
