Extracting quotations with textacy



I am trying to extract quotations and their attributions (i.e., the speaker) from text, but I get an error. Here is the setup:

import textacy
import pandas as pd
import spacy
data = [
    '"Hello, nice to meet you," said world 1',
    '"Hello, nice to meet you," said world 2',
]
df = pd.DataFrame(data, columns=['text'])
nlp = spacy.load('en_core_web_sm')
doc = df['text'].apply(nlp)

Here is the desired output:

[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")]
[DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]

Here is my first attempt at the extraction:

print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))

The output is:

[]

Here is my second attempt:

print(list(textacy.extract.triples.direct_quotations(doc)))

It raises the following error:

AttributeError: 'Series' object has no attribute 'lang_'

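In your first attempt, the loop variable records is never used: direct_quotations is called with the whole doc Series on every iteration. In both attempts the underlying problem is that direct_quotations expects a single spaCy Doc, not a pandas Series of Docs, which is why the second attempt fails with 'Series' object has no attribute 'lang_'.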

You can do it like this:

import textacy
import spacy
text =""" "Hello, nice to meet you," said world 1"""
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]

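direct_quotations returns a generator object, so you have to consume it, either with list(...) as above or, to get only the first quotation, with:

next(textacy.extract.triples.direct_quotations(doc))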
