I'm trying to extract quotations and their attribution (i.e. the speaker) from text, but I get errors. Here is the setup:
import textacy
import pandas as pd
import spacy
data = [
    '"Hello, nice to meet you," said world 1',
    '"Hello, nice to meet you," said world 2',
]
df = pd.DataFrame(data, columns=['text'])
nlp = spacy.load('en_core_web_sm')
doc = df['text'].apply(nlp)
This is the desired output:
[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")]
[DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]
Here is my first extraction attempt:
print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))
The output is:
[<generator object direct_quotations at 0x...>, <generator object direct_quotations at 0x...>]
Here is my second extraction attempt:
print(list(textacy.extract.triples.direct_quotations(doc)))
It raises the following error:
AttributeError: 'Series' object has no attribute 'lang_'
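In your first attempt, the generator expression calls direct_quotations(doc) on the whole Series once per row and never iterates the resulting generators, so list(...) only collects unconsumed generator objects. Because a generator's body does not run until it is iterated, no error appears. Your second attempt does iterate it (through list), and at that point direct_quotations tries to read doc.lang_ on the Series and fails: it expects a single spaCy Doc, not a pandas Series of Docs.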
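Both outcomes can be reproduced without spaCy, which makes the laziness easier to see. In this sketch fake_direct_quotations is a hypothetical stand-in (not part of textacy) that, like direct_quotations, is a generator function expecting a single document; a plain list plays the role of the pandas Series.

```python
from typing import Iterator


def fake_direct_quotations(doc: str) -> Iterator[str]:
    """Hypothetical stand-in for direct_quotations: yields the first quoted span."""
    start = doc.find('"')  # on a list this fails, but only once the generator is iterated
    end = doc.find('"', start + 1)
    if start != -1 and end != -1:
        yield doc[start:end + 1]


docs = [
    '"Hello, nice to meet you," said world 1',
    '"Hello, nice to meet you," said world 2',
]

# Pattern of the first attempt: the whole collection is passed on every loop
# iteration and nothing consumes the generators, so no error is raised yet.
unconsumed = list(fake_direct_quotations(docs) for record in docs)
print(unconsumed)  # two <generator object ...> reprs

# Pattern of the second attempt: list() runs the body, which then fails
# because it received a list (compare: a Series) instead of one document.
try:
    list(fake_direct_quotations(docs))
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'find'

# Fix: call the function on each element and consume each generator.
per_record = [list(fake_direct_quotations(record)) for record in docs]
print(per_record)
```

The same shape of fix applies to the real function: one call per Doc, each wrapped in list().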
Instead, pass a single Doc:
import textacy
import spacy
text = '"Hello, nice to meet you," said world 1'
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]
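To get one result per row, as in the desired output, the same call has to be made once per text. The following is a minimal sketch under two assumptions: nlp is a loaded spaCy pipeline (e.g. en_core_web_sm), and textacy.extract.triples.direct_quotations is passed in as extract. extract_quotes is a hypothetical helper name, and both callables are parameters so the loop can be tried without a model.

```python
from typing import Any, Callable, Iterable, List


def extract_quotes(
    texts: Iterable[str],
    nlp: Callable[[str], Any],
    extract: Callable[[Any], Iterable[Any]],
) -> List[List[Any]]:
    """Return one list of quotation triples per input text.

    Assumes nlp is a spaCy pipeline and extract is
    textacy.extract.triples.direct_quotations, which takes a single Doc.
    """
    results: List[List[Any]] = []
    for text in texts:
        doc = nlp(text)  # parse each row into its own Doc
        results.append(list(extract(doc)))  # consume this Doc's generator
    return results
```

With the question's setup this would be called as extract_quotes(df['text'], nlp, textacy.extract.triples.direct_quotations). A pandas-only equivalent is df['quotes'] = df['text'].apply(lambda t: list(textacy.extract.triples.direct_quotations(nlp(t)))), which stores a list of DQTriple objects in each row. For many rows, nlp.pipe(texts) parses in batches and could replace the per-text nlp(text) call.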
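Since direct_quotations returns a generator, wrap it in list() as above to get every triple, or use
next(textacy.extract.triples.direct_quotations(doc))
to get only the first one.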
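Two details about next are easy to miss: it raises StopIteration when the text contains no quotation, and it consumes the item it returns, so a later list() on the same generator only sees what is left. This sketch shows both; first_or_none is a hypothetical helper that relies on next's default argument, and the generator of strings is hypothetical data standing in for direct_quotations(doc).

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Return the first item, or None instead of raising StopIteration."""
    return next(iter(items), None)


# Hypothetical stand-in for direct_quotations(doc): a one-shot generator.
quotes = (q for q in ['"Hello, nice to meet you,"', '"Goodbye,"'])

first = first_or_none(quotes)  # consumes one item
rest = list(quotes)  # only the items left after next()
empty = first_or_none(q for q in [])  # no quotation: None, not StopIteration
print(first, rest, empty)
```

Applied to the real function this would be first_or_none(textacy.extract.triples.direct_quotations(doc)). If the triples are needed more than once, store list(...) in a variable rather than reusing the generator.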