How to link tokens to sentences in spaCy



I want to build a keyword list from tokens and look up the sentences they came from. Thanks.

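You can get the sentences from token.doc.sents and then find the one whose token range contains your token's index (token.i). You can make this more convenient by adding an extension attribute to Token, as follows: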

import spacy
from spacy.tokens import Token

def get_sentence(token):
    # Return the sentence span whose token range contains this token
    for sent in token.doc.sents:
        if sent.start <= token.i < sent.end:
            return sent

# Add a computed property, which will be accessible as token._.sent
Token.set_extension('sent', getter=get_sentence)

nlp = spacy.load('en_core_web_sm')
doc = nlp('Sentence one. Sentence two.')
print(list(doc.sents))
print(doc[0]._.sent)
print(doc[-1]._.sent)
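
For the keyword list itself, here is a minimal sketch of one way to map each keyword to the sentences it appears in. It makes several assumptions that are not in the question: spaCy v3, a blank English pipeline with the rule-based sentencizer (so no model download is needed), and a deliberately simple definition of "keyword" as any alphabetic, non-stop-word token. The helper name keyword_sentences is hypothetical. It uses the built-in token.sent property, which recent spaCy versions provide once sentence boundaries are set, instead of a custom extension.

```python
from collections import defaultdict

import spacy


def keyword_sentences(doc):
    """Map each lowercased keyword to the texts of the sentences containing it.

    Assumption: a "keyword" is any alphabetic token that is not a stop word.
    Swap in your own filter (POS tags, a fixed vocabulary, ...) as needed.
    """
    index = defaultdict(list)
    for token in doc:
        if token.is_alpha and not token.is_stop:
            sentence = token.sent.text  # built-in: the sentence span for this token
            if sentence not in index[token.lower_]:
                index[token.lower_].append(sentence)
    return dict(index)


# Assumption: blank English pipeline plus the rule-based sentencizer (spaCy v3 API).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("Cats chase mice. Dogs chase cats.")
index = keyword_sentences(doc)
for keyword, sentences in index.items():
    print(keyword, "->", sentences)
```

With a trained pipeline such as en_core_web_sm, sentence boundaries come from the parser, so the sentencizer line can be dropped and the filter could use token.pos_ instead.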
