Which coreference clusters does each sentence mention in NeuralCoref?

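I am using NeuralCoref to perform coreference resolution on a text.

I want to know which coreference clusters each sentence mentions. For example, sentence 1 has mentions from clusters 1 and 4; sentence 2 has mentions from clusters 10 and 14.

How can I do this?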

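You can iterate over the words in each sentence and fill a sentence -> clusters dictionary whenever a word is part of a cluster. Note that this assumes every mention is a single word; if you need to handle clusters whose mentions span several words, you can extend it to multi-word spans (bigrams or trigrams).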

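import spacy
import neuralcoref

# 'en' is a spaCy 2.x shortcut link to an installed English model.
nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)
doc = nlp('Angela lives in Boston. She is Happy. Nikki is her new friend. She is jolly too.')

print('*** cluster : tokens mapping ***')
print(doc._.coref_clusters)

# Map each sentence to the main mentions of the clusters it refers to.
mapping = {}
for sent in doc.sents:
    mapping[sent] = set()
    # Slide a one-token window over the sentence; the upper bound is
    # len(sent) + 1 so that the last token is checked as well.
    for idx in range(1, len(sent) + 1):
        span = sent[idx-1:idx]    # edit this to handle n-grams
        if span._.is_coref:
            key = span._.coref_cluster.main
            mapping[sent].add(key)

print('*** sentence : clusters mapping ***')
print(mapping)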

The output is:

*** cluster : tokens mapping ***
[Angela: [Angela, She, her], Nikki: [Nikki, She]]
*** sentence : clusters mapping ***
{Angela lives in Boston.: {Angela}, She is Happy.: {Angela}, Nikki is her new friend.: {Nikki, Angela}, She is jolly too.: {Nikki}}
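
One way to drop the single-word assumption is to start from the clusters instead of the tokens: each cluster lists its mentions as spans of any length, and a span knows which sentence it belongs to. The following is a minimal sketch, assuming NeuralCoref's `Cluster.mentions` and `Cluster.i` attributes and spaCy's `Span.sent`; `clusters_per_sentence` is a hypothetical helper name, not part of either library.

```python
from collections import defaultdict


def clusters_per_sentence(doc):
    """Return (sentence, sorted cluster indices) pairs in document order.

    Assumes `doc` comes from a spaCy pipeline with NeuralCoref added,
    so that `doc._.coref_clusters` is populated.
    """
    # Key sentences by the index of their first token.
    by_start = defaultdict(set)
    for cluster in doc._.coref_clusters:
        for mention in cluster.mentions:  # a mention may span several tokens
            by_start[mention.sent.start].add(cluster.i)
    # Sentences without any mention get an empty list.
    return [(sent, sorted(by_start.get(sent.start, ()))) for sent in doc.sents]
```

Calling `clusters_per_sentence(doc)` on the `doc` built above yields one pair per sentence. It reports cluster indices rather than main mentions, so two clusters whose main mentions have the same text stay distinguishable.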
