How to get deduplicated OpenIE (clause extraction) results



I have exhausted all the configuration options I know of:

from openie import StanfordOpenIE

# https://stanfordnlp.github.io/CoreNLP/openie.html#api
# Default value of openie.affinity_probability_cap was 1/3.
properties = {
    'annotators': 'tokenize,ssplit,pos,depparse,natlog,openie',
    'openie.affinity_probability_cap': 2 / 3,
    'openie.triple.strict': 'true',
    'openie.max_entailments_per_clause': 1,
    'splitter.disable': True,
}

with StanfordOpenIE(properties=properties) as client:
    text = 'Barack Obama was born in Hawaii. Richard Manning wrote this sentence.'
    print('Text: %s.' % text)
    for triple in client.annotate(text):  # , max_entailments_per_clause=True):
        print('|-', triple)

But the results still contain unmerged duplicate variants:

|- {'subject': 'Barack Obama', 'relation': 'was', 'object': 'born'}
|- {'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'}

whereas I am only looking for the maximal clause extraction:

|- {'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'}

Can anyone help?

This code works for me.

from pycorenlp import StanfordCoreNLP
import json
import nltk

nlp = StanfordCoreNLP("http://localhost:9000/")
text = 'Barack Obama was born in Hawaii. Richard Manning wrote this sentence.'
props = {"annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
         "outputFormat": "json",
         "openie.triple.strict": "true",
         "openie.max_entailments_per_clause": "1"}
sentences = nltk.sent_tokenize(text)
for sent in sentences:
    print(sent)
    output = nlp.annotate(sent, properties=props)
    j_data = json.loads(output)
    openie = j_data['sentences'][0]['openie']
    for i in openie:
        relationSen = i['subject'], i['relation'], i['object']
        print(relationSen)

It produces the following output...

Barack Obama was born in Hawaii.
('Barack Obama', 'was born in', 'Hawaii')
Richard Manning wrote this sentence.
('Richard Manning', 'wrote', 'sentence')
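If the server settings still leave subsumed variants (like the `'was' / 'born'` triple in the question), you can also deduplicate on the client side. The sketch below is one possible post-filter, not part of CoreNLP: it treats a triple as redundant when all of its tokens form a strict subset of another triple's tokens, and keeps only the maximal ones.

```python
def keep_maximal_triples(triples):
    """Drop triples whose tokens are a strict subset of another triple's
    tokens, e.g. keep ('Barack Obama', 'was born in', 'Hawaii') and drop
    ('Barack Obama', 'was', 'born'). Hypothetical helper, not a CoreNLP API."""
    def tokens(t):
        # Flatten subject/relation/object into one token set for comparison.
        return set((t['subject'] + ' ' + t['relation'] + ' ' + t['object']).split())

    maximal = []
    for t in triples:
        t_tok = tokens(t)
        # Strict subset of any other triple's tokens => subsumed, skip it.
        if any(t_tok < tokens(other) for other in triples):
            continue
        maximal.append(t)
    return maximal

triples = [
    {'subject': 'Barack Obama', 'relation': 'was', 'object': 'born'},
    {'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'},
]
print(keep_maximal_triples(triples))
# → [{'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'}]
```

Token-subset comparison is a heuristic: it works for nested extractions from the same clause but could, in principle, merge unrelated triples that happen to share all their words, so apply it per sentence rather than across the whole document.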
