Check whether elements of two lists appear in a dictionary, sentence by sentence?



I have a JSON file...

"1": {"address": "1",
"ctag": "Ne",
"feats": "_",
"head": "6",
"lemma": "Ghani",
"rel": "SBJ",
"tag": "Ne",
"word": "Ghani"},
"2": {"address": "2",
"ctag": "AJ",
"feats": "_",
"head": "1",
"lemma": "born",
"rel": "NPOSTMOD",
"tag": "AJ",
"word": "born"},
"3": {"address": "3",
"ctag": "P",
"feats": "_",
"head": "6",
"lemma": "in",
"rel": "ADV",
"tag": "P",
"word": "in"},
"4": {"address": "4",
"ctag": "N",
"feats": "_",
"head": "3",
"lemma": "Kabul",
"rel": "POSDEP",
"tag": "N",
"word": "Kabul"},
"5": {"address": "5",
"ctag": "PUNC",
"feats": "_",
"head": "6",
"lemma": ".",
"rel": "PUNC",
"tag": "PUNC",
"word": "."},

I read the JSON file and stored it in a dictionary.

import json

# read the file
with open('../data/data.txt', 'r') as JSON_file:
    obj = json.load(JSON_file)
d = dict(obj)  # store it in a dict
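Assuming the file's top-level value is an object (as the excerpt suggests), `json.load` already returns a dict, so `dict(obj)` only makes a shallow copy. The keys are strings ("1", "2", ...), so if they ever need sorting, sorting them as strings would put "10" before "2". A minimal sketch, assuming every token carries an `address` field as in the excerpt, that returns the tokens in sentence order (`load_tokens` is a hypothetical helper name):

```python
import json


def load_tokens(path):
    """Load the token dict and return its values ordered by numeric address."""
    with open(path, 'r', encoding='utf-8') as json_file:
        obj = json.load(json_file)  # already a dict keyed by "1", "2", ...
    # Sort numerically: compared as strings, "10" would come before "2".
    return sorted(obj.values(), key=lambda tok: int(tok['address']))
```

Usage would be `tokens = load_tokens('../data/data.txt')`; each element is one token dict such as the one for "Ghani".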

From this dict I extracted two lists, one with the entities and one with the relations in the text, as shown below:

entities(d)  # returns ['Ghani', 'Kabul', 'Afghanistan', ...]
relation(d)  # returns ['president', 'capital', 'located', ...]

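Now I want to go through each sentence in the dictionary d and, if it contains any element of entities(d) or relation(d), store that element in another list. Here is what I tried: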

to_match = set(relation(d) + entities(d))
entities_and_relation = [[j for j in to_match if j in i] 
for i in ''.join(d).split('.')[:-1]]
print(entities_and_relation)

But this gives me an empty list. Can you tell me what is wrong here?
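A likely cause of the empty list: iterating over a dict yields its keys, so `''.join(d)` concatenates "1", "2", "3", ... into one string of digits. That string contains no '.', so `split('.')` returns a single element and `[:-1]` removes it, leaving nothing to loop over. A minimal sketch that rebuilds the sentences from the `word` values instead, assuming tokens are ordered by `address` and each sentence ends with a '.' token. It also swaps the substring test `j in i` for whole-word matching, so a short entry cannot match inside a longer word. The helper names are hypothetical:

```python
def sentences_from_tokens(d):
    """Rebuild sentence strings from the 'word' values, in address order."""
    tokens = sorted(d.values(), key=lambda tok: int(tok['address']))
    text = ' '.join(tok['word'] for tok in tokens)
    # As in the original, [:-1] drops the empty piece after the final '.'.
    # Note: a '.' inside a word (e.g. "U.S.") would also split the text here.
    return [s.strip() for s in text.split('.')[:-1]]


def match_per_sentence(d, to_match):
    """For each sentence, list the words found in to_match, in sentence order."""
    return [[w for w in s.split() if w in to_match]
            for s in sentences_from_tokens(d)]
```

For the five tokens shown above and `to_match = {'Ghani', 'born', 'Kabul'}`, `match_per_sentence(d, to_match)` returns `[['Ghani', 'born', 'Kabul']]`. Because it walks the sentence rather than the set, the words keep their sentence order.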

The output should look like this: [Ghani, president, Afghanistan] ...

I solved that part with the code below, but I don't know how to format each set of related entities.

for i in d.values():
    if i['word'].split('.')[-1] in to_match:
        print('{: ^10}'.format(i['word']))

Output:

Ghani
Kabul
Born
Kabul
Capital
Afghanistan
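The loop above scans all tokens in one pass, so the sentence boundaries are lost and the matches come out as a flat column. A minimal sketch, assuming a token whose word is '.' ends a sentence (as token 5 does in the excerpt), that groups the matches per sentence and joins each group in the tuple style of the expected output; `split_sentences`, `matched_words` and `format_groups` are hypothetical names:

```python
def split_sentences(d):
    """Group tokens (ordered by address) into sentences ending at a '.' token."""
    sentences, current = [], []
    for tok in sorted(d.values(), key=lambda t: int(t['address'])):
        if tok['word'] == '.':
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # trailing tokens with no final '.'
        sentences.append(current)
    return sentences


def matched_words(d, to_match):
    """Per sentence, the words found in to_match, in sentence order."""
    return [[tok['word'] for tok in sent if tok['word'] in to_match]
            for sent in split_sentences(d)]


def format_groups(groups):
    """Render each non-empty group as '(a, b, c)' and join them with ', '."""
    return ', '.join('(' + ', '.join(g) + ')' for g in groups if g)
```

For the five tokens shown, with `to_match` containing 'Ghani', 'born' and 'Kabul', `format_groups(matched_words(d, to_match))` gives `(Ghani, born, Kabul)`. The code does not check that a group really has the shape (entity, relation, entity); that depends on each sentence.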

My expected output:

(Ghani, born, Kabul), (Kabul, capital, Afghanistan) or ...
Born_in(Ghani, Kabul), Capital_of(Kabul, Afghanistan)

I don't know how to map the data, or structure the code, to get the expected output.
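For the predicate style, one possible heuristic (an assumption about the data, not a library feature): for each relation word in a sentence, pair the nearest entity before it with the nearest entity after it, and if the next token is tagged 'P' (as "in" is in the excerpt), append it to the relation name, giving born_in. A minimal sketch; the function names are hypothetical, and the parse's `head` and `rel` fields might support a more robust, dependency-based pairing than word order:

```python
def split_sentences(d):
    """Group tokens (ordered by address) into sentences ending at a '.' token."""
    sentences, current = [], []
    for tok in sorted(d.values(), key=lambda t: int(t['address'])):
        if tok['word'] == '.':
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        sentences.append(current)
    return sentences


def extract_relations(d, entities, relations):
    """Heuristic (left entity, relation, right entity) triples per sentence."""
    triples = []
    for sent in split_sentences(d):
        for i, tok in enumerate(sent):
            if tok['word'] not in relations:
                continue
            left = [t['word'] for t in sent[:i] if t['word'] in entities]
            right = [t['word'] for t in sent[i + 1:] if t['word'] in entities]
            if not left or not right:
                continue  # no entity on one side: skip this relation word
            name = tok['word']
            # Attach a directly following preposition, e.g. born + in -> born_in.
            if i + 1 < len(sent) and sent[i + 1].get('ctag') == 'P':
                name += '_' + sent[i + 1]['word']
            triples.append((left[-1], name, right[0]))
    return triples


def as_predicates(triples):
    """Render triples as 'Rel(a, b)' strings joined by ', '."""
    return ', '.join(f"{rel.capitalize()}({a}, {b})" for a, rel, b in triples)
```

With the five tokens shown plus a made-up second sentence "Kabul capital of Afghanistan ." (where "of" is tagged 'P'), `entities = {'Ghani', 'Kabul', 'Afghanistan'}` and `relations = {'born', 'capital'}`, `extract_relations` yields `[('Ghani', 'born_in', 'Kabul'), ('Kabul', 'capital_of', 'Afghanistan')]` and `as_predicates` turns that into `Born_in(Ghani, Kabul), Capital_of(Kabul, Afghanistan)`.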

Latest update