Goal (automation): given a large list of dictionaries, I want to generate data in a specific format. This is the input:
a = [{'et2': 'OBJ Type',
      'e2': 'OBJ',
      'rel': 'rel',
      'et1': 'SUJ Type',
      'e1': 'SUJ'},
     {'et2': 'OBJ Type 2',
      'e2': 'OBJ',
      'rel': 'rel',
      'et1': 'SUJ Type',
      'e1': 'SUJ'}
    ]
The expected output is:
:Sub a :SubType.
:Sub :rel "Obj".
This is what I tried:
Sub = 0
for i in a:
    entity_type1 = i["EntityType1"]
    entity1 = i["Entity1"]
    entity_type2 = i["EntityType2"]
    entity2 = i["Entity2"]
    relation = i["Relation"]
    if 'Sub' in entity_type1 or entity_type2:
        if entity1 == Sub and Sub <= 0 :
            Sub +=1
        sd_line1 = ""
        sd_line2 = ""
        sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
        relation = ":"+relation
        sd_line2 ="\n" ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
        sd_line3 = sd_line1 + sd_line2
        print(sd_line3)
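A piece of advice: when implementing a transformation workflow like this one, try to keep the main steps separate, for example: loading from a system, parsing data in one format, extracting, transforming, serializing to another format, and loading into another system.
In your code sample, you mix the extraction, transformation and serialization steps. Separating them makes the code easier to read, and therefore easier to maintain or reuse.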
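To make the separation concrete, here is a minimal sketch of the idea, not a definitive implementation. The function names extract and serialize are hypothetical, and the record keys (e1, et1, rel, e2) are assumed to be the short keys shown in the question's input; adjust them if your data uses other keys.

```python
from collections import defaultdict


def extract(records):
    """Extract/transform step: build a subject -> {predicate: object} graph."""
    graph = defaultdict(dict)
    for rec in records:
        subj = rec["e1"]                     # assumed key names, see lead-in
        graph[subj]["a"] = rec["et1"]
        graph[subj][rec["rel"]] = rec["e2"]
    return graph


def serialize(graph):
    """Serialization step: turn the graph into text in the target format."""
    lines = []
    for subj, po in graph.items():
        lines.append(":{} a :{}.".format(subj, po["a"]))
        for pred, obj in po.items():
            if pred != "a":
                lines.append(':{} :{} "{}".'.format(subj, pred, obj))
    return "\n".join(lines)


# Made-up record, only for illustration.
records = [
    {"e1": "SUJ", "et1": "SUJType", "rel": "rel", "e2": "OBJ", "et2": "OBJType"},
]
text = serialize(extract(records))
print(text)
```

Because serialize only receives a graph, it can be reused with any other function that produces the same structure.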
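Below, I give you two solutions: the first extracts the data into a simple dict-based subject-predicate-object graph, the second into a real RDF graph.
In both cases, you will see that I keep the extraction/transformation step (which returns a graph) separate from the serialization step (which uses the graph), so that each is easier to reuse: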
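- The dict-based transformation is implemented with either a plain dict or a defaultdict. The serialization step is common to both.
- The rdflib.Graph-based transformation is shared by two serializations: one to your format, the other to any serialization format available for rdflib.Graph.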
This builds a simple dict-based graph from your list of dictionaries a:
graph = {}
for e in a:
    subj = e["Entity1"]
    # Create the entry only once, so that triples already stored
    # for the same subject are not overwritten.
    graph.setdefault(subj, {})
    # :Entity1 a :EntityType1.
    obj = e["EntityType1"]
    graph[subj]["a"] = obj
    # :Entity1 :Relation "Entity2".
    pred, obj = e["Relation"], e["Entity2"]
    graph[subj][pred] = obj
print(graph)
It prints something like this:
{'X450-G2': {'a': 'switch',
             'hasFeatures': 'Role-Based Policy',
             'hasLocation': 'WallJack'},
 'ers 3600': {'a': 'switch',
              'hasFeatures': 'ExtremeXOS'},
 'slx 9540': {'a': 'router',
              'hasFeatures': 'ExtremeXOS',
              'hasLocation': 'Chasis'}}
Or, in a shorter form, using defaultdict:
from collections import defaultdict

graph = defaultdict(dict)
for e in a:
    subj = e["Entity1"]
    # :Entity1 a :EntityType1.
    graph[subj]["a"] = e["EntityType1"]
    # :Entity1 :Relation "Entity2".
    graph[subj][e["Relation"]] = e["Entity2"]
print(graph)
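One thing to keep in mind with a dict of dicts: if the same subject appears twice with the same relation, the later object silently replaces the earlier one. If that can happen in your data, a variant that stores a set of objects per predicate may help. This is only a sketch; build_multigraph is a hypothetical name and the two sample records are made up for illustration.

```python
from collections import defaultdict


def build_multigraph(records):
    """Build subject -> predicate -> set of objects, so repeats are kept."""
    graph = defaultdict(lambda: defaultdict(set))
    for rec in records:
        subj = rec["Entity1"]
        # :Entity1 a :EntityType1.
        graph[subj]["a"].add(rec["EntityType1"])
        # :Entity1 :Relation "Entity2".
        graph[subj][rec["Relation"]].add(rec["Entity2"])
    return graph


# Made-up records: same subject and relation, two different objects.
sample = [
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "Role-Based Policy"},
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "ExtremeXOS"},
]
multigraph = build_multigraph(sample)
print(sorted(multigraph["X450-G2"]["hasFeatures"]))
```

The serialization step would then need one extra loop over each set of objects.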
This prints your subject-predicate-object triples from the graph:
def normalize(text):
    # Remove spaces so the name can be used as an identifier.
    return text.replace(' ', '')

for subj, po in graph.items():
    subj = normalize(subj)
    # :Entity1 a :EntityType1.
    # Note: pop() removes the "a" entry from the graph itself.
    print(':{} a :{}.'.format(subj, po.pop("a")))
    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".
        print(':{} :{} "{}".'.format(subj, pred, obj))
    print()
Like this:
:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".
:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".
:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".
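The format strings above put the object between double quotes as-is, so a value that itself contains a double quote, a backslash or a line break would produce a broken line. A minimal sketch of one way to guard against that, using the backslash escapes that Turtle/N3 string literals accept; escape_literal and format_literal_triple are hypothetical helper names, and the sample value is made up.

```python
def escape_literal(text):
    """Escape characters that would end or corrupt a double-quoted literal."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
            .replace('"', '\\"')
            .replace("\n", "\\n")
    )


def format_literal_triple(subj, pred, obj):
    """Format one ':subj :pred "obj".' line with an escaped object."""
    return ':{} :{} "{}".'.format(subj, pred, escape_literal(obj))


# Made-up value containing double quotes.
line = format_literal_triple("X450-G2", "hasFeatures", 'the "Role-Based" policy')
print(line)
```

The rdflib-based solution does not need this, since rdflib takes care of escaping literals when it serializes.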
This builds a real RDF graph using the rdflib library:
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF

A = RDF.type

graph = Graph()
for d in a:
    subj = URIRef(normalize(d["Entity1"]))
    # :Entity1 a :EntityType1.
    graph.add((
        subj,
        A,
        URIRef(normalize(d["EntityType1"]))
    ))
    # :Entity1 :Relation "Entity2".
    graph.add((
        subj,
        URIRef(normalize(d["Relation"])),
        Literal(d["Entity2"])
    ))
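This:
print(graph.serialize(format="n3").decode("utf-8"))
prints the graph in the N3 serialization format (with rdflib 6 and later, serialize() already returns a str, so the .decode("utf-8") call has to be dropped):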
<X450-G2> a <switch> ;
    <hasFeatures> "Role-Based Policy" ;
    <hasLocation> "WallJack" .

<ers3600> a <switch> ;
    <hasFeatures> "ExtremeXOS" .

<slx9540> a <router> ;
    <hasFeatures> "ExtremeXOS" ;
    <hasLocation> "Chasis" .
This queries the graph and prints it in your format:
for subj in set(graph.subjects()):
    po = dict(graph.predicate_objects(subj))
    # :Entity1 a :EntityType1.
    print(":{} a :{}.".format(subj, po.pop(A)))
    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".
        print(':{} :{} "{}".'.format(subj, pred, obj))
    print()