Converting a list of dictionaries to RDF format



Goal (automation): when I have a large list of dictionaries, I want to generate data in a specific format. This is the input:

a = [{'et2': 'OBJ Type',
  'e2': 'OBJ',
  'rel': 'rel',
  'et1': 'SUJ Type',
  'e1': 'SUJ'},
     {'et2': 'OBJ Type 2',
  'e2': 'OBJ',
  'rel': 'rel',
  'et1': 'SUJ Type',
  'e1': 'SUJ'}
  ]

The expected output is:

:Sub a :SubType.
:Sub :rel "Obj".
 

This is what I tried:

Sub = 0

for i in a:
    entity_type1 = i["EntityType1"]
    entity1 = i["Entity1"]
    entity_type2 = i["EntityType2"]
    entity2 = i["Entity2"]
    relation = i["Relation"]
    if 'Sub' in entity_type1 or entity_type2:
        if entity1 == Sub and Sub <= 0 :
            
            Sub +=1
            sd_line1 = ""
            sd_line2 = ""
            sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
            relation = ":"+relation
            sd_line2 = "\n" + ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line1 + sd_line2
            print(sd_line3)

        
      

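A word of advice: when you build a conversion workflow like this, try to keep the main steps separate, for example: loading from a system, parsing the data in one format, extracting, transforming, serializing to another format, and loading into another system.

In your code example, you mix the extraction, transformation, and serialization steps. Separating these steps makes the code easier to read, and therefore easier to maintain or reuse.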

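Below, I give you two solutions: the first extracts the data into a simple dict-based subject-predicate-object graph, and the second extracts it into a real RDF graph.

In both cases, you will see that I keep the extraction/transformation step (which returns a graph) separate from the serialization step (which uses the graph), so that each is easier to reuse: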

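  • The dict-based transformation is implemented both with a plain dict and with a defaultdict. The serialization step is common to both.

  • The rdflib.Graph-based transformation is shared by two serializations: one for your format, and another for any serialization that rdflib.Graph supports.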


This builds a simple dict-based graph from the dictionaries in a:

graph = {}
for e in a:
    subj = e["Entity1"]
    graph.setdefault(subj, {})  # keep predicates already collected for this subject
    # :Entity1 a :EntityType1.
    obj = e["EntityType1"]
    graph[subj]["a"] = obj  
    # :Entity1 :Relation "Entity2".    
    pred, obj = e["Relation"], e["Entity2"]
    graph[subj][pred] = obj  
print(graph)

Like this:

{'X450-G2': {'a': 'switch',
             'hasFeatures': 'Role-Based Policy',
             'hasLocation': 'WallJack'},
 'ers 3600': {'a': 'switch', 
              'hasFeatures': 'ExtremeXOS'},
 'slx 9540': {'a': 'router',
              'hasFeatures': 'ExtremeXOS',
              'hasLocation': 'Chasis'}}

Or, in a shorter form, using defaultdict:

from collections import defaultdict
graph = defaultdict(dict)
for e in a:
    subj = e["Entity1"]
    
    # :Entity1 a :EntityType1.
    graph[subj]["a"] = e["EntityType1"]  
    # :Entity1 :Relation "Entity2".    
    graph[subj][e["Relation"]] = e["Entity2"]  
print(graph)
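Both dict-based variants above keep a single object per predicate, so a later record with the same subject and relation overwrites the earlier one. If that matters for your data, one possible variation (a sketch, not part of the solution itself) is to store a set of objects per predicate. The records and the name multi_graph below are hypothetical and only serve as an illustration:

```python
from collections import defaultdict

# Hypothetical sample records, using the same keys as the code above.
records = [
    {"Entity1": "slx 9540", "EntityType1": "router",
     "Relation": "hasFeatures", "Entity2": "ExtremeXOS"},
    {"Entity1": "slx 9540", "EntityType1": "router",
     "Relation": "hasFeatures", "Entity2": "Role-Based Policy"},
]

# subject -> predicate -> set of objects
multi_graph = defaultdict(lambda: defaultdict(set))
for e in records:
    subj = e["Entity1"]
    # :Entity1 a :EntityType1.
    multi_graph[subj]["a"].add(e["EntityType1"])
    # :Entity1 :Relation "Entity2".  (both objects are kept)
    multi_graph[subj][e["Relation"]].add(e["Entity2"])

features = sorted(multi_graph["slx 9540"]["hasFeatures"])
print(features)
```

The rest of this answer continues with the single-valued graph built above; a serializer for multi_graph would need one extra loop over each set.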

This prints your "subject predicate object." triples from the graph:

def normalize(text):
    return text.replace(' ', '')
for subj, po in graph.items():
    subj = normalize(subj)
    po = dict(po)  # work on a copy so that pop() does not modify the graph
    # :Entity1 a :EntityType1.
    print(':{} a :{}.'.format(subj, po.pop("a")))
    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".    
        print(':{} :{} "{}".'.format(subj, pred, obj))
    print()

Like this:

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".
:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".
:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".
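To tie this back to the input shown in the question, here is a minimal sketch that wraps the two steps in functions. It assumes the short keys from the question's input (e1, et1, rel, e2); the function names build_graph and to_lines and their keyword parameters are made up for this example. It also normalizes the type name, which is an assumption based on the expected output containing no spaces:

```python
from collections import defaultdict


def normalize(text):
    """Remove spaces so the text can follow the ':' prefix as a name."""
    return text.replace(" ", "")


def build_graph(records, subj="e1", subj_type="et1", pred="rel", obj="e2"):
    """Extraction/transformation step: return {subject: {predicate: object}}."""
    graph = defaultdict(dict)
    for rec in records:
        s = rec[subj]
        graph[s]["a"] = rec[subj_type]
        graph[s][rec[pred]] = rec[obj]
    return graph


def to_lines(graph):
    """Serialization step: return the triples as a list of strings."""
    lines = []
    for s, po in graph.items():
        s = normalize(s)
        lines.append(":{} a :{}.".format(s, normalize(po["a"])))
        for p, o in po.items():
            if p != "a":  # the type triple was already emitted above
                lines.append(':{} :{} "{}".'.format(s, p, o))
    return lines


records = [
    {"et2": "OBJ Type", "e2": "OBJ", "rel": "rel", "et1": "SUJ Type", "e1": "SUJ"},
    {"et2": "OBJ Type 2", "e2": "OBJ", "rel": "rel", "et1": "SUJ Type", "e1": "SUJ"},
]
lines = to_lines(build_graph(records))
print("\n".join(lines))
```

Because to_lines returns strings instead of printing, the same result can be written to a file or checked in a test. Note that a relation literally named "a" would collide with the type entry in this sketch.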

This builds a real RDF graph using the rdflib library:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF
A = RDF.type
graph = Graph()
for d in a:
    subj = URIRef(normalize(d["Entity1"]))
    # :Entity1 a :EntityType1.
    graph.add((
        subj,
        A, 
        URIRef(normalize(d["EntityType1"]))
    ))
    
    # :Entity1 :Relation "Entity2".    
    graph.add((
        subj,
        URIRef(normalize(d["Relation"])), 
        Literal(d["Entity2"])
    ))

This:

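# rdflib < 6 returns bytes here; with rdflib >= 6, serialize() already returns a str, so drop .decode("utf-8")
print(graph.serialize(format="n3").decode("utf-8"))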

will print the graph in the N3 serialization format:

<X450-G2> a <switch> ;
    <hasFeatures> "Role-Based Policy" ;
    <hasLocation> "WallJack" .
<ers3600> a <switch> ;
    <hasFeatures> "ExtremeXOS" .
<slx9540> a <router> ;
    <hasFeatures> "ExtremeXOS" ;
    <hasLocation> "Chasis" .

This queries the graph and prints it in your format:

for subj in set(graph.subjects()):
    po = dict(graph.predicate_objects(subj))
    
    # :Entity1 a :EntityType1.
    print(":{} a :{}.".format(subj, po.pop(A)))
    
    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".    
        print(':{} :{} "{}".'.format(subj, pred, obj))
    print()
