Parsing a device tree with pyparsing into a structured dictionary



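For my C++ RTOS I'm writing a parser for devicetree "source" files (.dts) in Python, using the pyparsing module. I can already parse the structure of the device tree into a (nested) dictionary. In that dictionary the property or node name is the key (a string), and the property value or node is the value (a string or a nested dictionary).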

Let's assume I have the following example device tree structure:

/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 {
            property111 = "string111";
            property112 = "string112";
        };
    };
    node2 {
        property21 = "string21";
        property22 = "string22";
    };
};

I'm able to parse it into something like this:

{'/': {'node1': {'node11': {'property111': ['string111'], 'property112': ['string112']},
                 'property11': ['string11'],
                 'property12': ['string12']},
       'node2': {'property21': ['string21'], 'property22': ['string22']},
       'property1': ['string1'],
       'property2': ['string2']}}

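However, for my needs I would prefer to structure this data differently. I'd like all properties to be a nested dictionary under the key "properties", and all child nodes to be a nested dictionary under the key "children". The reason is that the device tree (nodes in particular) carries some "metadata" that I'd like to store as key-value pairs. That requires moving the actual "contents" of each node one level down, so that metadata keys can never collide with property or node names. So I would prefer the example above to look like this: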

{'/': {
    'properties': {
        'property1': ['string1'],
        'property2': ['string2']
    },
    'children': {
        'node1': {
            'properties': {
                'property11': ['string11'],
                'property12': ['string12']
            },
            'children': {
                'node11': {
                    'properties': {
                        'property111': ['string111'],
                        'property112': ['string112']
                    },
                    'children': {
                    }
                }
            }
        },
        'node2': {
            'properties': {
                'property21': ['string21'],
                'property22': ['string22']
            },
            'children': {
            }
        }
    }
}}

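I tried adding results names to the parse tokens, but that produces "doubled" dictionary elements (which is expected, since this behavior is described in the pyparsing documentation). That might not be a problem in itself, but technically a node or property could be named "properties" or "children" (or whatever I choose), so I don't consider such a solution robust.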

I also tried using setParseAction() to convert the tokens into dictionary fragments (I was hoping to turn {'key': 'value'} into {'properties': {'key': 'value'}}), but that didn't work at all...
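A parse action may return any Python object as a token, so one way to build the nested layout during the single pyparsing pass is to have each rule return a (name, value) tuple and assemble plain dicts from those tuples. This is a minimal sketch of that alternative technique, not the code from the question or the answer. The helper names make_property and make_node are hypothetical, and it assumes pyparsing's camelCase method names (still available as aliases in pyparsing 3).

```python
import pyparsing as pp

# Suppressed punctuation: matched, but never returned as tokens.
LBRACE, RBRACE, SEMI, EQ = map(pp.Suppress, "{};=")
nodeName = pp.Word(pp.alphas, pp.alphanums + ',._+-', max=31)
propertyName = pp.Word(pp.alphanums + ',._+?#', max=31)
# copy() so the module-level dblQuotedString is left unmodified.
propertyValue = pp.dblQuotedString.copy().setParseAction(pp.removeQuotes)


def make_property(tokens):
    # [name, value] -> a single token: the tuple (name, [value])
    return [(tokens[0], [tokens[1]])]


def make_node(tokens):
    # [name, Group of property tuples, Group of child-node tuples]
    # -> a single token: (name, {'properties': {...}, 'children': {...}})
    name, props, kids = tokens
    # list() first: dict() would treat a ParseResults as a mapping of its results names.
    return [(name, {'properties': dict(list(props)),
                    'children': dict(list(kids))})]


prop = (propertyName + EQ + propertyValue + SEMI).setParseAction(make_property)
childNode = pp.Forward()
body = (LBRACE + pp.Group(pp.ZeroOrMore(prop)) + pp.Group(pp.ZeroOrMore(childNode))
        + RBRACE + SEMI)
childNode <<= (nodeName + body).setParseAction(make_node)
rootNode = (pp.Literal('/') + body).setParseAction(make_node)

SOURCE = """
/ {
    property1 = "string1";
    node1 {
        property11 = "string11";
        node11 { property111 = "string111"; };
    };
    node2 { property21 = "string21"; };
};
"""

name, tree = rootNode.parseString(SOURCE, parseAll=True)[0]
result = {name: tree}
print(result)
```

Because properties and child nodes land in separate dicts, a node or property named "properties" or "children" cannot clash with the grouping keys. In this sketch a repeated name at the same level simply overwrites the earlier entry.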

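Is this possible directly with pyparsing? I'm prepared to add a second stage that converts the original dictionary into whatever structure I need, but as a perfectionist I'd prefer a solution that uses a single pyparsing run, if possible.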
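The second-stage fallback mentioned above fits in a few lines of plain Python. This is a minimal sketch, assuming the flat layout printed earlier, in which every property value is a list and every child node is a dict; the helper name restructure is our own.

```python
import pprint


def restructure(node):
    """Split one level of the flat dictionary into 'properties' and 'children'.

    Assumes the flat layout from the question: property values are lists
    such as ['string1'], and child nodes are nested dicts.
    """
    properties = {}
    children = {}
    for key, value in node.items():
        if isinstance(value, dict):
            children[key] = restructure(value)   # a child node: recurse into it
        else:
            properties[key] = value              # a property: keep its value list
    return {'properties': properties, 'children': children}


# The "unstructured" dictionary shown earlier in the question.
flat = {'/': {'node1': {'node11': {'property111': ['string111'], 'property112': ['string112']},
                        'property11': ['string11'],
                        'property12': ['string12']},
              'node2': {'property21': ['string21'], 'property22': ['string22']},
              'property1': ['string1'],
              'property2': ['string2']}}

structured = {name: restructure(root) for name, root in flat.items()}
pprint.pprint(structured)
```

The limitation is the one the question already points out: a node and a property with the same name at the same level would have collided in the flat dictionary before this step ever runs.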

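For reference, here is sample code (Python 3) that converts device tree source into the "unstructured" dictionary. Note that this code is only a simplification and doesn't support all .dts features (no data types other than strings, no value lists, unit addresses, labels, etc.); it only supports string properties and node nesting.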

#!/usr/bin/env python
import pyparsing
import pprint
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + pyparsing.Group(pyparsing.Literal('=').suppress() +
propertyValue) + pyparsing.Literal(';').suppress()))
childNode = pyparsing.Forward()
rootNode = pyparsing.Dict(pyparsing.Group(pyparsing.Literal('/') + pyparsing.Literal('{').suppress() +
pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
pyparsing.Literal('};').suppress()))
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + pyparsing.Literal('{').suppress() +
pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
pyparsing.Literal('};').suppress()))
dictionary = rootNode.parseString("""
/ {
property1 = "string1";
property2 = "string2";
node1 {
property11 = "string11";
property12 = "string12";
node11 {
property111 = "string111";
property112 = "string112";
};
};
node2 {
property21 = "string21";
property22 = "string22";
};
};
""").asDict()
pprint.pprint(dictionary, width = 120)

You're really close. I just did the following:

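  • Added a Group and a results name for the "properties" and "children" subsections
  • Changed some punctuation literals to constants (Literal("};") fails to match if there is whitespace between the closing brace and the semicolon, while RBRACE + SEMI tolerates that whitespace)
  • Removed the outermost Dict on rootNode (it is now a plain Group, so result[0] holds the root node's contents directly)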
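The whitespace point in the second bullet is easy to check in isolation. A small sketch; the matches helper is our own:

```python
import pyparsing

RBRACE, SEMI = map(pyparsing.Suppress, "};")
closing_literal = pyparsing.Literal('};')  # the question's single two-character literal
closing_pair = RBRACE + SEMI               # the answer's two separate tokens


def matches(expr, text):
    """Return True if expr matches the whole of text."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except pyparsing.ParseException:
        return False


print(matches(closing_literal, '};'))    # True
print(matches(closing_literal, '} ;'))   # False: the literal needs '}' and ';' adjacent
print(matches(closing_pair, '} ;'))      # True: whitespace is skipped before each token
print(matches(closing_pair, '}\n;'))     # True
```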

Code:

import pyparsing
import pprint

# Suppressed punctuation: matched but not returned, and whitespace is allowed between tokens
LBRACE, RBRACE, SLASH, SEMI, EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                          + pyparsing.Group(propertyValue)
                                          + SEMI))
childNode = pyparsing.Forward()
# Properties and child nodes each go into their own named Group
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                                             + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                                             + RBRACE + SEMI))

# dts_text is a placeholder name for the device tree source string from the question
result = rootNode.parseString(dts_text)
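The question worried that a node or property might itself be called "properties" or "children". A quick check, assuming the answer's grammar as written (only renaming property to prop, since property shadows a Python builtin, and copying dblQuotedString before attaching the parse action), shows the Groups keep such names at their own level:

```python
import pprint

import pyparsing

LBRACE, RBRACE, SLASH, SEMI, EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max=31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max=31)
propertyValue = pyparsing.dblQuotedString.copy().setParseAction(pyparsing.removeQuotes)
prop = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                      + pyparsing.Group(propertyValue) + SEMI))
childNode = pyparsing.Forward()
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(prop))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                                             + pyparsing.Group(pyparsing.ZeroOrMore(prop))("properties")
                                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                                             + RBRACE + SEMI))

# A property named "properties", plus a node named "children" that itself
# has a property named "children".
SOURCE = """
/ {
    properties = "p";
    children {
        children = "c";
    };
};
"""

tree = rootNode.parseString(SOURCE, parseAll=True)[0].asDict()
pprint.pprint(tree)
```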

Converting the result to a dictionary with asDict() and printing it with pprint gives:

pprint.pprint(result[0].asDict())
{'children': {'node1': {'children': {'node11': {'children': [],
                                                'properties': {'property111': ['string111'],
                                                               'property112': ['string112']}}},
                        'properties': {'property11': ['string11'],
                                       'property12': ['string12']}},
              'node2': {'children': [],
                        'properties': {'property21': ['string21'],
                                       'property22': ['string22']}}},
 'properties': {'property1': ['string1'], 'property2': ['string2']}}
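This is close to the layout the question asked for, with two differences: asDict() renders an empty Group as [] rather than {}, and the suppressed "/" means the root key is gone. A small post-processing sketch that closes both gaps; the helper name to_desired_layout is hypothetical, and the input is the asDict() output above copied verbatim:

```python
import pprint


def to_desired_layout(node):
    """Rebuild one node of the asDict() output in the question's layout.

    asDict() turns an empty Group into [] rather than {}, so a childless node
    has 'children': [] (and a node without properties would have 'properties': []).
    """
    properties = node.get('properties') or {}
    children = node.get('children') or {}
    return {
        'properties': dict(properties),
        'children': {name: to_desired_layout(child) for name, child in children.items()},
    }


answer_output = {
    'children': {'node1': {'children': {'node11': {'children': [],
                                                   'properties': {'property111': ['string111'],
                                                                  'property112': ['string112']}}},
                           'properties': {'property11': ['string11'],
                                          'property12': ['string12']}},
                 'node2': {'children': [],
                           'properties': {'property21': ['string21'],
                                          'property22': ['string22']}}},
    'properties': {'property1': ['string1'], 'property2': ['string2']}}

# The grammar suppresses "/", so the root key is added back by hand here.
structured = {'/': to_desired_layout(answer_output)}
pprint.pprint(structured)
```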

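You can also use the dump() method of pyparsing's ParseResults class to visualize both the list-style and the dict/namespace-style access to the results, without any conversion calls: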

print(result[0].dump())
[[['property1', ['string1']], ['property2', ['string2']]], [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]]
- children: [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]
  - node1: [[['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]]
    - children: [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]
      - node11: [[['property111', ['string111']], ['property112', ['string112']]], []]
        - children: []
        - properties: [['property111', ['string111']], ['property112', ['string112']]]
          - property111: ['string111']
          - property112: ['string112']
    - properties: [['property11', ['string11']], ['property12', ['string12']]]
      - property11: ['string11']
      - property12: ['string12']
  - node2: [[['property21', ['string21']], ['property22', ['string22']]], []]
    - children: []
    - properties: [['property21', ['string21']], ['property22', ['string22']]]
      - property21: ['string21']
      - property22: ['string22']
- properties: [['property1', ['string1']], ['property2', ['string2']]]
  - property1: ['string1']
  - property2: ['string2']
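The names listed by dump() can also be followed directly as attributes or dictionary keys. A sketch assuming the answer's grammar (property renamed to prop, dblQuotedString copied before use); note that a missing name returns an empty string under attribute access rather than raising:

```python
import pyparsing

LBRACE, RBRACE, SLASH, SEMI, EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max=31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max=31)
propertyValue = pyparsing.dblQuotedString.copy().setParseAction(pyparsing.removeQuotes)
prop = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                      + pyparsing.Group(propertyValue) + SEMI))
childNode = pyparsing.Forward()
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(prop))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                                             + pyparsing.Group(pyparsing.ZeroOrMore(prop))("properties")
                                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                                             + RBRACE + SEMI))

SOURCE = """
/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 { property111 = "string111"; property112 = "string112"; };
    };
    node2 { property21 = "string21"; property22 = "string22"; };
};
"""

root = rootNode.parseString(SOURCE, parseAll=True)[0]

# Attribute ("namespace") access follows the results names and the Dict keys.
print(root.children.node1.properties.property11[0])
print(root.children.node1.children.node11.properties.property112[0])
# Item access reaches the same data.
print(root['children']['node2']['properties']['property22'][0])
# A missing name gives '' with attribute access instead of raising.
print(repr(root.children.node3))
```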
