How to parse XML names and attributes with Python (complex)



Hi, I'm trying to use Python to parse some financial information from an XML file into JSON format.

Currently using:

import xml.etree.ElementTree as ET

tree = ET.parse(filename)
root = tree.getroot()

# all items
print('\nAll item data:')
for elem in root:
    # strip the '{namespace}' prefix from each tag name
    all_descendants = [e.tag.split('}', 1)[1] for e in elem.iter()]
    print(all_descendants)

An example of the XML:

<pt:ShareholderFunds decimals="0" unitRef="GBP" contextRef="e2020-01-31">4</pt:ShareholderFunds>
<pt:ShareholderFunds decimals="0" unitRef="GBP" contextRef="e2019-01-31">4</pt:ShareholderFunds>
<pt:ApprovalDetails>
<pt:NameApprovingDirector contextRef="y2020-01-31">Mr FAKE FAKE</pt:NameApprovingDirector>
</pt:ApprovalDetails>
<pt:DetailsOrdinarySharesAllotted>
<pt:TypeOrdinaryShare contextRef="y2020-01-31">Ordinary</pt:TypeOrdinaryShare>
<pt:ParValueOrdinaryShare decimals="0" contextRef="y2020-01-31" unitRef="GBP">1</pt:ParValueOrdinaryShare>
<pt:ValueOrdinarySharesAllotted decimals="0" contextRef="e2020-01-31" unitRef="GBP">4</pt:ValueOrdinarySharesAllotted>
<pt:ValueOrdinarySharesAllotted decimals="0" contextRef="e2019-01-31" unitRef="GBP">4</pt:ValueOrdinarySharesAllotted>
<pt:NumberOrdinarySharesAllotted decimals="0" contextRef="e2020-01-31" unitRef="shares">4</pt:NumberOrdinarySharesAllotted>
<pt:NumberOrdinarySharesAllotted decimals="0" contextRef="e2019-01-31" unitRef="shares">4</pt:NumberOrdinarySharesAllotted>
</pt:DetailsOrdinarySharesAllotted>
<pt:EquityAuthorisedDetails>
<pt:TypeOrdinaryShare contextRef="y2020-01-31">Ordinary</pt:TypeOrdinaryShare>
<pt:NumberOrdinarySharesAuthorised decimals="0" unitRef="shares" contextRef="e2020-01-31">0</pt:NumberOrdinarySharesAuthorised>
<pt:ParValueOrdinaryShare decimals="0" contextRef="y2020-01-31" unitRef="GBP">1</pt:ParValueOrdinaryShare>
</pt:EquityAuthorisedDetails>

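Getting the names works fine: ['ShareholderFunds'], ['ApprovalDetails', 'NameApprovingDirector']

However, I also need it to get the values. Does anyone know how to do that?

Ideally, the output would look like this: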

{
    {
        "name": "ShareholderFunds",
        "value": 4,
        "unitRef": "GBP",
        "contextRef": "e2020-01-31",
    },
    {
        "name": "ApprovalDetails"
        {
            "name": "NameApprovingDirector",
            "value": "Mr FAKE FAKE",
            "contextRef": "y2020-01-31",
        }
    },
    {
        "name": "DetailsOrdinarySharesAllotted"
        {
            "name": "TypeOrdinaryShare",
            "contextRef": "y2020-01-31",
            "value": "Ordinary"
        },
        {
            "name": "ParValueOrdinaryShare",
            "contextRef": "e2020-01-31",
            "unitRef": "GBP",
            "value": "4"
        }
    } etc...
}

I'm sure someone can point me in the right direction (I typed the JSON by hand, so any errors in it are just typos on my part).

Thanks in advance.

https://docs.python.org/2/library/xml.etree.elementtree.html

import xml.etree.ElementTree as ET

tree = ET.parse('country_data.xml')
root = tree.getroot()

# print each top-level element's tag and its attribute dict
for child in root:
    print(child.tag, child.attrib)
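
To also get the values from the question's XML, each element's text is available as elem.text and its attributes as elem.attrib. Below is a rough sketch of one way to combine them into nested JSON, not a definitive solution. The helper names local_name and element_to_dict, the "children" key, and the namespace URI in the sample are all made up for illustration; a real filing declares its own namespace for the pt: prefix.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: the namespace URI is invented so that the "pt:"
# prefix from the question resolves; real files declare their own.
SAMPLE = """<root xmlns:pt="http://example.com/pt">
  <pt:ShareholderFunds decimals="0" unitRef="GBP" contextRef="e2020-01-31">4</pt:ShareholderFunds>
  <pt:ApprovalDetails>
    <pt:NameApprovingDirector contextRef="y2020-01-31">Mr FAKE FAKE</pt:NameApprovingDirector>
  </pt:ApprovalDetails>
</root>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit('}', 1)[-1]


def element_to_dict(elem):
    """Recursively turn an element into a dict (hypothetical helper)."""
    node = {"name": local_name(elem.tag)}
    node.update(elem.attrib)          # e.g. decimals, unitRef, contextRef
    text = (elem.text or "").strip()
    if text:
        node["value"] = text          # kept as a string; convert if needed
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children   # "children" is an assumed key name
    return node


root = ET.fromstring(SAMPLE)
items = [element_to_dict(child) for child in root]
print(json.dumps(items, indent=2))
```

With a file on disk, ET.parse(filename).getroot() would replace ET.fromstring(SAMPLE). Values come back as strings, so something like int(node["value"]) is needed where a number is wanted.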
