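I'm parsing an XML file with Python, but I've run into a problem. I get the values as a dictionary, but when two or more values are the same, they are not repeated. I'm sure there is a way to solve this, but I'm new to Python and to parsing XML...
Here is an example of the XML: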
<Root>
  <Child1>
  </Child1>
  <Child2>
    <Data DId = "1">
      <Group ID = "">
        <Sport Name="Cricket" Team="6" />
        <Sport Name="Football" Team="6" />
        <Sport Name="Hockey" Team="5" />
      </Group>
    </Data>
    <Data DId = "2">
      <Group ID = "">
        <Sport Name="Rugby" Team="6" />
        <Sport Name="Baseball" Team="10" />
        <Sport Name="Swimming" Team="6" />
      </Group>
    </Data>
  </Child2>
</Root>
I want to get the attribute values of the Sport tags, grouped by Data. The code I tried is:
import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()
dict1 = {}
for i in root.iter('Sport'):
    dict1[i.attrib['Name']] = [j.text for j in i]
    dict1[i.attrib['Team']] = [k.text for k in i]
print(dict1)
But I can't get the Team value for each sport.
Try this library.
from simplified_scrapy import SimplifiedDoc, utils
xml = '''
<Root>
  <Child1>
  </Child1>
  <Child2>
    <Data DId = "1">
      <Group ID = "">
        <Sport Name="Cricket" Team="6" />
        <Sport Name="Football" Team="6" />
        <Sport Name="Hockey" Team="5" />
      </Group>
    </Data>
    <Data DId = "2">
      <Group ID = "">
        <Sport Name="Rugby" Team="6" />
        <Sport Name="Baseball" Team="10" />
        <Sport Name="Swimming" Team="6" />
      </Group>
    </Data>
  </Child2>
</Root>
'''
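# xml = utils.getFileContent('test.xml')  # alternatively, read the XML from a file
dict1 = {}
doc = SimplifiedDoc(xml)
datas = doc.selects('Data')
for i in datas:
    # Build one inner dict per <Data> element: sport name -> team
    dic = {}
    for j in i.selects('Sport'):
        dic[j['Name']] = j['Team']
    # Key the outer dict by the DId attribute so the groups stay separate
    dict1[i['DId']] = dic
print(dict1)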
Result:
{'1': {'Cricket': '6', 'Football': '6', 'Hockey': '5'}, '2': {'Rugby': '6', 'Baseball': '10', 'Swimming': '6'}}
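The same nested result can probably be built with the standard library alone, which may be useful if installing an extra package is not an option. In the code from the question, each Sport element has no child elements, so the list comprehensions produce empty lists, and using Team as a dictionary key means repeated values such as "6" overwrite each other. Below is a minimal sketch that reads the attributes directly and groups them by Data. The variable names xml_text and result are only illustrative, and the XML is inlined here so the example is self-contained; with a file you would use ET.parse('test.xml').getroot() instead.

```python
import xml.etree.ElementTree as ET

# Sample XML from the question, inlined so the sketch runs on its own.
xml_text = """<Root>
  <Child1>
  </Child1>
  <Child2>
    <Data DId = "1">
      <Group ID = "">
        <Sport Name="Cricket" Team="6" />
        <Sport Name="Football" Team="6" />
        <Sport Name="Hockey" Team="5" />
      </Group>
    </Data>
    <Data DId = "2">
      <Group ID = "">
        <Sport Name="Rugby" Team="6" />
        <Sport Name="Baseball" Team="10" />
        <Sport Name="Swimming" Team="6" />
      </Group>
    </Data>
  </Child2>
</Root>"""

root = ET.fromstring(xml_text)

result = {}
for data in root.iter('Data'):
    # One inner dict per <Data>; values come from attributes, not element text.
    result[data.get('DId')] = {
        sport.get('Name'): sport.get('Team')
        for sport in data.iter('Sport')
    }

print(result)
```

Because the sport name is the inner key and the team is the value, two sports with the same team count no longer collide.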