Parsing multiple XML files into a single list of dictionaries with Python



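I have a case where I parse multiple XML files, and I want the result to be a single list of dictionaries rather than a separate list of dictionaries for each file.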

import glob
from bs4 import BeautifulSoup

def open_xml(filenames):
    for filename in filenames:
        with open(filename) as fp:
            soup = BeautifulSoup(fp, 'html.parser')
            parse_xml_files(soup)

def parse_xml_files(soup):
    stringToListOfDict = []
    .
    .
    .
    for info in infos:
        dict = {}

        types = info.find_all('type')
        values = info.find_all('value')

        for type in types:
            dict[type.attrs['p']] = type.text

        stringToListOfDict.append({'Date': Date, 'Time': Time, 'NodeName': node})
        for value in values:
            for result in value.find_all('x'):
                label = dict[result.attrs['y']]
                value = result.text
                if label:
                    stringToListOfDict[-1][label] = value
    print(stringToListOfDict)

def main():
    open_xml(filenames = glob.glob("*.xml"))

if __name__ == '__main__':
    main()

With the code above, I always get one list of dictionaries per file, as shown below (for example, with two XML files):

[{'Date': '2020-11-19', 'Time': '18:15', 'NodeName': 'LinuxSuSe','Speed': '16'}]
[{'Date': '2020-11-19', 'Time': '18:30', 'NodeName': 'LinuxRedhat','Speed': '16'}]

The desired output is a single list containing both dictionaries:


[{'Date': '2020-11-19', 'Time': '18:15', 'NodeName': 'LinuxSuSe','Speed': '16'},{'Date': '2020-11-19', 'Time': '18:30', 'NodeName':'LinuxRedhat','Speed': '16'}]

Thank you very much for your feedback.

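print() only writes text to the screen; it does not combine the results into one list.

The name parse_xml_files is misleading, because the function parses a single file, not all of them. It should use return to hand back the result for one file. In open_xml you should then add that result to a list, so that the results from all files end up in one list.

Untested: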

def open_xml(filenames):
    all_files = []
    for filename in filenames:
        with open(filename) as fp:
            soup = BeautifulSoup(fp, 'html.parser')
            result = parse_xml_file(soup)  # <-- get result from parse_xml_file
            all_files += result  # <-- append result to list
    print(all_files)  # <-- display all results

def parse_xml_file(soup):
    stringToListOfDict = []
    # ... code ...
    for info in infos:
        dict = {}

        types = info.find_all('type')
        values = info.find_all('value')

        for type in types:
            dict[type.attrs['p']] = type.text

        stringToListOfDict.append({'Date': Date, 'Time': Time, 'NodeName': node})
        for value in values:
            for result in value.find_all('x'):
                label = dict[result.attrs['y']]
                value = result.text
                if label:
                    stringToListOfDict[-1][label] = value
    #print(stringToListOfDict)
    return stringToListOfDict  # <-- send to open_xml
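
The same return-and-accumulate pattern can be shown in a small self-contained sketch. This is only an illustration under stated assumptions: it uses the standard-library xml.etree.ElementTree instead of BeautifulSoup so that it runs without extra packages, and the XML layout (an <info> element with a node attribute and a <speed> child), the function names parse_xml_text and parse_all, and the SAMPLES strings are all made up for the example. They are not the real files from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents standing in for the contents of two XML files.
SAMPLES = [
    '<root><info node="LinuxSuSe"><speed>16</speed></info></root>',
    '<root><info node="LinuxRedhat"><speed>16</speed></info></root>',
]


def parse_xml_text(xml_text):
    """Parse ONE document and return its rows as a list of dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for info in root.iter('info'):
        rows.append({
            'NodeName': info.get('node'),
            'Speed': info.findtext('speed'),
        })
    return rows  # return the result instead of printing it


def parse_all(xml_texts):
    """Parse every document and merge the per-document lists into one list."""
    combined = []
    for xml_text in xml_texts:
        # extend() adds the dicts themselves, so the result stays flat
        combined.extend(parse_xml_text(xml_text))
    return combined


if __name__ == '__main__':
    print(parse_all(SAMPLES))
```

Note that extend() (or +=, as in the answer above) keeps the result flat. Using combined.append(...) with the per-file list would instead produce a list of lists, which is close to the original problem.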
