Trouble parsing multiple XML files and processing them as a DataFrame in Python

I want to parse multiple XML files into a DataFrame. All of the files share the same XPath.

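I am using ElementTree and Python's os library. The code parses all of the files, but it prints an empty DataFrame. When the code handles only a single file, it works fine.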

    from os import listdir, path
    import xml.etree.ElementTree as et

    import numpy as np
    import pandas as pd

    mypath = r'C:UserstestFile'
    files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]
    for file in files:
        xtree = et.parse(file)
        xroot = xtree.getroot()
        df_cols = ['value']
        out_xml = pd.DataFrame(columns=df_cols)
        for node in xroot.findall(r'./Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field'):
            name = node.attrib.get('Name')
            value = node.find('Value').text
            out_xml = out_xml.append(pd.Series([value], index=df_cols), ignore_index=True)
        df = pd.DataFrame(np.reshape(out_xml.values, (-1, 4)))

If you need a single DataFrame that contains all of the data, you have to concatenate each file's DataFrame into one main DataFrame:

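    from os import listdir, path
    import xml.etree.ElementTree as et

    import numpy as np
    import pandas as pd

    mypath = r'C:testFile'
    files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]
    mainDF = pd.DataFrame()  # accumulates the rows from every file
    for file in files:
        xtree = et.parse(file)
        xroot = xtree.getroot()
        df_cols = ['value']
        out_xml = pd.DataFrame(columns=df_cols)
        for node in xroot.findall(r'./Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field'):
            name = node.attrib.get('Name')
            value = node.find('Value').text
            # Note: DataFrame.append was removed in pandas 2.0; use pd.concat there.
            out_xml = out_xml.append(pd.Series([value], index=df_cols), ignore_index=True)
        # Every 4 consecutive values become one row.
        df = pd.DataFrame(np.reshape(out_xml.values, (-1, 4)))
        # Add this file's rows to the main DataFrame.
        mainDF = pd.concat([mainDF, df])
    mainDF.to_csv("filename.csv")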
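
For newer pandas versions, where `DataFrame.append` is no longer available, the same idea can be sketched by collecting one DataFrame per file in a list and calling `pd.concat` once at the end. This is only a minimal sketch: the helper name `xml_files_to_frame` and its `xpath` and `n_cols` parameters are hypothetical, and it assumes each matched `Field` node has a `Value` child as in the code above. Unlike `np.reshape`, it pads an incomplete last row with missing values instead of raising an error.

```python
from os import listdir, path
import xml.etree.ElementTree as et

import pandas as pd


def xml_files_to_frame(folder, xpath, n_cols=4):
    """Parse every .xml file in `folder` into one combined DataFrame.

    Hypothetical helper: `xpath` selects the Field nodes, and the text of
    their Value children is grouped into rows of `n_cols` values.
    """
    frames = []
    for fname in sorted(listdir(folder)):
        if not fname.endswith('.xml'):
            continue
        xroot = et.parse(path.join(folder, fname)).getroot()
        # findtext returns None instead of raising when <Value> is missing.
        values = [node.findtext('Value') for node in xroot.findall(xpath)]
        if not values:
            continue  # the XPath matched nothing in this file
        # Split the flat list into rows of n_cols values each.
        rows = [values[i:i + n_cols] for i in range(0, len(values), n_cols)]
        frames.append(pd.DataFrame(rows))
    if not frames:
        return pd.DataFrame()
    # One concat at the end avoids copying the growing frame on every file.
    return pd.concat(frames, ignore_index=True)
```

With the names from the answer, this would be called as `xml_files_to_frame(mypath, r'./Group[1]/Details/Section[3]/Subreport/Group/Group[1]/Details/Section/Field').to_csv("filename.csv")`. If the result is empty, the XPath probably did not match any node in the files.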
