How to preserve leading zeros when converting XML to CSV with Python / ElementTree / pandas



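I am trying to convert account information from a large XML file to CSV using Python. This mostly works, but the Python script strips the leading zeros from account numbers and right-aligns the truncated numbers. For example, account number 007 is trimmed to 7. Account numbers can be numeric, text, or alphanumeric.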

Current script:

import pandas as pd
import xml.etree.ElementTree as ET
# Parse the XML file and find the root
# Raw strings keep "\t" in the path from being read as a tab character
xml_file = r"C:\Python Scripts\test.xml"
csv_file = r"C:\Python Scripts\test.csv"
xml_tree = ET.parse(xml_file)
root = xml_tree.getroot()
# Convert parsed xml file to a csv
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv(csv_file)

Here is a sample XML file, test.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<TSAutoUpload xsi:noNamespaceSchemaLocation="tsautoup.xsd"
xmlns:xsi="http:www.w3.org/2011/XMLSchema-instance">
<Firm>
<AcctNr>TEST</AcctNr>
<LongName>TEST ACCOUNT</LongName>
</Firm>
<Firm>
<AcctNr>007</AcctNr>
<LongName>JAMES BOND INC</LongName>
</Firm>
</TSAutoUpload>

Here is the test.csv output. Note that the leading zeros are stripped. The 7 is also right-aligned; ideally it would be 007, left-aligned:

   AcctNr  LongName
0  TEST    TEST ACCOUNT
1  7       JAMES BOND INC

Without any external libraries:

import xml.etree.ElementTree as ET
import csv

xml = '''<TSAutoUpload xsi:noNamespaceSchemaLocation="tsautoup.xsd"
xmlns:xsi="http:www.w3.org/2011/XMLSchema-instance">
<Firm>
<AcctNr>0087</AcctNr>
<LongName>TEST ACCOUNT</LongName>
</Firm>
<Firm>
<AcctNr>007</AcctNr>
<LongName>JAMES BOND INC</LongName>
</Firm>
</TSAutoUpload>'''
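root = ET.fromstring(xml)
# Collect every <Firm> element in the document
firms = root.findall('.//Firm')
# One dict per firm; element text is always a str, so leading zeros are kept
data = [{c.tag: c.text for c in f} for f in firms]
# newline='' stops the csv module from writing blank lines on Windows
with open('out.csv', 'w', newline='') as f:
    csv_writer = csv.DictWriter(f, list(data[0].keys()))
    csv_writer.writeheader()
    csv_writer.writerows(data)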

out.csv

AcctNr,LongName
0087,TEST ACCOUNT
007,JAMES BOND INC
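
To check that the zeros survive a round trip, the rows can be read back with csv.DictReader, which returns every field as a string and does no type conversion. The following is only a minimal sketch: it writes to an in-memory buffer instead of out.csv, drops the xsi attributes from the sample XML for brevity, and the helper name firms_to_csv is made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = """<TSAutoUpload>
<Firm><AcctNr>0087</AcctNr><LongName>TEST ACCOUNT</LongName></Firm>
<Firm><AcctNr>007</AcctNr><LongName>JAMES BOND INC</LongName></Firm>
</TSAutoUpload>"""


def firms_to_csv(xml_text):
    """Hypothetical helper: return CSV text with one row per <Firm>."""
    root = ET.fromstring(xml_text)
    # Element text is always a str, so "007" is never turned into a number here.
    rows = [{child.tag: child.text for child in firm} for firm in root.iter("Firm")]
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = firms_to_csv(XML)
# Read the CSV text back; DictReader yields plain strings for every field.
read_back = list(csv.DictReader(io.StringIO(csv_text, newline="")))
acct_numbers = [row["AcctNr"] for row in read_back]
print(acct_numbers)  # ['0087', '007']
```

If the zeros still disappear after this point, the likely cause is whatever opens the file afterwards (for example a spreadsheet, or a reader that infers numeric types) rather than the writing step.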

Since <Firm> is a child of the root node <TSAutoUpload>, you can iterate over the root and write the text of each child of <Firm> as a separate column.

Code:

import xml.etree.ElementTree as ET
import csv
# Parse the XML file and find the root
xml_file = r"C:\Python Scripts\test.xml"
csv_file = r"C:\Python Scripts\test.csv"
xml_tree = ET.parse(xml_file)
root = xml_tree.getroot()
with open(csv_file, "w", newline="") as f:
    writer = csv.writer(f)
    for firm in root:
        writer.writerow(node.text for node in firm)
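
The script above writes only the values, with no header row. If a header is wanted, one option is to take the column names from the tags of the first <Firm>. This is a minimal sketch, assuming every <Firm> has the same children in the same order. It uses inline XML (without the xsi attributes) and an in-memory buffer so that it runs without any files, and write_firms is a hypothetical name.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = """<TSAutoUpload>
<Firm><AcctNr>TEST</AcctNr><LongName>TEST ACCOUNT</LongName></Firm>
<Firm><AcctNr>007</AcctNr><LongName>JAMES BOND INC</LongName></Firm>
</TSAutoUpload>"""


def write_firms(root, out):
    """Hypothetical helper: write a header plus one row per child of root."""
    writer = csv.writer(out)
    firms = list(root)
    if firms:
        # Assumption: the first <Firm> has the same tags as all the others.
        writer.writerow(node.tag for node in firms[0])
    for firm in firms:
        # node.text is a str, so leading zeros are written unchanged.
        writer.writerow(node.text for node in firm)


root = ET.fromstring(XML)
buffer = io.StringIO(newline="")
write_firms(root, buffer)
lines = buffer.getvalue().splitlines()
print(lines)  # ['AcctNr,LongName', 'TEST,TEST ACCOUNT', '007,JAMES BOND INC']
```

To write to disk instead, pass a file opened with open(csv_file, "w", newline="") as the second argument.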
