Updating lxml attributes with values stored in a CSV file



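I am able to iterate over the assets.csv file and create an XML snippet for each row, but while iterating I am trying to populate each sigEquipment ID attribute with the ID value from that row.

Here is a snapshot of assets.csv: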

ID,CODE,EL,TR,DIR,MIL,X,Y,Z,DESC
30734,X1,CC1,8100,,008+0249 (9-1497),518169.12,185128.27,37.52,
31597,X10,BB1,9100,,008+0286 (9-1460),518151.38,185157.1,36.7,XXX

The code so far is:

import pandas as pd
from lxml import etree as et
df = pd.read_csv('assets.csv', sep=',')
root = et.Element('SchemeData', xmlns='boo')
for row in df:
    equipment = et.SubElement(root, 'Equipment')
    sigEquipment = et.SubElement(equipment, 'SigEquipment', ID='', name='')
    sigEquipment.set('ID', str(df['ID'].iloc[0]))

print(et.tostring(root, pretty_print=True).decode('utf-8'))

I am not sure how to write the sigEquipment.set('ID', str(df['ID'].iloc[0])) part so that each row is populated with the correct ID.

Currently I get:

<SchemeData xmlns="boo">
  <Equipment>
    <SigEquipment fileUID="30734" name=""/>
  </Equipment>
  <Equipment>
    <SigEquipment fileUID="30734" name=""/>
  </Equipment>
</SchemeData>

Thanks for any help.

There are a few issues with your code, so let me go through it step by step.

>>> import pandas
>>> df = pandas.read_csv("assets.csv")

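If you look at the definition of read_csv(), you will see that it returns a DataFrame. Iterating directly over a DataFrame (for row in df) yields the column labels, not the rows, so you have to say explicitly what the iteration should produce. In this case iterrows() is useful: it yields two-tuples of the row index and the row data: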

>>> for index, row in df.iterrows():
...     print(index, row["ID"])
... 
0 30734
1 31597

As you can see, the columns can be indexed by name (as defined by the first line of the CSV file). Now let's put this together:

>>> import lxml.etree
>>> root = lxml.etree.Element("SchemeData", xmlns="Boo")
>>> for index, row in df.iterrows():
...     equipment = lxml.etree.SubElement(root, "Equipment")
...     sigEquipment = lxml.etree.SubElement(equipment, "SigEquipment")
...     sigEquipment.attrib["fileUID"] = str(row["ID"])
...     sigEquipment.attrib["name"] = ""

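This loops over the rows of the DataFrame instance, picks the "ID" column of each row, and stores that ID as the "fileUID" attribute of the corresponding SigEquipment node in the XML tree. In lxml, node attributes are handled like a dictionary.

You can now print the tree: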
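
As a small illustration of that dictionary-like behaviour, the sketch below sets one attribute through attrib and another through set(), then reads them back. The element name SigEquipment and the ID 30734 come from the question; everything else is just for the demo.

```python
from lxml import etree

node = etree.Element("SigEquipment")

# attrib behaves like a dict: item assignment creates or overwrites an attribute
node.attrib["fileUID"] = str(30734)
# set() is the equivalent method call on the element itself
node.set("name", "")

# Attribute values have to be strings, hence the str() conversion above
file_uid = node.get("fileUID")
attributes = dict(node.attrib)
serialized = etree.tostring(node).decode()
print(serialized)
```

Either spelling should work here; attrib is convenient when you already think of the attributes as a mapping, while set() and get() read more naturally for one-off access.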

>>> print(lxml.etree.tostring(root, pretty_print=True).decode())
<SchemeData xmlns="Boo">
  <Equipment>
    <SigEquipment fileUID="30734" name=""/>
  </Equipment>
  <Equipment>
    <SigEquipment fileUID="31597" name=""/>
  </Equipment>
</SchemeData>
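
If you want to try this without a file on disk, here is a self-contained sketch of the same steps. It makes a few assumptions that are not in the original: the two CSV rows from the question are inlined through io.StringIO, the helper name build_scheme is hypothetical, and the xmlns attribute is left out to keep the example minimal.

```python
import io

import pandas as pd
from lxml import etree

# Two rows copied from the question's assets.csv, inlined so the sketch is self-contained
CSV_TEXT = """ID,CODE,EL,TR,DIR,MIL,X,Y,Z,DESC
30734,X1,CC1,8100,,008+0249 (9-1497),518169.12,185128.27,37.52,
31597,X10,BB1,9100,,008+0286 (9-1460),518151.38,185157.1,36.7,XXX
"""


def build_scheme(df):
    """Hypothetical helper: one Equipment/SigEquipment pair per DataFrame row."""
    root = etree.Element("SchemeData")
    for _, row in df.iterrows():
        equipment = etree.SubElement(root, "Equipment")
        # Attribute values must be strings, so the numeric ID is converted
        etree.SubElement(equipment, "SigEquipment", fileUID=str(row["ID"]), name="")
    return root


df = pd.read_csv(io.StringIO(CSV_TEXT))
root = build_scheme(df)

# Collect the attribute back out of the tree to check each row got its own ID
file_uids = [node.get("fileUID") for node in root.iter("SigEquipment")]
print(file_uids)
print(etree.tostring(root, pretty_print=True).decode())
```

One thing to keep in mind: iterrows() builds a Series per row, so in a DataFrame with only numeric columns an integer ID may be upcast to float. If that matters for your data, iterating over df["ID"] directly is a simple alternative.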
