I have data in an XML file, and I'm reading three columns from it: price, name, calories.
XML data:
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<price>$5.95</price>
<name>Belgian Waffles</name>
<desc>Two of our famous Belgian Waffles with plenty of real maple syrup</desc>
<calories>650</calories>
</row>
<row>
<index>1</index>
<price>$7.95</price>
<name>Strawberry Belgian Waffles</name>
<desc>Light Belgian waffles covered with strawberries and whipped cream</desc>
<calories>900</calories>
</row>
<row>
<index>2</index>
<price>$8.95</price>
<name>Berry-Berry Belgian Waffles</name>
<desc>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</desc>
<calories>900</calories>
</row>
<row>
<index>3</index>
<price>$4.50</price>
<name>French Toast</name>
<desc>Thick slices made from our homemade sourdough bread</desc>
<calories>600</calories>
</row>
<row>
<index>4</index>
<price>$6.95</price>
<name>Homestyle Breakfast</name>
<desc>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</desc>
<calories>950</calories>
</row>
</data>
Code:
import xml.etree.ElementTree as ET
parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()
for data in get_root_element.findall('row'):
    prc = data.find('price')
    nm = data.find('name')
    cal = data.find('calories')
    temp = prc.text + ',' + nm.text + ',' + cal.text
    print(temp)
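The code above gives me the data, but I need to store it in a CSV file.
How should I write the logic for this? Using pandas or csv is fine.
I also need to add my headers to the CSV file.
Headers: price, name, calories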
@kiric8494's solution is already good enough, and you can stick with it. You can also do it more concisely with csv.DictWriter:
import xml.etree.ElementTree as ET
from csv import DictWriter
parse_xml = ET.parse(r"/content/sample_data/xyz.xml")
root = parse_xml.getroot()
with open(r"/content/sample_data/abc.csv", "w", newline="") as f:
    writer = DictWriter(f, fieldnames=("price", "name", "calories"), extrasaction="ignore")
    writer.writeheader()
    writer.writerows({e.tag: e.text for e in row} for row in root)
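Basically, we set up the DictWriter to ignore every field except price, name and calories, then pass a generator to .writerows() that builds, for each <row>, a dictionary of all its child nodes, where the keys are the tags and the values are the text.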
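To see what that generator actually produces, here is a small check that runs the same dict comprehension on two rows of the sample data held in memory (io.StringIO instead of the real file, and the XML declaration omitted for brevity). It also shows that without extrasaction="ignore" (the default is "raise"), DictWriter rejects rows that carry the extra index and desc keys:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two <row> elements copied from the sample data (XML declaration omitted)
XML = """<data>
<row><index>0</index><price>$5.95</price><name>Belgian Waffles</name>
<desc>Two of our famous Belgian Waffles with plenty of real maple syrup</desc>
<calories>650</calories></row>
<row><index>3</index><price>$4.50</price><name>French Toast</name>
<desc>Thick slices made from our homemade sourdough bread</desc>
<calories>600</calories></row>
</data>"""

root = ET.fromstring(XML)

# Same comprehension as in the answer: tag -> text for every child of a <row>
rows = [{e.tag: e.text for e in row} for row in root]
print(list(rows[0]))  # ['index', 'price', 'name', 'desc', 'calories']

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=("price", "name", "calories"),
                        extrasaction="ignore")
writer.writeheader()
writer.writerows(rows)  # the "index" and "desc" keys are silently dropped
print(buf.getvalue())

# With the default extrasaction="raise", the extra keys cause a ValueError
try:
    csv.DictWriter(io.StringIO(), fieldnames=("price", "name", "calories")).writerow(rows[0])
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)
```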
Thanks for your solution, @ewz93.
I did it the following way:
import xml.etree.ElementTree as ET
import csv
parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()
final_data_set = []  # one [price, name, calories] list per <row>
for data in get_root_element.findall('row'):
    prc = data.find('price')
    nm = data.find('name')
    cal = data.find('calories')
    # keep the values as separate list items; csv.writer takes care of quoting
    final_data_set.append([prc.text, nm.text, cal.text])
headers = ['price', 'name', 'calories']
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('/content/sample_data/abc.csv', 'w', newline='') as wr:
    csv_wr = csv.writer(wr)
    csv_wr.writerow(headers)
    csv_wr.writerows(final_data_set)
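Passing each row to csv.writer as a list, rather than joining the values with ',' and splitting them again, matters once a value contains a comma. The three columns chosen here happen to contain none, but the desc values do, so adding that column to a join-then-split approach would produce the wrong number of fields. A short comparison on one desc value from the sample data:

```python
import csv
import io

# A row whose <desc> value (from the sample data) contains commas
fields = ["$6.95", "Homestyle Breakfast",
          "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"]

# Joining with ',' and splitting again turns 3 fields into 6
round_trip = ",".join(fields).split(",")
print(len(round_trip))  # 6

# Handing the list to csv.writer keeps 3 fields; the field with commas gets quoted
buf = io.StringIO()
csv.writer(buf).writerow(fields)
print(buf.getvalue())

# Reading the line back recovers the original 3 values
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
print(parsed == fields)  # True
```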
I'd just put the values in lists and create a DataFrame from them:
import xml.etree.ElementTree as ET
import pandas as pd
parse_xml = ET.parse('/content/sample_data/xyz.xml')
get_root_element = parse_xml.getroot()
prc_list = []
nm_list = []
cal_list = []
for data in get_root_element.findall('row'):
    # .text gives the element's content; without it the lists would hold Element objects
    prc_list.append(data.find('price').text)
    nm_list.append(data.find('name').text)
    cal_list.append(data.find('calories').text)
df = pd.DataFrame({"price": prc_list, "name": nm_list, "calories": cal_list})
df.to_excel("your_file_name.xlsx")  # or, if you really want a CSV: df.to_csv("your_file_name.csv", index=False)
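This probably isn't the prettiest solution. There is also pandas.read_xml(), so you could likely shorten this and avoid etree altogether by reading the XML directly into a DataFrame and then writing it straight to CSV.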
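A minimal sketch of that read_xml route, assuming pandas 1.3 or newer (when read_xml was added). The helper name xml_to_csv is hypothetical, and parser="etree" is chosen so that lxml does not need to be installed. The usage below feeds a trimmed in-memory sample; in the original setting you would pass '/content/sample_data/xyz.xml' and a CSV path such as '/content/sample_data/abc.csv' instead:

```python
import io

import pandas as pd


def xml_to_csv(xml_source, csv_target, columns=("price", "name", "calories")):
    """Hypothetical helper: load each <row> as a DataFrame row, keep `columns`, write CSV."""
    # parser="etree" uses the standard-library parser, so lxml is not required
    df = pd.read_xml(xml_source, parser="etree")
    # index=False leaves out the DataFrame's own 0..n-1 index column
    df[list(columns)].to_csv(csv_target, index=False)
    return df


# Trimmed in-memory sample (two rows, no <desc>, no XML declaration)
SAMPLE = """<data>
<row><index>0</index><price>$5.95</price><name>Belgian Waffles</name><calories>650</calories></row>
<row><index>3</index><price>$4.50</price><name>French Toast</name><calories>600</calories></row>
</data>"""

out = io.StringIO()
df = xml_to_csv(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

Because read_xml turns every child element of <row> into a column, selecting the three wanted columns before to_csv plays the same role as extrasaction="ignore" in the DictWriter answer.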