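I need help converting XML to a CSV file. I have partially succeeded, but I don't know how to add the time value and the phase id in my Python code.
I have the following XML, copied from an XML link: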
<?xml version="1.0" encoding="UTF-8"?>
<akouda>
<time value="2022-04-12 13:45:00">
<phases>
<phase id="0">
<act_energy>1.2000000000000455</act_energy>
<react_energy>1.9711529080673937</react_energy>
<current_inst>7.08</current_inst>
<voltage_inst>242.7</voltage_inst>
<power_inst>0.9</power_inst>
<power_fact>0.52</power_fact>
<thd>66.45</thd>
</phase>
<phase id="1">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>16.1</current_inst>
<voltage_inst>242</voltage_inst>
<power_inst>2.38</power_inst>
<power_fact>0.61</power_fact>
<thd>31</thd>
</phase>
<phase id="2">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>8.64</current_inst>
<voltage_inst>242.7</voltage_inst>
<power_inst>2.01</power_inst>
<power_fact>0.95</power_fact>
<thd>26.81</thd>
</phase>
</phases>
</time>
<time value="2022-04-12 13:30:00">
<phases>
<phase id="0">
<act_energy>1.2999999999999545</act_energy>
<react_energy>2.1354156504061876</react_energy>
<current_inst>7.06</current_inst>
<voltage_inst>242.2</voltage_inst>
<power_inst>0.9</power_inst>
<power_fact>0.52</power_fact>
<thd>65.89</thd>
</phase>
<phase id="1">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>16.95</current_inst>
<voltage_inst>241</voltage_inst>
<power_inst>2.61</power_inst>
<power_fact>0.63</power_fact>
<thd>29.1</thd>
</phase>
<phase id="2">
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>9.57</current_inst>
<voltage_inst>242.4</voltage_inst>
<power_inst>2.23</power_inst>
<power_fact>0.96</power_fact>
<thd>24.12</thd>
</phase>
</phases>
</time>
</akouda>
And the following code to convert the XML to CSV:
import xml.etree.ElementTree as Xet
import pandas as pd

rows = []

# Parsing the XML file
xmlparse = Xet.parse('sample.xml')
root = xmlparse.getroot()

# <phase> elements are nested under <time>/<phases>, so search at any depth
for i in root.iter('phase'):
    act_energy = i.find("act_energy").text
    react_energy = i.find("react_energy").text
    current_inst = i.find("current_inst").text
    voltage_inst = i.find("voltage_inst").text
    power_inst = i.find("power_inst").text
    power_fact = i.find("power_fact").text
    thd = i.find("thd").text
    rows.append({
        "act_energy": act_energy,
        "react_energy": react_energy,
        "current_inst": current_inst,
        "voltage_inst": voltage_inst,
        "power_inst": power_inst,
        "power_fact": power_fact,
        "thd": thd,
    })

df = pd.DataFrame(rows)

# Writing dataframe to csv
df.to_csv('output.csv')
- How can I include the time value and the phase id in the Python code?
- How can I load the XML from a link instead of from a file?
Thanks.
For example, you can use the pd.read_xml function with an appropriate XPath (.read_xml() also accepts a URL):
df = pd.read_xml("data.xml", xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))
Prints:
                 value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
1  2022-04-12 13:45:00  0.0         1.2      1.971153          7.08         242.7        0.90        0.52  66.45
2  2022-04-12 13:45:00  1.0         0.0      0.000000         16.10         242.0        2.38        0.61  31.00
3  2022-04-12 13:45:00  2.0         0.0      0.000000          8.64         242.7        2.01        0.95  26.81
5  2022-04-12 13:30:00  0.0         1.3      2.135416          7.06         242.2        0.90        0.52  65.89
6  2022-04-12 13:30:00  1.0         0.0      0.000000         16.95         241.0        2.61        0.63  29.10
7  2022-04-12 13:30:00  2.0         0.0      0.000000          9.57         242.4        2.23        0.96  24.12
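If you would rather stay with xml.etree.ElementTree as in the question, the time value and phase id are XML attributes, so they can be read with .get() while walking the <time> and <phase> elements. The following is only a minimal sketch using the standard library: the helper names xml_to_rows and rows_to_csv, the column names "time" and "phase_id", and the shortened SAMPLE_XML are assumptions made for illustration, not part of the original data or code.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same nesting as the XML in the question.
SAMPLE_XML = """<akouda>
  <time value="2022-04-12 13:45:00">
    <phases>
      <phase id="0"><act_energy>1.2</act_energy><thd>66.45</thd></phase>
      <phase id="1"><act_energy>0</act_energy><thd>31</thd></phase>
    </phases>
  </time>
  <time value="2022-04-12 13:30:00">
    <phases>
      <phase id="0"><act_energy>1.3</act_energy><thd>65.89</thd></phase>
    </phases>
  </time>
</akouda>"""


def xml_to_rows(xml_text):
    """Return one dict per <phase>, carrying its parent <time> value and its own id."""
    root = ET.fromstring(xml_text)
    rows = []
    for time_el in root.findall("time"):
        for phase_el in time_el.findall("phases/phase"):
            # Attributes are read with .get(); child elements with .tag / .text
            row = {"time": time_el.get("value"), "phase_id": phase_el.get("id")}
            for child in phase_el:
                row[child.tag] = child.text
            rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file instead of a string, open it with open("output.csv", "w", newline="") and pass that file object to csv.DictWriter in place of the StringIO buffer.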
EDIT: To read from the provided URL:
from html import unescape
from io import StringIO

import pandas as pd
import requests

url = "https://issat.ttn.tn/cu/export/akouda.php"

# quick-and-dirty method to remove the leading <pre> and trailing </pre>;
# ideally, you would do this with an HTML parser:
s = unescape(requests.get(url).text)[5:-6]

# wrap the text in StringIO: newer pandas versions deprecate passing literal XML strings
df = pd.read_xml(StringIO(s), xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))
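The comment in the code above suggests using an HTML parser rather than slicing off the <pre> tags by position. One way this might look with the standard library's html.parser is sketched below. It assumes, as the slicing code implies, that the page is HTML-escaped XML wrapped in a single <pre> element; the class name PreTextExtractor, the helper extract_pre_text, and the sample_page string are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre> ... </pre>."""

    def __init__(self):
        # convert_charrefs=True turns &lt; / &gt; back into < / > in handle_data
        super().__init__(convert_charrefs=True)
        self._in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_pre:
            self.chunks.append(data)


def extract_pre_text(html_text):
    """Return the unescaped text content of the <pre> element(s) in html_text."""
    parser = PreTextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical stand-in for the response body (assumed shape, not the real page)
sample_page = (
    "<pre>&lt;akouda&gt;&lt;time value=\"2022-04-12 13:45:00\"/&gt;"
    "&lt;/akouda&gt;</pre>"
)
xml_text = extract_pre_text(sample_page)
print(xml_text)
```

With a real response, extract_pre_text(requests.get(url).text) would take the place of the unescape(...)[5:-6] expression, and the result can be wrapped in io.StringIO before being handed to pd.read_xml.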