How to add all columns (XML to CSV)



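I need help converting XML to a CSV file. I have partly succeeded, but I don't know how to include the time value and the phase id in my Python code.

I have the following XML, copied from an XML link: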

<?xml version="1.0" encoding="UTF-8"?>
<akouda>   
<time value="2022-04-12 13:45:00">
<phases>
<phase id="0">          
<act_energy>1.2000000000000455</act_energy>
<react_energy>1.9711529080673937</react_energy>
<current_inst>7.08</current_inst>
<voltage_inst>242.7</voltage_inst>
<power_inst>0.9</power_inst>
<power_fact>0.52</power_fact>
<thd>66.45</thd>
</phase>
<phase id="1">          
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>16.1</current_inst>
<voltage_inst>242</voltage_inst>
<power_inst>2.38</power_inst>
<power_fact>0.61</power_fact>
<thd>31</thd>
</phase>
<phase id="2">          
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>8.64</current_inst>
<voltage_inst>242.7</voltage_inst>
<power_inst>2.01</power_inst>
<power_fact>0.95</power_fact>
<thd>26.81</thd>
</phase>
</phases>
</time>
<time value="2022-04-12 13:30:00">
<phases>
<phase id="0">          
<act_energy>1.2999999999999545</act_energy>
<react_energy>2.1354156504061876</react_energy>
<current_inst>7.06</current_inst>
<voltage_inst>242.2</voltage_inst>
<power_inst>0.9</power_inst>
<power_fact>0.52</power_fact>
<thd>65.89</thd>
</phase>
<phase id="1">          
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>16.95</current_inst>
<voltage_inst>241</voltage_inst>
<power_inst>2.61</power_inst>
<power_fact>0.63</power_fact>
<thd>29.1</thd>
</phase>
<phase id="2">          
<act_energy>0</act_energy>
<react_energy>0</react_energy>
<current_inst>9.57</current_inst>
<voltage_inst>242.4</voltage_inst>
<power_inst>2.23</power_inst>
<power_fact>0.96</power_fact>
<thd>24.12</thd>
</phase>
</phases>
</time>
</akouda>

And the following code to convert the XML to CSV:

import xml.etree.ElementTree as Xet
import pandas as pd

rows = []

# Parsing the XML file
xmlparse = Xet.parse('sample.xml')
root = xmlparse.getroot()
for i in root.findall('phases'):
    act_energy = i.find("act_energy").text
    react_energy = i.find("react_energy").text
    current_inst = i.find("current_inst").text
    voltage_inst = i.find("voltage_inst").text
    power_inst = i.find("power_inst").text
    power_fact = i.find("power_fact").text
    thd = i.find("thd").text

    rows.append({
        "act_energy": act_energy,
        "react_energy": react_energy,
        "current_inst": current_inst,
        "voltage_inst": voltage_inst,
        "power_inst": power_inst,
        "power_fact": power_fact,
        "thd": thd,
    })

df = pd.DataFrame(rows)

# Writing dataframe to csv
df.to_csv('output.csv')

  1. How do I include the time value and the phase id in the Python code?
  2. How do I read the XML from a link instead of from a file?

Thanks

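For example, you can use pd.read_xml with an appropriate XPath (read_xml() also accepts a URL). The XPath below selects every <time> element plus every child of <phases>; forward-filling the value column copies each timestamp onto the phase rows that follow it, and the two dropna calls then discard the empty columns and the now-redundant <time> rows: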

df = pd.read_xml("data.xml", xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))

Prints:

                 value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
1  2022-04-12 13:45:00  0.0         1.2      1.971153          7.08         242.7        0.90        0.52  66.45
2  2022-04-12 13:45:00  1.0         0.0      0.000000         16.10         242.0        2.38        0.61  31.00
3  2022-04-12 13:45:00  2.0         0.0      0.000000          8.64         242.7        2.01        0.95  26.81
5  2022-04-12 13:30:00  0.0         1.3      2.135416          7.06         242.2        0.90        0.52  65.89
6  2022-04-12 13:30:00  1.0         0.0      0.000000         16.95         241.0        2.61        0.63  29.10
7  2022-04-12 13:30:00  2.0         0.0      0.000000          9.57         242.4        2.23        0.96  24.12
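
If you would rather keep an ElementTree loop like the one in the question, a minimal sketch might look like the following. It walks each <time> element, then each <phase> beneath it, and copies the value and id attributes into every row. The sample XML is a shortened stand-in for the data in the question, and phases_to_rows, the column names time and phase_id, and the use of the standard-library csv module in place of pandas are all choices made for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened sample with the same layout as the XML in the question.
SAMPLE_XML = """<akouda>
<time value="2022-04-12 13:45:00">
<phases>
<phase id="0"><act_energy>1.2</act_energy><thd>66.45</thd></phase>
<phase id="1"><act_energy>0</act_energy><thd>31</thd></phase>
</phases>
</time>
<time value="2022-04-12 13:30:00">
<phases>
<phase id="0"><act_energy>1.3</act_energy><thd>65.89</thd></phase>
</phases>
</time>
</akouda>"""


def phases_to_rows(root):
    """Return one dict per <phase>, tagged with its parent <time> value and its own id."""
    rows = []
    for time_el in root.findall("time"):
        for phase_el in time_el.findall("phases/phase"):
            # Attributes are read with .get(), not .find()
            row = {"time": time_el.get("value"), "phase_id": phase_el.get("id")}
            # Every child element of <phase> becomes a column named after its tag
            for child in phase_el:
                row[child.tag] = child.text
            rows.append(row)
    return rows


root = ET.fromstring(SAMPLE_XML)
rows = phases_to_rows(root)

# Write to an in-memory buffer here; for a real file, use
# open("output.csv", "w", newline="") instead of io.StringIO().
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The same list of dicts can be passed to pd.DataFrame(rows) if you want to keep the to_csv call from the question.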

EDIT: To read from the provided URL:

import requests
import pandas as pd
from html import unescape
from io import StringIO

url = "https://issat.ttn.tn/cu/export/akouda.php"

# quick-and-dirty method to remove the first <pre> and the last </pre>;
# ideally, you would do this with an HTML parser:
s = unescape(requests.get(url).text)[5:-6]

# wrap the text in StringIO: pandas 2.1+ deprecates passing literal XML strings
df = pd.read_xml(StringIO(s), xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))
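
As a sketch of the HTML-parser route mentioned in the comment above, the standard-library html.parser module can collect the text inside <pre> instead of slicing by character position. This assumes the page wraps the escaped XML in <pre>...</pre>, as the slicing code implies; PreTextExtractor, extract_pre_text, and the page string are hypothetical names and data used only for this illustration.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre>...</pre> elements."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; and &quot;
        super().__init__(convert_charrefs=True)
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Keep text only while we are inside a <pre> element
        if self._depth:
            self.chunks.append(data)


def extract_pre_text(page):
    """Return the unescaped text inside <pre> tags, stripped of outer whitespace."""
    parser = PreTextExtractor()
    parser.feed(page)
    parser.close()
    return "".join(parser.chunks).strip()


# Stand-in for requests.get(url).text, using a tiny escaped XML document
page = (
    "<pre>&lt;akouda&gt;&lt;time value=&quot;2022-04-12 13:45:00&quot;&gt;"
    "&lt;/time&gt;&lt;/akouda&gt;</pre>"
)
xml_text = extract_pre_text(page)
print(xml_text)
```

The returned text can then be handed to pd.read_xml in place of the sliced string, and it does not break if the page gains leading whitespace or extra markup around the <pre> element.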
