Converting XML to CSV with Python

Hello friends,

In the code below, I am trying to convert the XML at https://issat.ttn.tn/cu/export/akouda.php to a CSV file.

Code:

import requests
import xml.etree.ElementTree as Xet
import pandas as pd
from html import unescape
url = "https://issat.ttn.tn/cu/export/akouda.php"
s = unescape(requests.get(url).text)[5:-6]
df = pd.read_xml(s, xpath="//phases/* | //time")#
#df["value"] = df["value"].ffill()
df
df.to_csv('output0.csv')

Here is part of the result:

,value,phases,id,act_energy,react_energy,current_inst,voltage_inst,power_inst,power_fact,thd
0,2022-04-14 15:45:00,,,,,,,,,
1,,,0.0,0.3000000000001819,0.4324445747717669,2.0,241.7,0.27,0.57,27.39
2,,,1.0,0.0,0.0,13.06,242.5,0.66,0.2,22.69
3,,,2.0,0.0,0.0,1.07,243.7,0.15,0.58,48.05
4,2022-04-14 15:30:00,,,,,,,,,
5,,,0.0,0.2999999999999545,0.108885460271677,1.02,240.4,0.23,0.94,23.7
6,,,1.0,0.0,0.0,14.54,241.0,0.86,0.24,23.99
7,,,2.0,0.0,0.0,1.07,243.5,0.15,0.59,48.08
8,2022-04-14 15:15:00,,,,,,,,,
9,,,0.0,0.3999999999998636,0.5618044649492236,0.7,243.1,0.1,0.58,42.46
10,,,1.0,0.0,0.0,17.82,241.9,1.99,0.46,33.59
11,,,2.0,0.0,0.0,1.08,246.3,0.15,0.58,51.09
12,2022-04-14 15:00:00,,,,,,,,,
13,,,0.0,0.6000000000001364,0.8427066974243144,0.71,241.7,0.1,0.58,44.02
14,,,1.0,0.0,0.0,18.74,240.5,2.21,0.49,31.3
15,,,2.0,0.0,0.0,1.08,245.3,0.15,0.58,51.77

I need to:

Remove the rows that have a date but no readings (rows 0, 4, 8 and 12)
Keep only the rows where id is 1
Drop the phases column

Can anyone help?

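Consider running two read_xml calls, adjusting the xpath for each and using attrs_only for the timestamps. Since both result sets sit at the same level (one <phase> with @id=1 and one <time> per reading), they line up row for row, and a join on the index gives: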

...
time_df = pd.read_xml(s, xpath="//time", attrs_only=True, names=["time"])
phase_df = pd.read_xml(s, xpath="//phase[@id=1]")
time_phase_df = time_df.join(phase_df)
time_phase_df
                     time  id  act_energy  ...  power_inst  power_fact    thd
0     2022-04-15 00:00:00   1           0  ...        0.84        0.28  22.35
1     2022-04-14 23:45:00   1           0  ...        0.83        0.28  23.16
2     2022-04-14 23:30:00   1           0  ...        0.83        0.28  22.43
3     2022-04-14 23:15:00   1           0  ...        0.83        0.28  22.56
4     2022-04-14 23:00:00   1           0  ...        0.82        0.28  22.57
...                   ...  ..         ...  ...         ...         ...    ...
1289  2022-04-01 02:15:00   1           0  ...        0.69        0.25  22.70
1290  2022-04-01 02:00:00   1           0  ...        0.69        0.25  22.66
1291  2022-04-01 01:45:00   1           0  ...        0.69        0.25  22.46
1292  2022-04-01 01:30:00   1           0  ...        0.69        0.25  22.00
1293  2022-04-01 01:25:00   1           0  ...        0.69        0.25  22.34
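
To see why a plain join lines up here, the following is a minimal sketch on a made-up two-reading sample. The XML layout (a time element carrying a value attribute and wrapping phases/phase) is an assumption inferred from the columns in the question, not the verified format of the feed, and only two reading columns are kept. parser="etree" and the ".//" paths are used only so the sketch runs without lxml. The sample is wrapped in StringIO because recent pandas releases deprecate passing literal XML text to read_xml.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the real feed's layout is assumed, not verified.
SAMPLE = """<data>
  <time value="2022-04-14 15:45:00">
    <phases>
      <phase id="0"><current_inst>2.0</current_inst><voltage_inst>241.7</voltage_inst></phase>
      <phase id="1"><current_inst>13.06</current_inst><voltage_inst>242.5</voltage_inst></phase>
    </phases>
  </time>
  <time value="2022-04-14 15:30:00">
    <phases>
      <phase id="0"><current_inst>1.02</current_inst><voltage_inst>240.4</voltage_inst></phase>
      <phase id="1"><current_inst>14.54</current_inst><voltage_inst>241.0</voltage_inst></phase>
    </phases>
  </time>
</data>"""

# One frame per node type; each gets a default 0..n-1 index.
time_df = pd.read_xml(
    StringIO(SAMPLE), xpath=".//time", attrs_only=True,
    names=["time"], parser="etree",
)
phase_df = pd.read_xml(
    StringIO(SAMPLE), xpath=".//phase[@id='1']", parser="etree"
)

# join() aligns on the index, so the i-th timestamp meets the i-th phase row.
time_phase_df = time_df.join(phase_df)

# index=False keeps the row counter out of the CSV.
csv_text = time_phase_df.to_csv(index=False)
print(csv_text)
```

The pairing is purely positional, so it is only correct when every time element contains exactly one phase with id 1. If a reading could be missing, the two frames would drift out of step without any error.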

In pandas 1.5 (upcoming at the time of writing), read_xml will support parsing dates:

time_df = pd.read_xml(
    s, xpath="//time", attrs_only=True, names=["time"], parse_dates=["time"]
)
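
On pandas releases older than 1.5, where read_xml has no parse_dates argument, the same conversion can be done after loading with pd.to_datetime. A small sketch, using a hand-built frame as a stand-in for the read_xml result and assuming the timestamps always follow the format seen in the question's output:

```python
import pandas as pd

# Stand-in for the frame returned by read_xml(..., names=["time"]).
time_df = pd.DataFrame(
    {"time": ["2022-04-14 15:45:00", "2022-04-14 15:30:00"]}
)

# Convert the text column to datetime64 after loading.
# The explicit format is an assumption based on the sample rows.
time_df["time"] = pd.to_datetime(time_df["time"], format="%Y-%m-%d %H:%M:%S")
print(time_df.dtypes)
```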

Try:

import requests
import pandas as pd
from html import unescape
url = "https://issat.ttn.tn/cu/export/akouda.php"
s = unescape(requests.get(url).text)[5:-6]
df = pd.read_xml(s, xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
df = df.drop(columns="phases")
# if you want only id==1 you can skip this:
# df = df[~df.isna().any(axis=1)]
print(df[df["id"] == 1])

Prints:

                    value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
2     2022-04-14 23:15:00  1.0         0.0           0.0         12.06         241.0        0.83        0.28  22.56
6     2022-04-14 23:00:00  1.0         0.0           0.0         12.04         240.5        0.82        0.28  22.57
10    2022-04-14 22:45:00  1.0         0.0           0.0         12.04         240.2        0.82        0.28  22.56
14    2022-04-14 22:30:00  1.0         0.0           0.0         12.03         240.1        0.82        0.28  22.24
18    2022-04-14 22:15:00  1.0         0.0           0.0         12.01         240.1        0.82        0.28  22.52
22    2022-04-14 22:00:00  1.0         0.0           0.0         12.00         239.8        0.82        0.28  22.74
26    2022-04-14 21:45:00  1.0         0.0           0.0         11.96         239.9        0.82        0.28  22.58
...
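
The forward fill works because each timestamp row comes directly before the phase rows that belong to it, so ffill copies the timestamp down onto them. The sketch below replays that on a hand-built frame holding rows 0 to 7 of the question's output, trimmed to two reading columns for brevity, and ends with the CSV step the question asks for. It builds the frame by hand instead of calling read_xml, so it runs without network access.

```python
import numpy as np
import pandas as pd

# Rows 0-7 of the question's output, trimmed to two reading columns.
df = pd.DataFrame({
    "value": ["2022-04-14 15:45:00", None, None, None,
              "2022-04-14 15:30:00", None, None, None],
    "phases": [np.nan] * 8,
    "id": [np.nan, 0, 1, 2, np.nan, 0, 1, 2],
    "current_inst": [np.nan, 2.0, 13.06, 1.07, np.nan, 1.02, 14.54, 1.07],
    "voltage_inst": [np.nan, 241.7, 242.5, 243.7, np.nan, 240.4, 241.0, 243.5],
})

# Copy each timestamp down onto the phase rows that follow it.
df["value"] = df["value"].ffill()
df = df.drop(columns="phases")

# Timestamp-only rows have a NaN id, so this filter removes them as well.
phase1 = df[df["id"] == 1].astype({"id": int})

# index=False avoids the unnamed leading index column.
csv_text = phase1.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a path such as "output0.csv" as the first argument to to_csv.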
