There are many ways to parse and convert XML. Here is one of them, using BeautifulSoup.
I have a request call that returns XML data formatted like this:
<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
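I'm having a hard time figuring out how to convert this into a dictionary, a DataFrame, or some other object that I can loop over to extract the name, vol, last, and matched values.
You can parse it like this, though the best approach depends on what you need: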
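If all you need is to loop over the records, the standard library's xml.etree.ElementTree is enough. The following is a minimal sketch, not the poster's code. It assumes you have the response body as a string (for example `response.text` if you use requests) and that the document is properly closed with `</symbols></info>`, because the excerpt above stops before the closing tags. It also assumes vol and matched are integers and last is a decimal price. `parse_symbols` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, with the closing tags added (assumption).
xml_text = """<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
</symbols>
</info>"""


def parse_symbols(text):
    """Return one dict per <symbol> element, with numeric attributes converted."""
    root = ET.fromstring(text)
    rows = []
    # iter() walks all descendants, so the <stats> element is simply skipped.
    for sym in root.iter("symbol"):
        rows.append({
            "name": sym.get("name"),
            "vol": int(sym.get("vol")),
            "last": float(sym.get("last")),
            "matched": int(sym.get("matched")),
        })
    return rows


rows = parse_symbols(xml_text)
for row in rows:
    print(row["name"], row["vol"], row["last"], row["matched"])
```

A list of dicts like `rows` can be looped over directly. It can also be passed to `pandas.DataFrame(rows)` if you later want a table.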
from io import StringIO

from bs4 import BeautifulSoup as bs
import pandas as pd

response = """<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
</symbols></info>"""

# Parse the XML with BeautifulSoup (the "lxml-xml" parser requires lxml)
content = bs(response, "lxml-xml")
# Select every <symbol> element; each attribute becomes a column.
# StringIO avoids the FutureWarning newer pandas emits for literal XML strings.
df = pd.read_xml(StringIO(str(content)), xpath="//symbol")
print(df)
Output:
     last  matched  name      vol
0   28.23  8382339  TQQQ  8700394
1  401.00  8209174   SPY  8571092
2   44.39  6734334  SQQQ  7091770
3    0.17  6469576  AVCT  6493626
4    9.42  6142800  UVXY  6158364
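The BeautifulSoup round-trip is not strictly required, because pd.read_xml can read the XML directly. The following is a minimal sketch assuming pandas 1.3 or later, which is where read_xml was introduced. It uses `parser="etree"` only so that lxml is not needed. Note that the etree parser supports just ElementTree-style paths, so the XPath is written as `.//symbol` rather than `//symbol`. `by_name` is an illustrative variable name showing how to get a dictionary keyed by symbol.

```python
from io import StringIO

import pandas as pd

xml_text = """<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
</symbols>
</info>"""

# One row per <symbol>; pandas infers numeric dtypes for vol, last, and matched.
# parser="etree" uses the standard library XML parser instead of lxml.
df = pd.read_xml(StringIO(xml_text), xpath=".//symbol", parser="etree")

# Dictionary keyed by symbol name: {"SPY": {"vol": ..., "last": ..., "matched": ...}, ...}
by_name = df.set_index("name").to_dict(orient="index")
print(by_name["SPY"])
```

If you do have lxml installed, you can drop `parser="etree"` and keep the original `xpath="//symbol"`.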
from bs4 import BeautifulSoup

doc = '''
<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
</symbols>
</info>
'''

soup = BeautifulSoup(doc, 'lxml-xml')
# Loop over every <symbol> element and read its attributes
for sym in soup.find_all('symbol'):
    print("-" * 32)
    print(sym.get("name"))
    print(sym.get("vol"))
    print(sym.get("last"))
    print(sym.get("matched"))
If you want more approaches, you can check this link.