Given the following XML:
<POR Cli="1" Name="Paul Smith" Street="SN" >
<Sal Val="1000" Gan="M">
<Fam dep="1" dog="2" />
</Sal>
</POR>
<POR Cli="2" Name="Mary Smith" Street="SN" >
<Sal Val="2000" Gan="S">
<Fam dep="0" dog="1" />
</Sal>
</POR>
I want to extract the attributes of these XML tags into the following columns:
cli;name;Street;val;gan;dep;dog
and then write the result to AWS S3 as:
cli;name;Street;val;gan;dep;dog
1;Paul Smith;SN;1000;M;1;2
2;Mary Smith;SN;2000;S;0;1
You can use BeautifulSoup and the csv module:
from bs4 import BeautifulSoup
import csv, sys
data = '''
<POR Cli="1" Name="Paul Smith" Street="SN" >
<Sal Val="1000" Gan="M">
<Fam dep="1" dog="2" />
</Sal>
</POR>
<POR Cli="2" Name="Mary Smith" Street="SN" >
<Sal Val="2000" Gan="S">
<Fam dep="0" dog="1" />
</Sal>
</POR>
'''
soup = BeautifulSoup(data, 'html.parser')
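# html.parser lowercases tag and attribute names, so the field names are lowercase.
writer = csv.DictWriter(
    sys.stdout,
    fieldnames=['cli', 'name', 'street', 'val', 'gan', 'dep', 'dog'],
    delimiter=';')  # the header in the question is separated by ';'
writer.writeheader()
for por in soup.find_all('por'):
    # Merge the attributes of <POR>, <Sal> and <Fam> into one row.
    d = dict(por.attrs)  # copy, so the parsed tag is not modified
    d.update(por.sal.attrs)
    d.update(por.sal.fam.attrs)
    writer.writerow(d)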
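The code above prints the CSV to standard output; the question also asks for the result to end up in AWS S3. A minimal sketch of that step, assuming boto3 is installed and credentials are already configured: build the CSV in memory with `io.StringIO` and send it with `put_object`. The helper names `rows_to_csv` and `upload_csv`, the bucket `my-example-bucket` and the key `exports/por.csv` are made up for illustration and are not part of the original answer.

```python
import csv
import io

FIELDNAMES = ['cli', 'name', 'street', 'val', 'gan', 'dep', 'dog']


def rows_to_csv(rows, fieldnames=FIELDNAMES, delimiter=';'):
    """Serialize an iterable of dicts to CSV text held in memory."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def upload_csv(csv_text, bucket, key, s3_client=None):
    """Upload CSV text to S3 as a single object (hypothetical helper)."""
    if s3_client is None:
        # Imported lazily so that rows_to_csv works without boto3 installed.
        import boto3
        s3_client = boto3.client('s3')
    s3_client.put_object(Bucket=bucket, Key=key,
                         Body=csv_text.encode('utf-8'))


if __name__ == '__main__':
    # In the answer above, each row would be the dict `d` built in the loop.
    rows = [
        {'cli': '1', 'name': 'Paul Smith', 'street': 'SN',
         'val': '1000', 'gan': 'M', 'dep': '1', 'dog': '2'},
        {'cli': '2', 'name': 'Mary Smith', 'street': 'SN',
         'val': '2000', 'gan': 'S', 'dep': '0', 'dog': '1'},
    ]
    print(rows_to_csv(rows), end='')
    # Uncomment to upload; bucket and key are placeholders:
    # upload_csv(rows_to_csv(rows), 'my-example-bucket', 'exports/por.csv')
```

To combine this with the answer, collect the dicts from the loop in a list instead of writing them to `sys.stdout`, then pass that list to `rows_to_csv`.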