BeautifulSoup4: find and list children grouped under their parent's name



I have this mock-up XML file:

xml="""
<fruits>
<fruit>
<name>apple</name>
<types>
<type>
<color>red</color>
<taste>sweet</taste>
<size>big</size>
<description>Nice, round, sweet red apple</description>
</type>
<type>
<color>green</color>
<taste>sour</taste>
<size>medium</size>
<description>Small, sour, green apple</description>
</type>
</types>
</fruit>
<fruit>
<name>Banana</name>
<types>
<type>
<color>yellow</color>
<taste>sweet</taste>
<size>small</size>
<description>Good for banana-smoothies only</description>
</type>
<type>
<color>green</color>
<taste>Bitter</taste>
<size>big</size>
<description>Not quite ripe yet</description>
</type>
</types>
</fruit>
</fruits>
"""

I'm trying to process it with this code:

from bs4 import BeautifulSoup
soup=BeautifulSoup(xml, 'lxml')
fruits=soup.findAll("fruit", recursive=False)
print(fruits)
type=soup.findAll("type")
list=[]
name=soup.findAll("name")
for nameid in range(len(name)):
    list+=name[nameid]
    for id in range(len(type)):
        list+=(soup.findAll("color")[id].string)
        list+=(soup.findAll("taste")[id].string)
        list+=(soup.findAll("size")[id].string)
        list+=(soup.findAll("description")[id].string)
        list+=("""</tr>""")
        #list.append("<td>"+soup.findAll("description")[id].string+"</td>")
        #list.append("</tr>")
if list:
    list="".join(list)

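I can't find a way to list each fruit's properties (its children) under its name in a table. Everything I've tried so far does show the names, but when it reaches Banana it either shows only the apple's properties or shows the properties of both apple and banana.

I'm only using Python with BeautifulSoup + lxml and plain loops. Any help is appreciated!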

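The code below collects all the information in the XML into a "meaningful" data structure.

It doesn't use any external libraries, only the XML module from the Python standard library.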

import xml.etree.ElementTree as ET
from collections import defaultdict
xml = """
<fruits>
<fruit>
<name>apple</name>
<types>
<type>
<color>red</color>
<taste>sweet</taste>
<size>big</size>
<description>Nice, round, sweet red apple</description>
</type>
<type>
<color>green</color>
<taste>sour</taste>
<size>medium</size>
<description>Small, sour, green apple</description>
</type>
</types>
</fruit>
<fruit>
<name>Banana</name>
<types>
<type>
<color>yellow</color>
<taste>sweet</taste>
<size>small</size>
<description>Good for banana-smoothies only</description>
</type>
<type>
<color>green</color>
<taste>Bitter</taste>
<size>big</size>
<description>Not quite ripe yet</description>
</type>
</types>
</fruit>
</fruits>
"""
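# Map each fruit name to the list of its types.
data = defaultdict(list)
root = ET.fromstring(xml)
for fruit in root.findall('.//fruit'):
    name = fruit.find('name').text
    # Search only inside the current <fruit>, so types stay with their fruit.
    for _type in fruit.findall('.//type'):
        # One dict per <type>: child tag -> child text.
        data[name].append({x.tag: x.text for x in list(_type)})
for fruit, types in data.items():
    print(f'{fruit} -> {types}')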

Output:
apple -> [{'color': 'red', 'taste': 'sweet', 'size': 'big', 'description': 'Nice, round, sweet red apple'}, {'color': 'green', 'taste': 'sour', 'size': 'medium', 'description': 'Small, sour, green apple'}]
Banana -> [{'color': 'yellow', 'taste': 'sweet', 'size': 'small', 'description': 'Good for banana-smoothies only'}, {'color': 'green', 'taste': 'Bitter', 'size': 'big', 'description': 'Not quite ripe yet'}]
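
Since the question asks for a table, here is a minimal sketch of how the same per-fruit scoping could produce HTML rows. It is only an illustration under a few assumptions: the helper name `fruit_rows`, the `FIELDS` tuple, and the shortened `sample` document are made up for this example and are not part of the original code. The key point is that the inner search runs on the current `fruit` element rather than on the whole document, which is what keeps apple's types from showing up under Banana. The same idea should carry over to BeautifulSoup by calling `find_all` on each fruit tag instead of on `soup`.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical column order for the table; adjust to taste.
FIELDS = ("color", "taste", "size", "description")


def fruit_rows(xml_text):
    """Return one HTML <tr> string per <type>, prefixed with its fruit's name."""
    rows = []
    root = ET.fromstring(xml_text)
    for fruit in root.findall("fruit"):
        name = fruit.findtext("name", default="")
        # Search only inside this <fruit>, so types never leak across fruits.
        for type_el in fruit.findall("./types/type"):
            cells = [name] + [type_el.findtext(f, default="") for f in FIELDS]
            tds = "".join(f"<td>{escape(c)}</td>" for c in cells)
            rows.append(f"<tr>{tds}</tr>")
    return rows


# Shortened stand-in for the XML in the question (assumption for this sketch).
sample = """
<fruits>
  <fruit>
    <name>apple</name>
    <types>
      <type><color>red</color><taste>sweet</taste><size>big</size>
        <description>Nice, round, sweet red apple</description></type>
      <type><color>green</color><taste>sour</taste><size>medium</size>
        <description>Small, sour, green apple</description></type>
    </types>
  </fruit>
  <fruit>
    <name>Banana</name>
    <types>
      <type><color>yellow</color><taste>sweet</taste><size>small</size>
        <description>Good for banana-smoothies only</description></type>
    </types>
  </fruit>
</fruits>
"""

for row in fruit_rows(sample):
    print(row)
```

Each row starts with the fruit's name, so wrapping the joined rows in a `<table>` element gives one line per type, grouped by fruit in document order.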
