How to filter specific tags (<p>, <h2>) with Beautiful Soup in Python and build a dictionary from them


<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Software/OS</h2>
<p>windows xp</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>

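I want to create a dictionary from the HTML above, where the keys are the header tags and the values are the paragraph tags.

I want the output in this format: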

{"Summary": ["This is summary one.", "contains details of summary1."], "Software/OS": "windows xp", "HARDWARE": ["Intel core i5", "8 GB RAM"]}

Can anyone help me? Thanks in advance.

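You can use this script to build a dictionary where the keys are the text of each <h2> and the values are lists of the <p> texts that follow it: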

from bs4 import BeautifulSoup

txt = '''<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Software/OS</h2>
<p>windows xp</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>'''
soup = BeautifulSoup(txt, 'html.parser')
out = {}
# For each <p>, find the nearest preceding <h2> and group the text under it.
for p in soup.select('p'):
    out.setdefault(p.find_previous('h2').text, []).append(p.text)
print(out)

Prints:

{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': ['windows xp'], 'HARDWARE': ['Intel core i5', '8 GB RAM']}
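
One caveat with the loop above: find_previous('h2') returns None when a <p> has no <h2> before it, so accessing .text on the result raises AttributeError. The following is a minimal sketch of a guarded variant, not part of the original answer. The function name group_paragraphs_by_heading and the orphan_key parameter are hypothetical names chosen for illustration, and the sample input adds a leading <p> that the question's HTML does not have.

```python
from bs4 import BeautifulSoup


def group_paragraphs_by_heading(html, heading="h2", orphan_key=None):
    """Map each heading's text to the list of <p> texts that follow it.

    Hypothetical helper: paragraphs with no preceding heading are skipped
    unless orphan_key is given, in which case they are grouped under it.
    """
    soup = BeautifulSoup(html, "html.parser")
    grouped = {}
    for p in soup.find_all("p"):
        h = p.find_previous(heading)  # None if no heading comes before this <p>
        if h is None:
            if orphan_key is None:
                continue  # skip paragraphs that have no heading
            key = orphan_key
        else:
            key = h.get_text(strip=True)
        grouped.setdefault(key, []).append(p.get_text(strip=True))
    return grouped


# Sample input with an extra leading <p> to exercise the guard.
sample = """<p>intro with no heading</p>
<h2>Summary</h2>
<p>This is summary one.</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>"""

result = group_paragraphs_by_heading(sample)
print(result)
```

Passing orphan_key="(no heading)" would keep the leading paragraph under that key instead of dropping it; which behavior is appropriate depends on the page being scraped.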

If you don't want lists of length 1, you can additionally do:

# Replace each single-item list with its only element.
for k in out:
    if len(out[k]) == 1:
        out[k] = out[k][0]
print(out)

Prints:

{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': 'windows xp', 'HARDWARE': ['Intel core i5', '8 GB RAM']}
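
The same collapsing step can also be written as a dict comprehension that returns a new dictionary instead of modifying the grouped one in place. This is only a sketch: collapse_singletons is a hypothetical helper name, and the input dictionary is typed in by hand to match the grouped output shown earlier.

```python
def collapse_singletons(grouped):
    """Return a new dict where one-item lists are replaced by their only item."""
    return {
        key: values[0] if len(values) == 1 else values
        for key, values in grouped.items()
    }


# Hand-written copy of the grouped result, so this block runs on its own.
out = {
    "Summary": ["This is summary one.", "contains details of summary1."],
    "Software/OS": ["windows xp"],
    "HARDWARE": ["Intel core i5", "8 GB RAM"],
}

collapsed = collapse_singletons(out)
print(collapsed)
```

Note that after collapsing, the values are a mix of strings and lists, so any code that reads the dictionary has to check which type it received (for example with isinstance(value, list)). Keeping every value as a list is often simpler to consume.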
