<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Software/OS</h2>
<p>windows xp</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>
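I want to create a dictionary from the HTML above, where the keys are the text of the header tags and the values are the text of the paragraph tags.
I want the output in this format:
{"Summary": ["This is summary one.", "contains details of summary1."], "Software/OS": "windows xp", "HARDWARE": ["Intel core i5", "8 GB RAM"]}
Can anyone help me? Thanks in advance.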
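You can use this script to build a dictionary where each key is the text of an <h2> tag and each value is a list of the text of the <p> tags that follow it: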
from bs4 import BeautifulSoup
txt = '''<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Software/OS</h2>
<p>windows xp</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>'''
soup = BeautifulSoup(txt, 'html.parser')
out = {}
for p in soup.select('p'):
    out.setdefault(p.find_previous('h2').text, []).append(p.text)
print(out)
Prints:
{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': ['windows xp'], 'HARDWARE': ['Intel core i5', '8 GB RAM']}
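One caveat with the loop above: if a <p> appears before the first <h2>, find_previous('h2') returns None and accessing .text on it raises AttributeError. Here is a minimal sketch that guards against that case. The function name headings_to_paragraphs and the sample string are hypothetical, chosen only for illustration, and skipping orphan paragraphs is an assumption about what you want:

```python
from bs4 import BeautifulSoup


def headings_to_paragraphs(html):
    """Map each <h2> text to the list of <p> texts that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    out = {}
    for p in soup.find_all("p"):
        heading = p.find_previous("h2")
        if heading is None:
            # No <h2> precedes this paragraph, so there is no key for it; skip it.
            continue
        out.setdefault(heading.get_text(strip=True), []).append(p.get_text(strip=True))
    return out


# Hypothetical input with a paragraph that has no heading before it.
sample = "<p>intro</p><h2>Software/OS</h2><p>windows xp</p>"
print(headings_to_paragraphs(sample))
```

With this input the "intro" paragraph is dropped and only the 'Software/OS' entry remains. If you would rather keep such paragraphs, you could collect them under a placeholder key instead of skipping them.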
If you don't want lists of length 1, you can additionally do:
for k in out:
    # Replace a single-item list with the item itself
    if len(out[k]) == 1:
        out[k] = out[k][0]
print(out)
Prints:
{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': 'windows xp', 'HARDWARE': ['Intel core i5', '8 GB RAM']}
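The same unwrapping step can also be written as a dict comprehension that builds a new dictionary and leaves the original untouched. This is only a sketch: the name collapsed is hypothetical, and out is hard-coded here to the result printed earlier so the snippet runs on its own:

```python
# The grouped result from the first script, hard-coded so this runs standalone.
out = {
    "Summary": ["This is summary one.", "contains details of summary1."],
    "Software/OS": ["windows xp"],
    "HARDWARE": ["Intel core i5", "8 GB RAM"],
}

# Unwrap single-item lists; leave longer lists as they are.
collapsed = {k: v[0] if len(v) == 1 else v for k, v in out.items()}
print(collapsed)
```

Keep in mind that the values are then a mix of strings and lists, so any code that reads the dictionary has to check the type of each value. If that is inconvenient, keeping every value as a list may be simpler.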