Getting the value of a <span> attribute



I have a large chunk of HTML, and I want to extract every value of the span attribute "data-content".

import requests
from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"C:\Users\stasiek\Desktop\Atom-PYTHON\Python-Udemy\web-scraping\strona.html") as raw_resuls:
    results = BeautifulSoup(raw_resuls, "html.parser")

for element in results.find_all("span"):
    print(element['data-content'])

This code prints only the first "data-content" value in the file (a single word) and then raises an error:

  File "niemiecki.py", line 10, in <module>
    print(element['data-content'])
  File "C:\Users\stasiek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\element.py", line 1406, in __getitem__
    return self.attrs[key]
KeyError: 'data-content'

Do you know what I'm doing wrong?

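find_all("span") returns every span, including those without the attribute, and indexing a tag with a missing attribute raises KeyError. Select only the spans that have the attribute, for example: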

from bs4 import BeautifulSoup
from io import BytesIO
data = b'''
<body>
<span data-content="foo">1</span>
<span>2</span>
<span data-content="bar">3</span>
<span>4</span>
<span>5</span>
</body>
'''
f = BytesIO(data)
soup = BeautifulSoup(f, 'html.parser')
for span in soup.select('span[data-content]'):
    print(span['data-content'])
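If you would rather keep find_all(), two other approaches should work as well. This is only a sketch: the sample markup and the variable names values_get and values_filter are made up for illustration, and it assumes the same html.parser setup as above. Tag.get() returns None for a missing attribute instead of raising, and passing True as an attribute value to find_all() matches only tags that define that attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, mirroring the example above.
html = """
<body>
<span data-content="foo">1</span>
<span>2</span>
<span data-content="bar">3</span>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: Tag.get() returns None instead of raising KeyError
# when the attribute is missing, so those spans can be skipped.
values_get = [
    span.get("data-content")
    for span in soup.find_all("span")
    if span.get("data-content") is not None
]

# Option 2: True as the attribute value makes find_all() match
# only tags that define the attribute, whatever its value is.
values_filter = [
    span["data-content"]
    for span in soup.find_all("span", attrs={"data-content": True})
]

print(values_get)
print(values_filter)
```

Both lists contain only the values from spans that actually carry the attribute. Option 1 is handy if you also need to do something with the spans that lack it; option 2 is closest to the CSS selector version.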
