Python BeautifulSoup select div class



I am scraping HTML similar to this:

<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>
<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>

I am using this:

soup_object.find_all("div", {"class": "col-16"})

I only want the divs whose class is exactly "col-16", but it is returning all of the divs.

How do I select only the divs with the class "col-16"?

Edit

I want this:

<div class="col-16"> ... </div>
<div class="col-16"> ... </div>

But I am getting this:

<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>
<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>

Just filter the divs by the number of values in their class attribute.

For example:

from bs4 import BeautifulSoup

if __name__ == '__main__':
    sample_html = """<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>
<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>"""
    # Collect every div, then keep only those with exactly one class.
    divs = BeautifulSoup(sample_html, "html.parser").find_all("div")
    # .get() avoids a KeyError for divs that have no class attribute.
    filtered = [div for div in divs if len(div.get("class", [])) == 1]
    print(filtered)

Output:

[<div class="col-16"> ... </div>, <div class="col-16"> ... </div>]
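One caveat with counting classes: it keeps any div that has a single class, whatever that class is. If the page could also contain something like a lone "sidebar" div, a stricter check compares the whole class list. Here is a minimal sketch of that idea; the helper name has_only_class and the extra "sidebar" div are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Sample markup; the "sidebar" div and the class-less div are
# hypothetical additions used to show the difference.
sample_html = """<div class="col-16 text"> ... </div>
<div class="col-16"> ... </div>
<div class="sidebar"> ... </div>
<div> ... </div>"""


def has_only_class(tag, wanted):
    # Hypothetical helper: True when the tag's class list is exactly [wanted].
    # BeautifulSoup exposes the HTML class attribute as a list of strings.
    return tag.get("class", []) == [wanted]


soup = BeautifulSoup(sample_html, "html.parser")
filtered = [div for div in soup.find_all("div") if has_only_class(div, "col-16")]
print(filtered)
```

With this sample, only the plain "col-16" div is kept; the "sidebar" div and the div without a class are both dropped.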

I think these will help:

BeautifulSoup webscraping find_all( ): finding exact match

https://medium.com/@epicshane/using-beutifulsoup4-to-find-class-excact-match-3e263a95e330

I tried the following solution: https://stackoverflow.com/a/22735249/13548379

from bs4 import BeautifulSoup

html_doc = """<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>
<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# print(soup.prettify())
# Keep only div tags whose class list is exactly ['col-16'].
items = soup.find_all(lambda tag: tag.name == 'div' and
                      tag.get('class') == ['col-16'])
for x in items:
    print(x.prettify())

The result is:

<div class="col-16">
...
</div>
<div class="col-16">
...
</div>
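Another option that may be worth trying is a CSS attribute selector through select(). As far as I understand, `[class="col-16"]` compares against the full value of the class attribute rather than testing for membership the way `.col-16` does, so it should skip "col-16 text" and "col-16 image". This is only a sketch using the same sample HTML; it assumes a bs4 version where select() is backed by soupsieve.

```python
from bs4 import BeautifulSoup

html_doc = """<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>
<div class="col-16 text"> ... </div>
<div class="col-16 image"> ... </div>
<div class="col-16"> ... </div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# The class selector matches every div that has col-16 among its classes.
any_col16 = soup.select("div.col-16")

# The attribute selector matches only when the whole class value is col-16.
exact_col16 = soup.select('div[class="col-16"]')

print(len(any_col16), len(exact_col16))
for div in exact_col16:
    print(div)
```

The attribute form is the shorter of the two approaches, but the lambda version above makes the exact-list comparison more explicit.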
