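I want to find all elements that have attribute values equal to "ATTR1" and "ATTR2", without knowing the corresponding attribute names.
Suppose I have the following content: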
page_content = '''<a href="ATTR1">text1</a>
<div class="random_value" type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span class="random_value" id="ATTR2">text5</span>'''
I want a script that retrieves only the third element, which has one attribute equal to "ATTR1" AND another attribute equal to "ATTR2". That is, I need the following:
<script class="ATTR1" id="ATTR2">text3</script>
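I know I can pass a function as an argument to find_all(). However, I need help understanding how to write a function that returns True when these conditions are met.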
If you know the attribute names, simply chain your conditions in a CSS selector:
select('#ATTR2.ATTR1')
Or, without knowing the attribute names, check every attribute value of each element:
for e in soup():
    # Flatten attribute values: multi-valued attributes such as class are lists
    attr_list = [v for i in e.attrs.values() for v in (i if isinstance(i, list) else [i])]
    if all(x in attr_list for x in ['ATTR1', 'ATTR2']):
        print(e)
Example
from bs4 import BeautifulSoup
html = '''
<a href="ATTR1">text1</a>
<div class="random_value" type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span class="random_value" id="ATTR2">text5</span>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('#ATTR2.ATTR1'))
for e in soup():
    # Flatten attribute values: multi-valued attributes such as class are lists
    attr_list = [v for i in e.attrs.values() for v in (i if isinstance(i, list) else [i])]
    if all(x in attr_list for x in ['ATTR1', 'ATTR2']):
        print(e)
Output
[<script class="ATTR1" id="ATTR2">text3</script>]
<script class="ATTR1" id="ATTR2">text3</script>
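Since the question asks specifically about passing a function to find_all(), the same value check can be wrapped in a predicate. This is only a minimal sketch: the helper name has_all_values is made up for illustration and is not part of BeautifulSoup, and the built-in 'html.parser' is assumed as the parser.

```python
from bs4 import BeautifulSoup


def has_all_values(required):
    """Build a predicate for find_all() (hypothetical helper, not a bs4 API).

    The predicate returns True for tags whose attribute values, under any
    attribute names, include every string in `required`.
    """
    required = set(required)

    def predicate(tag):
        values = set()
        for value in tag.attrs.values():
            # Multi-valued attributes such as class are parsed into lists.
            if isinstance(value, list):
                values.update(value)
            else:
                values.add(value)
        return required <= values

    return predicate


html = '''<a href="ATTR1">text1</a>
<div class="random_value" type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span class="random_value" id="ATTR2">text5</span>'''

soup = BeautifulSoup(html, 'html.parser')
matches = soup.find_all(has_all_values(['ATTR1', 'ATTR2']))
print(matches)
```

Note that, like the loop above, this would also match a tag that carries both values in a single multi-valued attribute (for example class="ATTR1 ATTR2"). If the two values must come from different attributes, the predicate would need to track which attribute each value came from.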