Python - Web scraping: how do I access the div tag of one class when I fetch div tags of multiple classes



I want div tags with 2 different classes in my result. I am using the following command to scrape the data - '''

result = soup.select('div', {'class' : ['col-s-12', 'search-page-text clearfix row'] })

'''

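Now, I have one particular set of information in the class 'col-s-12' and another set in the class 'search-page-text clearfix row'. I want to find the children of only the div tag with class 'col-s-12'. When I run the code below, it looks at the children of both div tags, because I have not specified anywhere which class to search.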

'''

for div in result:
    print(div)
    prod_name = div.find("a" , recursive=False)[0]    #should come from 'col-s-12' only
    prod_info  = div.find("a" , recursive=False)[0]   # should come from 'search-page-text clearfix row' only

'''

Example - '''

<div class = 'col-s-12'>
<a href = "some_link"> This is what I want or variable **prod_name** </a>
</div>
<div class = 'search-page-text clearfix row'> 
<a>   This should be stored in variable **prod_info**     </a>
</div>

'''

You can search for the first <a> tag under the tag with class="col-s-12", and then use .find_next('a') to find the next <a> tag.

Note: the .select() method accepts only CSS selectors, not a dictionary of attributes.

For example:

txt = '''<div class = 'col-s-12'>
<a href = "some_link"> This is what I want or variable **prod_name** </a>
</div>
<div class = 'search-page-text clearfix row'>
<a>   This should be stored in variable **prod_info**     </a>
</div>'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')
prod_name = soup.select_one('.col-s-12 > a')
prod_info = prod_name.find_next('a')
print(prod_name.get_text(strip=True))
print(prod_info.get_text(strip=True))

Prints:

This is what I want or variable **prod_name**
This should be stored in variable **prod_info**
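
If the page contains several such pairs and you still want to loop over both kinds of div, one possible approach is to pass .select() a comma-separated CSS selector list and check each div's class inside the loop. The following is only a minimal sketch: the markup, the link targets and the list names prod_names and prod_infos are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two product entries, modelled on the question's example.
html = """
<div class="col-s-12"><a href="link_1">Name 1</a></div>
<div class="search-page-text clearfix row"><a>Info 1</a></div>
<div class="col-s-12"><a href="link_2">Name 2</a></div>
<div class="search-page-text clearfix row"><a>Info 2</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

prod_names, prod_infos = [], []
# A comma-separated selector list matches divs of either kind.
for div in soup.select("div.col-s-12, div.search-page-text.clearfix.row"):
    link = div.find("a", recursive=False)  # direct child <a>, or None if absent
    if link is None:
        continue
    text = link.get_text(strip=True)
    # The class attribute is a list, so test membership to tell the divs apart.
    if "col-s-12" in div.get("class", []):
        prod_names.append(text)
    else:
        prod_infos.append(text)

print(prod_names)
print(prod_infos)
```

Note that .find() returns a single tag (or None), so there is no need to index its result with [0] as in the question's loop.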
