I want div tags with two different classes in my result. I'm using the following command to scrape the data:

```python
result = soup.select('div', {'class' : ['col-s-12', 'search-page-text clearfix row'] })
```
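Within that result, the div with class `col-s-12` holds one set of information and the div with class `search-page-text clearfix row` holds another. Now I want to find the children of only the div tags with class `col-s-12`. When I run the code below, it looks at the children of both div tags, because I haven't specified anywhere which class I'm searching for: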
```python
for div in result:
    print(div)
    prod_name = div.find("a", recursive=False)[0]  # should come from 'col-s-12' only
    prod_info = div.find("a", recursive=False)[0]  # should come from 'search-page-text clearfix row' only
```
Example:

```html
<div class = 'col-s-12'>
<a href = "some_link"> This is what I want or variable **prod_name** </a>
</div>
<div class = 'search-page-text clearfix row'>
<a> This should be stored in variable **prod_info** </a>
</div>
```
You can search for the first `<a>` tag under the tag with `class="col-s-12"`, then use `.find_next('a')` to search for the next `<a>` tag.
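Note: the `.select()` method takes a CSS selector string, not a dictionary. In `soup.select('div', {...})` the dictionary lands in the second positional parameter, `namespaces`, so it is not treated as a class filter.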
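If you do want both divs from a single `.select()` call, as in the question, a minimal sketch of the CSS-selector equivalent (using the question's sample markup) is a group selector, with each matched div then told apart by its `class` list. `div.search-page-text.clearfix.row` requires all three classes on the same div. Also note that `div.find(...)` already returns a single tag or `None`; indexing that tag with `[0]` looks up an attribute named `0` instead of taking the first match.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question
html = """
<div class='col-s-12'>
  <a href="some_link"> This is what I want or variable **prod_name** </a>
</div>
<div class='search-page-text clearfix row'>
  <a> This should be stored in variable **prod_info** </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Group selector: a <div> with class col-s-12, OR a <div> carrying
# all three classes search-page-text, clearfix and row
result = soup.select("div.col-s-12, div.search-page-text.clearfix.row")

prod_name = prod_info = None
for div in result:
    classes = div.get("class", [])  # bs4 returns "class" as a list of strings
    # find() returns one Tag (or None), so no [0] indexing
    link = div.find("a", recursive=False)
    if link is None:
        continue
    if "col-s-12" in classes:
        prod_name = link.get_text(strip=True)
    elif "search-page-text" in classes:
        prod_info = link.get_text(strip=True)

print(prod_name)
print(prod_info)
```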
A complete example:

```python
from bs4 import BeautifulSoup

txt = '''<div class = 'col-s-12'>
<a href = "some_link"> This is what I want or variable **prod_name** </a>
</div>
<div class = 'search-page-text clearfix row'>
<a> This should be stored in variable **prod_info** </a>
</div>'''

soup = BeautifulSoup(txt, 'html.parser')

# First <a> that is a direct child of an element with class "col-s-12"
prod_name = soup.select_one('.col-s-12 > a')
# The next <a> after it in document order
prod_info = prod_name.find_next('a')

print(prod_name.get_text(strip=True))
print(prod_info.get_text(strip=True))
```
Prints:

```text
This is what I want or variable **prod_name**
This should be stored in variable **prod_info**
```
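One caveat with `.find_next('a')`: it returns the next `<a>` anywhere later in the document, not necessarily the one inside the second div. The minimal sketch below assumes a hypothetical extra link (`other_link`, not in the question) between the two divs and compares `.find_next('a')` with selecting the second div's link by its own classes:

```python
from bs4 import BeautifulSoup

# Question's markup plus a hypothetical unrelated link between the two divs
html = """
<div class='col-s-12'>
  <a href="some_link"> This is what I want or variable **prod_name** </a>
</div>
<a href="other_link"> Unrelated link </a>
<div class='search-page-text clearfix row'>
  <a> This should be stored in variable **prod_info** </a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

name_tag = soup.select_one("div.col-s-12 > a")

# find_next('a') walks forward in document order, so it hits the unrelated link
via_find_next = name_tag.find_next("a").get_text(strip=True)

# Selecting by the second div's own classes is not affected by the extra link
info_tag = soup.select_one("div.search-page-text.clearfix.row > a")

prod_name = name_tag.get_text(strip=True)
prod_info = info_tag.get_text(strip=True)

print(via_find_next)  # Unrelated link
print(prod_name)
print(prod_info)
```

If the page repeats these div pairs for many products, the same selectors passed to `.select()` instead of `.select_one()` return lists that could be paired with `zip()`, assuming both lists come out with the same length and order.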