How to scrape data from multiple elements with the same class under a div using BeautifulSoup



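How can I get data from several divs that share the same classes? I want to extract all the data from the featureName and featureValue classes and store each item in a separate variable. For example:

a = <div class="featureName">Net Contents / UOM</div>
b = <div class="featureName">Country of origin Statement</div>
c = <div class="featureName">Manufacturer Name</div>

and so on.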

<div class="productFeatureClasses productDetailsTab">
<div class="feature">
<div class="featureName">Net Contents / UOM</div>
<div class="featureValue">100.0 EA</div>
</div>
<div class="feature">
<div class="featureName">Country of origin Statement</div>
<div class="featureValue">Manufactured in New Zealand</div>
</div>
<div class="feature">
<div class="featureName">Manufacturer Name</div>
<div class="featureValue">Asaleo Care Australia Pty Ltd; Asaleo Care New Zealand Limited</div>
</div>
<div class="feature">
<div class="featureName">Manufacturer Address</div>
<div class="featureValue">30-32 Westall Rd, Springvale, Victoria 3171, Australia; Level 2, 103 Carlton Gore Road, Newmarket, Auckland, 1023, New Zealand</div>
</div>
<div class="feature">
<div class="featureName">Manufacturer Website</div>
<div class="featureValue">www.tork.com.au; www.tork.co.nz</div>
</div>
<div class="feature">
<div class="featureName">Marketing Message</div>
<div class="featureValue">100 sheets. 30 x 30cm. 2 ply. 1/4 folding</div>
</div>
<div class="feature">
<div class="featureName">Packaging Material</div>
<div class="featureValue">Plastic</div>
</div>
<div class="feature">
<div class="featureName">GM Ingredients Used</div>
<div class="featureValue">Unspecified</div>
</div>

When you query by class with BeautifulSoup, it returns a list of the matching elements, which you can then access by index.

from bs4 import BeautifulSoup  # BeautifulSoup 4 is imported from the bs4 package

html = """
<div class="productFeatureClasses productDetailsTab">
<div class="feature">
<div class="featureName">Net Contents / UOM</div>
<div class="featureValue">100.0 EA</div>
</div>
<div class="feature">
<div class="featureName">Country of origin Statement</div>
<div class="featureValue">Manufactured in New Zealand</div>
</div>
<div class="feature">
<div class="featureName">Manufacturer Name</div>
<div class="featureValue">Asaleo Care Australia Pty Ltd; Asaleo Care New Zealand Limited</div>
</div>
<div class="feature">
<div class="featureName">Manufacturer Address</div>
<div class="featureValue">30-32 Westall Rd, Springvale, Victoria 3171, Australia; Level 2, 103 Carlton Gore Road, Newmarket, Auckland, 1023, New Zealand</div>
</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
mydivs = soup.find_all("div", {"class": "feature"})  # returns a list-like ResultSet
a = mydivs[0]  # first <div class="feature"> element
b = mydivs[1]  # second <div class="feature"> element
# or assign them to your variables however you want
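
The elements returned by a class query are Tag objects, not plain strings. As a minimal sketch of the "separate variables" idea from the question, the following collects the text of every featureName and featureValue div into two lists and unpacks the names. The shortened markup and the variable names `names` and `values` are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumed to match the real page).
html = """
<div class="productFeatureClasses productDetailsTab">
  <div class="feature">
    <div class="featureName">Net Contents / UOM</div>
    <div class="featureValue">100.0 EA</div>
  </div>
  <div class="feature">
    <div class="featureName">Country of origin Statement</div>
    <div class="featureValue">Manufactured in New Zealand</div>
  </div>
  <div class="feature">
    <div class="featureName">Manufacturer Name</div>
    <div class="featureValue">Asaleo Care Australia Pty Ltd; Asaleo Care New Zealand Limited</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One list per class; get_text(strip=True) drops surrounding whitespace.
names = [div.get_text(strip=True) for div in soup.find_all("div", class_="featureName")]
values = [div.get_text(strip=True) for div in soup.find_all("div", class_="featureValue")]

# Unpacking only works when the number of matches is known in advance.
a, b, c = names

print(a)
print(values[0])
```

Unpacking raises a ValueError if the page has more or fewer features than variables, so keeping the lists and indexing into them is usually the safer choice.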

# Alternative: look up each value by the text of its label.
# `html` is assumed to hold the complete markup from the question.
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
product_feature_classes = soup.find("div", class_="productFeatureClasses productDetailsTab")
# Find the label div by its exact text, go up to the enclosing "feature" div,
# then read the sibling "featureValue" div.
net_contents = product_feature_classes.find("div", string="Net Contents / UOM").parent.find(class_="featureValue").text
country_of_origin_statement = product_feature_classes.find("div", string="Country of origin Statement").parent.find(class_="featureValue").text
# ...................
packaging_material = product_feature_classes.find("div", string="Packaging Material").parent.find(class_="featureValue").text
gm_ingredients_used = product_feature_classes.find("div", string="GM Ingredients Used").parent.find(class_="featureValue").text
print(f"Net Contents / UOM: {net_contents}")
print(f"Country of origin Statement: {country_of_origin_statement}")
print(f"Packaging Material: {packaging_material}")
print(f"GM Ingredients Used: {gm_ingredients_used}")

Output:

Net Contents / UOM: 100.0 EA
Country of origin Statement: Manufactured in New Zealand
Packaging Material: Plastic
GM Ingredients Used: Unspecified
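
Looking up each label one at a time gets repetitive as the number of features grows. One possible alternative, sketched below, walks every feature div once and builds a dictionary that maps each name to its value. The helper name `extract_features` and the two-feature sample markup are hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup


def extract_features(html):
    """Hypothetical helper: map each featureName text to its featureValue text."""
    soup = BeautifulSoup(html, "html.parser")
    features = {}
    for feature in soup.find_all("div", class_="feature"):
        name = feature.find("div", class_="featureName")
        value = feature.find("div", class_="featureValue")
        # Skip malformed entries where either half of the pair is missing.
        if name is None or value is None:
            continue
        features[name.get_text(strip=True)] = value.get_text(strip=True)
    return features


# Small sample in the same shape as the markup from the question.
sample = """
<div class="productFeatureClasses productDetailsTab">
  <div class="feature">
    <div class="featureName">Packaging Material</div>
    <div class="featureValue">Plastic</div>
  </div>
  <div class="feature">
    <div class="featureName">GM Ingredients Used</div>
    <div class="featureValue">Unspecified</div>
  </div>
</div>
"""

features = extract_features(sample)
print(features["Packaging Material"])
```

A dictionary keeps each value attached to its label, so a missing or reordered feature on the page does not shift the other values into the wrong variables.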
