Finding the product name in HTML text



I am trying to scrape the website www.gall.nl in order to build a database of all the wines sold on that platform. I have the following code:

import requests
from bs4 import BeautifulSoup
URL = 'https://www.gall.nl/wijn/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
mydivs = soup.find_all("div", {"class": "c-product-tile"})    
print(len(mydivs))
first_wijn = mydivs[0]
print(first_wijn)
result = first_wijn.find()

This gives 12 results, which is correct.

Printing the first result gives the following:

<div class="c-product-tile" data-product='{"name":"Faustino V Rioja Reserva","id":"143561","currencyCode":"EUR","price":13.99,"discount":0,"brand":"Faustino","category":"Wijn","variant":"75CL","list":"productoverzicht","position":1,"dimension13":"2","dimension37":"Ja"}' itemprop="item" itemscope="" itemtype="https://schema.org/Product" js-hook-product-tile="">
<meta content="143561" itemprop="sku">
<meta content="8410441412065" itemprop="gtin8">
<meta content="Faustino" itemprop="brand">
<div class="product-tile__header">
<div class="product-tile__category-label">
<div class="m-product-taste-tooltip">
<span aria-label="Classic Red" class="a-tooltip-trigger" data-content="Stevig &amp; Ferm" data-placement="bottom-start" js-hook-tooltip="">
<div class="tooltip-trigger__icon product-taste-tooltip__icon u-taste-profile-icon classic-red-red 
....
<input class="add-to-cart-url" type="hidden" value="/on/demandware.store/Sites-gall-nl-Site/nl_NL/Cart-AddProduct"/>
</div>
</meta></meta></meta></div>

I am interested in getting the data from the first line:

<div class="c-product-tile" data-product='{"name":"Faustino V Rioja Reserva","id":"143561","currencyCode":"EUR","price":13.99,"discount":0,"brand":"Faustino","category":"Wijn","variant":"75CL","list":"productoverzicht","position":1,"dimension13":"2","dimension37":"Ja"}' itemprop="item" itemscope="" itemtype="https://schema.org/Product" js-hook-product-tile="">

Specifically, I want the name, price, and brand.

Can someone help me retrieve this data?

Use BeautifulSoup's .attrs.get to read the data-product attribute from each div.
Then parse that string as JSON to read the values you need.

import json
import requests
from bs4 import BeautifulSoup

URL = 'https://www.gall.nl/wijn/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# Get all products
mydivs = soup.find_all("div", {"class": "c-product-tile"})

# Loop through each product
for div in mydivs:
    # Get the data-product attribute (None if it is missing)
    product = div.attrs.get("data-product", None)
    # Parse the string as JSON; the encode/decode step drops non-ASCII characters
    jsonProduct = json.loads(product.encode('utf-8').decode('ascii', 'ignore'))
    # Show name - brand - price
    print('{0:<40} {1:<20} {2:>10}'.format(
        jsonProduct['name'],
        jsonProduct['brand'],
        jsonProduct['price']
    ))

Using format() to lay out three columns, the code above produces the following output:

Faustino V Rioja Reserva                 Faustino                  13.99
Mucho Ms Tinto                           Mucho Mas                  5.99
Cantina di Verona Valpolicella Ripasso   Terre Di Verona           11.99
Villa Jeantel                            Villa Jeantel              8.99
Ondarre Rioja Reserva                    Ondarre                   13.59
Valdivieso Chardonnay                    Valdivieso                 5.99
Domaine Lamourie Ros                     Domaine Lamourie           7.99
Oveja Negra Chardonnay Viognier          Oveja Negra                6.59
La Palma Merlot                          La Palma                   6.59
Alamos Chardonnay                        Alamos                     8.99
Les Hautes Pentes ros                    Les Hautes Pentes          7.99
Piccini Memoro Rosso                     Piccini                    7.29
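
A few names in that output are missing letters (for example "Mucho Ms Tinto"), which is consistent with the decode('ascii', 'ignore') step discarding accented characters. Below is a minimal offline sketch of the same approach that passes the attribute string straight to json.loads and skips tiles that have no data-product attribute. The helper name extract_products is made up for illustration, the second and third tiles are hypothetical, and the spelling "Mucho Más Tinto" is an assumption about the original name rather than something confirmed by the site.

```python
import json
from bs4 import BeautifulSoup

# Inline markup so the sketch runs without network access.
# The first tile is trimmed from the question; the other two are hypothetical.
html = '''
<div class="c-product-tile" data-product='{"name":"Faustino V Rioja Reserva","brand":"Faustino","price":13.99}'></div>
<div class="c-product-tile" data-product='{"name":"Mucho Más Tinto","brand":"Mucho Mas","price":5.99}'></div>
<div class="c-product-tile"></div>
'''


def extract_products(markup):
    """Return (name, brand, price) tuples for every tile that carries data-product."""
    soup = BeautifulSoup(markup, 'html.parser')
    rows = []
    for div in soup.find_all('div', {'class': 'c-product-tile'}):
        raw = div.attrs.get('data-product')
        if raw is None:
            # No attribute on this tile, so there is nothing to parse
            continue
        # json.loads accepts a str directly, so accented characters are kept
        data = json.loads(raw)
        rows.append((data['name'], data['brand'], data['price']))
    return rows


products = extract_products(html)
for name, brand, price in products:
    print('{0:<40} {1:<20} {2:>10}'.format(name, brand, price))
```

To run this against the live page, pass page.content from the requests call instead of the inline string; whether every tile on the site carries the attribute is not something this sketch checks beyond skipping the ones that do not.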
