I'm trying to extract some product and brand data from a website using Beautiful Soup and Python.
I've managed to extract the HTML; here is a sample:
</a></div></div><div class="Product" data-sku="120348"><a class="Product-link thumb "
data-dl-tracked='{"event":"productImpression","ecommerce":{"currencyCode":"GBP","impressions":
[{"name":"Kiehls Powerful Strength Bundle","price":"74.40","brand":"Kiehls","list":"Product Listing -
","position":-1,"id":"120348"}]}}' data-feelunique-datalayer-push='{"click":
{"event":"productClick","ecommerce":{"currencyCode":"GBP","click":{"actionField":
{"list":"Product Listing -"},"products":[{"name":"Kiehls Powerful Strength
Bundle","id":"120348","price":"74.40","brand":"Kiehls","position":-1}]}}}
I'm using the following to build a list of dictionaries:
for i in soup.find_all("a", {"class": "Product-link thumb "}):
    product.append(i.get("data-dl-tracked"))
But I'd like to be able to pull just the name and brand out of each dictionary. Any ideas?
Here is a link to the website in case it helps: https://www.feelunique.com/skin
You can use the json module to parse the data stored in the HTML attributes. For example:
import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.feelunique.com/skin'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0',
    'Accept-Language': 'en-US,en;q=0.5'
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

for a in soup.select('a[data-feelunique-datalayer-push]'):
    data = json.loads(a['data-feelunique-datalayer-push'])
    # uncomment this to see all data:
    # print(json.dumps(data, indent=4))
    product = data['click']['ecommerce']['click']['products'][0]
    print('{:<30} {}'.format(product['brand'], product['name']))
Prints:
Powered by Feelunique Megan Rose Lane Self-Care Kit
Wishful Wishful Honey Balm Moisturiser 55g
Kiehls Kiehls Ultra Facial Cream 50ml
Kiehls Kiehls Ultra Facial Cream 125ml
Charlotte Tilbury Charlotte Tilbury Charlottes Magic Cream Moisturiser 50ml
CeraVe CeraVe Foaming Facial Cleanser 473ml
Wishful Wishful Yo Glow Enzyme Scrub 100ml
Kiehls Kiehls Powerful Strength Bundle
Kiehls Kiehls Midnight Recovery Concentrate Facial Oil 30ml
Estée Lauder Estée Lauder Advanced Night Repair Synchronized Multi-Recovery Complex 30ml
Kiehls Kiehls Rare Earth Deep Pore Cleansing Mask 125ml
Charlotte Tilbury Charlotte Tilbury Morning Magic Skin Kit
Embryolisse Embryolisse Lait-Crème Concentré Moisturiser 75ml
Kiehls Kiehls Ultra Light Daily UV Defense SPF50 60ml
Kiehls Kiehls Creamy Eye Treatment with Avocado 28ml
Clinique Clinique Moisture Surge™ 72-Hour Auto-Replenishing Hydrator 50ml
The Inkey List The INKEY List Tranexamic Acid Night Treatment 30ml
Kiehls Kiehls Super Multi-Corrective Cream SPF 30 50ml
Kiehls Kiehls Creamy Eye Treatment with Avocado 14ml
Dermalogica Dermalogica Daily Microfoliant 75g
Clinique Clinique Take The Day Off Cleansing Balm 125ml
Estée Lauder Estée Lauder Advanced Night Repair Eye Supercharged Complex 15ml
Kiehls Kiehls Powerful-Strength Line-Reducing Concentrate 50ml
Grown Alchemist Grown Alchemist Age-Repair Moisturiser Phyto-Peptide & White Tea Extract 60ml
Charlotte Tilbury Charlotte Tilbury Charlottes Magic Lip Oil Crystal Elixir 8ml
The Inkey List The INKEY List Retinol Eye Cream 15ml
Lumene Lumene Nordic-C Glow Boost Essence Serum 30ml
Kiehls Kiehls Calendula Deep Cleansing Foaming Face Wash 230ml
Kiehls Kiehls Midnight Recovery Concentrate 50ml
Perricone MD Perricone MD Vitamin C Ester Citrus Brightening Cleanser 59ml
Charlotte Tilbury Charlotte Tilbury Charlotte’s Cleanse, Hydrate & Glow Mini Facial Kit
Saturday Skin Saturday Skin No Bad Days Set
First Aid Beauty First Aid Beauty Ultra Repair Hydrating Serum 50ml
Emma Hardie Emma Hardie Moringa Cleansing Balm 200g - 10th Anniversary Edition
DHC DHC Deep Cleansing Oil 200ml
Dermalogica Dermalogica Special Cleansing Gel 500ml
COSRX COSRX Acne Pimple Master Patch 24 Patches
Kiehls Kiehls Ultra Facial Cream SPF30 125ml
Charlotte Tilbury Charlotte Tilbury Charlottes Magic Serum Crystal Elixir 30ml
Kiehls Kiehls Ultra Facial Favourites Gift Set
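The same idea should also work for the data-dl-tracked attribute the question was already collecting. Below is a minimal offline sketch, assuming the attribute always holds JSON shaped like the sample in the question (an "ecommerce" object containing an "impressions" list). The helper name extract_name_brand and the constant SAMPLE_ATTR are hypothetical, introduced only for illustration, and the sample string is the question's snippet with the line wrap inside the "list" value removed.

```python
import json

# Attribute value copied from the HTML sample in the question
# (the wrapped "list" value is collapsed onto one line here).
SAMPLE_ATTR = (
    '{"event":"productImpression","ecommerce":{"currencyCode":"GBP",'
    '"impressions":[{"name":"Kiehls Powerful Strength Bundle",'
    '"price":"74.40","brand":"Kiehls","list":"Product Listing -",'
    '"position":-1,"id":"120348"}]}}'
)


def extract_name_brand(raw_attr):
    """Return a list of {'name', 'brand'} dicts from one data-dl-tracked value.

    Hypothetical helper: returns an empty list when the attribute is
    missing, is not valid JSON, or does not have the assumed shape.
    """
    if not raw_attr:
        return []
    try:
        data = json.loads(raw_attr)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    # Assumed layout: {"ecommerce": {"impressions": [{...}, ...]}}
    impressions = data.get("ecommerce", {}).get("impressions", [])
    return [
        {"name": item.get("name"), "brand": item.get("brand")}
        for item in impressions
    ]


print(extract_name_brand(SAMPLE_ATTR))
# [{'name': 'Kiehls Powerful Strength Bundle', 'brand': 'Kiehls'}]
```

With the soup object from the answer above, this could be applied to a.get('data-dl-tracked') for each a in soup.select('a[data-dl-tracked]'). Tags without the attribute return None from .get, which the helper treats as "no data".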