How do I select a div inside a div class with Beautiful Soup?



I'm really new to this, and I'm confused about how to select the information held in a specific HTML div class. Below is my code:

from bs4 import BeautifulSoup
import bs4
import requests
import json
import numpy as np

urls = ['https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20/?selected_merv=8',
'https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20/?selected_merv=11',
'https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20/?selected_merv=13']
#scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    #returned values
    product = soup.find("h1", class_="text-center")
    merv = url.rsplit('?', 0)
    price_table= soup.find("??????")
    json_schema = soup.find_all('script', attrs={'type': 'application/ld+json'})[1]
    json_file = json.loads(json_schema.get_text())

    for product, merv, in zip(product, merv):
        print(product.getText(), merv, price_table, json_file)
    np.savetxt('products.csv', [p for p in zip(product, json_schema)], delimiter=',', fmt='%s')
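
A side note on the code above: `url.rsplit('?', 0)` passes a `maxsplit` of 0, so no split happens and the result is a one-element list holding the whole URL. Below is a minimal sketch of one way to read the MERV rating from the query string with the standard library. The helper name `merv_from_url` is hypothetical and not part of the original code.

```python
from urllib.parse import urlparse, parse_qs

def merv_from_url(url):
    """Return the value of the selected_merv query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("selected_merv")
    return values[0] if values else None

url = ("https://filterbuy.com/brand/trion-air-bear-air-filters/"
       "20x20x5-air-bear-20x20/?selected_merv=8")
print(merv_from_url(url))
```

Parsing the query string by parameter name avoids depending on where `?` or `=` happens to appear in the URL.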

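The problem is that this table is not in the page source; it is built with JS from static data. The data for all three tables is present directly on the page. You can read the price values from the attributes of the forms with class cart: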

url = 'https://filterbuy.com/brand/trion-air-bear-air-filters/20x20x5-air-bear-20x20/?selected_merv=8'
response = requests.get(url)
# Parse once and reuse the tree (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(response.text, 'lxml')
mervs = soup.find_all('strong')
for i, cart in enumerate(soup.find_all('form', class_='cart')):
    # Print every attribute of the form whose name contains 'data-price'
    for tax in cart.attrs:
        if 'data-price' in tax:
            print(mervs[i].get_text(), cart[tax])

Output:

MERV 8 47.58
MERV 8 29.73
MERV 8 28.11
MERV 8 27.56
MERV 8 27.03
MERV 11 57.09
MERV 11 29.13
MERV 11 30.92
MERV 11 30.32
MERV 11 30.02
MERV 11 29.73
MERV 13 62.99
MERV 13 34.11
MERV 13 32.54
MERV 13 32.01
MERV 13 31.49
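
The same idea can be tried offline and the result written as CSV, which is what the question was aiming for. The following is a minimal sketch, assuming markup similar to what the loop above relies on: `form` elements with class `cart` whose attribute names contain `data-price`, each preceded by a `strong` label. The sample HTML, the attribute names `data-price-1` and `data-price-2`, and the helper `extract_prices` are made up for illustration and are not taken from the site. It uses the built-in `html.parser` so that lxml is not required.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<strong>MERV 8</strong>
<form class="cart" data-price-1="47.58" data-price-2="29.73" data-id="a"></form>
<strong>MERV 11</strong>
<form class="cart" data-price-1="57.09" data-id="b"></form>
"""

def extract_prices(html):
    """Return (label, price) pairs, one per data-price* attribute of each form.cart."""
    soup = BeautifulSoup(html, "html.parser")
    labels = soup.find_all("strong")
    rows = []
    for i, cart in enumerate(soup.find_all("form", class_="cart")):
        for name, value in cart.attrs.items():
            if "data-price" in name:
                rows.append((labels[i].get_text(), value))
    return rows

rows = extract_prices(SAMPLE_HTML)

# Write the rows as CSV text; replace io.StringIO() with
# open("products.csv", "w", newline="") to save a file instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Pairing labels and forms by index only works while the page lists one `strong` label per cart form in the same order, so it is worth checking that on the real page before relying on it.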
