How do I iterate through the dropdown menu on this page to scrape the specs and price of each product?



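Hi, I'm fairly new to Python and web scraping. I'm trying to scrape the data for each product option in the dropdown menu on this page (https://www.jmesales.com/kuriyama-3-4-in-brass-quick-couplings/). I believe the page doesn't use JavaScript, and I'd rather use only requests and BeautifulSoup instead of a webdriver. I have code that gets the name and attribute value of each option, but I'm not sure how to access the pricing and specification data associated with each option. Here is my code: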

from bs4 import BeautifulSoup
import requests

url = 'https://www.jmesales.com/kuriyama-3-4-in-brass-quick-couplings/'
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, 'lxml')
# Each <option> in the dropdown carries the attribute value of one product variant
options = [item['value'] for item in soup.select('#attribute_select_42800 option')]
for option in options:
    print(option)

I'd like to get the price and related data for each option. Any help would be appreciated!

Try something like this:

from bs4 import BeautifulSoup
import requests

url = 'https://www.jmesales.com/kuriyama-3-4-in-brass-quick-couplings/'
s = requests.Session()
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
# Set the User-Agent on the session so the later POST requests send it too
s.headers.update(headers)
res = s.get(url)
soup = BeautifulSoup(res.text, 'lxml')
# [option value, option label] for every entry in the dropdown
options = [[item['value'], item.get_text(strip=True)] for item in soup.select('#attribute_select_42800 option')]

product_id = soup.select_one('input[name^="product_id"]').get('value')
for item_num, item_name in options[1:]:  # skip the first <option>, the dropdown's placeholder
    # Ask the store's AJAX endpoint for the price of this variant
    data = {'action': 'add', 'attribute[42800]': item_num, 'product_id': product_id, 'qty[]': '1'}
    product = s.post('https://www.jmesales.com/remote/v1/product-attributes/53564', data=data).json()
    price = product['data']['price']['without_tax']['formatted']
    print(f'Item name: {item_name} Item price: {price}')

Prints:

Item name: Part A Female NPT x Male Adapter Item price: $6.30
Item name: Part B Female Coupler x Male NPT Item price: $13.80
Item name: Part C Female Coupler x Hose Shank Item price: $11.50
Item name: Part D Female Coupler x Female NPT Item price: $12.80
Item name: Part E Male Adapter x Hose Shank Item price: $8.50
Item name: Part F Male NPT x Male Adapter Item price: $7.30
Item name: Dust Cap Item price: $11.00
Item name: Dust Plug Item price: $8.10
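If you want the request and response handling as reusable pieces, here is a minimal sketch that splits the loop body above into small helpers. The function names are hypothetical, and the JSON path is only the one the answer already reads (`data` → `price` → `without_tax` → `formatted`). For the spec data you asked about, print one full response to check what else the endpoint returns, since that isn't shown here. The sample response and the option value `"123"` below are made up for an offline check, not real server output.

```python
from decimal import Decimal


def build_attribute_payload(attribute_id, option_value, product_id, qty=1):
    """Form fields in the same shape the answer posts for one dropdown option."""
    return {
        "action": "add",
        f"attribute[{attribute_id}]": option_value,
        "product_id": product_id,
        "qty[]": str(qty),
    }


def extract_formatted_price(response_json):
    """Return data -> price -> without_tax -> formatted, or None if any level is missing."""
    node = response_json
    for key in ("data", "price", "without_tax", "formatted"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def parse_price(formatted):
    """Turn a display string such as '$6.30' into Decimal('6.30')."""
    return Decimal(formatted.replace("$", "").replace(",", "").strip())


# Hypothetical response with the same nesting the answer reads (not real server output)
sample = {"data": {"price": {"without_tax": {"formatted": "$6.30"}}}}
print(build_attribute_payload("42800", "123", "53564"))
print(extract_formatted_price(sample))               # $6.30
print(parse_price(extract_formatted_price(sample)))  # 6.30
print(extract_formatted_price({"data": {}}))         # None
```

Inside the answer's loop you would post `build_attribute_payload(...)` as `data` and pass the `.json()` result to `extract_formatted_price`. Returning `None` instead of raising `KeyError` lets the loop keep going and makes an option without a price easy to spot; `Decimal` avoids float rounding if you later sort or total the prices.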

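The code above only works for the specific URL in your question. This version reads the attribute ID and product ID from the page instead of hardcoding them, so it can handle other product URLs as well: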

import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.jmesales.com/dixon-brass-female-ght-x-female-npt-adapter-lead-free/'
s = requests.Session()
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
s.headers.update(headers)
res = s.get(url)
soup = BeautifulSoup(res.text, 'lxml')
# The dropdown's name looks like "attribute[42800]"; pull out the numeric attribute ID
select_name = soup.select_one('.form-select.form-select--small').get('name')
attrid = re.findall(r'\[(\d+)\]', select_name)[0]
options = [[item['value'], item.get_text(strip=True)] for item in soup.select(f'#attribute_select_{attrid} option')]

product_id = soup.select_one('input[name^="product_id"]').get('value')
for item_num, item_name in options[1:]:  # skip the placeholder option
    data = {'action': 'add', f'attribute[{attrid}]': item_num, 'product_id': product_id, 'qty[]': '1'}
    product = s.post(f'https://www.jmesales.com/remote/v1/product-attributes/{product_id}', data=data).json()
    price = product['data']['price']['without_tax']['formatted']
    print(f'Item name: {item_name} Item price: {price}')
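The only new parsing step in this version is pulling the numeric ID out of the dropdown's `name`. Here is a minimal sketch of that step on its own, assuming the name has the form `attribute[42800]` (inferred from the form field the answer posts); the function name is hypothetical.

```python
import re

# Matches the whole name, e.g. "attribute[42800]", and captures the digits
ATTRIBUTE_NAME = re.compile(r"^attribute\[(\d+)\]$")


def attribute_id_from_name(name):
    """Extract '42800' from a select name such as 'attribute[42800]'; None if it doesn't fit."""
    match = ATTRIBUTE_NAME.match(name)
    return match.group(1) if match else None


print(attribute_id_from_name("attribute[42800]"))  # 42800
print(attribute_id_from_name("qty[]"))             # None
```

Anchoring the pattern with `^` and `$` makes an unexpected `name` return `None` instead of silently grabbing the first bracketed number, so a page with a different layout fails visibly rather than posting a wrong attribute.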

Hope this helps:

from pyautogui import press

amount_of_options = 4  # Number of options in the menu
press('enter')  # Open the dropdown menu (assumes it already has keyboard focus)
for _ in range(amount_of_options):
    press('down')  # Each Down-arrow press moves to the next option in the open menu

This isn't the answer you were looking for, but for web scraping I'd recommend Selenium.

https://selenium-python.readthedocs.io/

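It's simple: it opens a real browser, and you can do anything a user could do. What I would do is look up the XPath of the elements and find a pattern to iterate over.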
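For comparison, here is a minimal Selenium sketch of that approach: select each option in the dropdown and read the price the page then displays. The select ID comes from the question; the price selector `.price--withoutTax`, the fixed one-second wait, skipping the first option as a placeholder, and the use of Chrome are assumptions to check against the page. The imports sit inside the function so the file can be loaded even where Selenium isn't installed.

```python
import time


def scrape_prices_with_selenium(url, select_id="attribute_select_42800",
                                price_css=".price--withoutTax", delay=1.0):
    """Return (option label, displayed price) pairs; price_css is an assumed selector."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    select_xpath = f"//select[@id='{select_id}']"
    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    results = []
    try:
        driver.get(url)
        count = len(Select(driver.find_element(By.XPATH, select_xpath)).options)
        # Index 0 is assumed to be the placeholder option, as in the requests answer
        for index in range(1, count):
            # Re-locate the <select> each time in case the page re-renders it
            dropdown = Select(driver.find_element(By.XPATH, select_xpath))
            dropdown.select_by_index(index)
            name = dropdown.first_selected_option.text.strip()
            time.sleep(delay)  # crude wait for the price to update after the change
            price = driver.find_element(By.CSS_SELECTOR, price_css).text.strip()
            results.append((name, price))
    finally:
        driver.quit()
    return results
```

Call it with the product URL from the question. A `WebDriverWait` on the price element would be more robust than the fixed sleep, but this is considerably slower than the requests-based answer, which asks the same endpoint directly.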
