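I'm trying to run a Beautiful Soup demo that scrapes prices from eBay. The prices are all in USD, but for some reason, when I scrape them, they are automatically converted to New Taiwan dollars. I'm not sure what's going on. I tried the UK site, and it printed the correct currency. I also tried different links to the same site that carry the US eBay ID, but it made no difference.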
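import requests
from bs4 import BeautifulSoup as bs
# fetch the search results page and collect every price element
page = requests.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=dodge+viper&_sacat=0&_sop=20')
soup = bs(page.content, 'html.parser')
prices = soup.find_all('span', class_='s-item__price')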
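I figured it out. It has to do with Google Colab and the way it fetches information from eBay. I ran the same code in a Jupyter Notebook on my local machine and it worked fine.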
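One way to confirm this kind of behaviour is to check which currency the scraped price strings actually carry before trusting them. The sketch below is only an illustration: `detect_currency` and the `CURRENCY_MARKERS` table are hypothetical names, the `NT$` and `£` markers are assumptions about how eBay renders those currencies, and treating a bare `$` as USD is a simplification.

```python
# Hypothetical marker table (an assumption, not taken from eBay documentation).
# Longer markers come first so "NT$" and "HK$" are not mistaken for plain "$".
CURRENCY_MARKERS = [
    ("NT$", "TWD"),
    ("HK$", "HKD"),
    ("EUR", "EUR"),
    ("£", "GBP"),
    ("$", "USD"),  # simplification: a bare "$" is treated as USD
]


def detect_currency(price_text):
    """Return a currency code guessed from a scraped price string, or None."""
    text = price_text.strip()
    for marker, code in CURRENCY_MARKERS:
        # Markers can appear before the amount ("HK$ 3,139.79")
        # or after it ("0,93 EUR").
        if text.startswith(marker) or text.endswith(marker):
            return code
    return None


def unexpected_currencies(price_texts, expected="USD"):
    """Return the price strings whose guessed currency differs from `expected`."""
    return [p for p in price_texts if detect_currency(p) != expected]
```

Running `unexpected_currencies` over the scraped `.text` values in both Colab and a local notebook would show quickly whether the two environments receive different currencies.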
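You can only change the currency of the prices by switching to a different eBay domain. You can also fetch prices from several domains at once: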
# united states, hong kong, spain
domains = ["ebay.com", "ebay.com.hk", "ebay.es"]

for domain in domains:
    page = requests.get(f"https://www.{domain}/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')
Check the full code in the online IDE.
from bs4 import BeautifulSoup
import requests, lxml
import json

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

params = {
    '_nkw': 'dodge viper',   # search query
}

domains = ["ebay.com", "ebay.com.hk", "ebay.es"]

data_price = []

for domain in domains:
    page = requests.get(f"https://www.{domain}/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')

    # one container per listing; collect its price together with the domain it came from
    for products in soup.select(".s-item__pl-on-bottom"):
        data_price.append({"price": products.select_one(".s-item__price").text, "domain": domain})

print(json.dumps(data_price, indent=2, ensure_ascii=False))
Example output:
[
  {
    "price": "$109,989.00",
    "domain": "ebay.com"
  },
  {
    "price": "HK$ 3,139.79",
    "domain": "ebay.com.hk"
  },
  {
    "price": "0,93 EUR",
    "domain": "ebay.es"
  },
  other results ...
]
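The price strings above use different separators depending on the domain (`109,989.00` versus `0,93`), so comparing them numerically needs a normalisation step. Here is a minimal sketch of one way to do that. `extract_amount` is a hypothetical helper, and the rule it applies (a lone separator followed by exactly two digits is read as the decimal mark) is an assumption that fits these samples but not every locale.

```python
import re


def extract_amount(price_text):
    """Turn a scraped price such as "HK$ 3,139.79" or "0,93 EUR" into a float.

    Returns None when the string contains no digits. For a range such as
    "$10.00 to $20.00", only the first amount is parsed.
    """
    match = re.search(r"\d[\d.,]*", price_text)
    if match is None:
        return None

    number = match.group(0).rstrip(".,")
    last_comma = number.rfind(",")
    last_dot = number.rfind(".")
    decimal_pos = max(last_comma, last_dot)

    # No separator at all: a plain integer.
    if decimal_pos == -1:
        return float(number)

    digits_after = len(number) - decimal_pos - 1
    has_both = last_comma != -1 and last_dot != -1

    # Assumption: when both separators occur, the last one is the decimal mark;
    # when only one occurs, it is the decimal mark only if two digits follow it.
    if has_both or digits_after == 2:
        integer_part = re.sub(r"[.,]", "", number[:decimal_pos])
        return float(f"{integer_part}.{number[decimal_pos + 1:]}")

    # Otherwise the separator is treated as a thousands separator.
    return float(re.sub(r"[.,]", "", number))
```

For example, `extract_amount(item["price"])` could be stored next to each raw string in `data_price`, which keeps the original text available in case the heuristic guesses wrong.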
As an alternative, you can use the Ebay Organic Results API from SerpApi. It's a paid API with a free plan that handles blocks and parsing on its backend.
Example code:
from serpapi import EbaySearch
import json

# https://serpapi.com/ebay-domains
domains = ["ebay.com", "ebay.es", "ebay.com.hk"]

data = []   # collects results from every domain

for domain in domains:
    params = {
        "api_key": "...",         # serpapi key, https://serpapi.com/manage-api-key
        "engine": "ebay",         # search engine
        "ebay_domain": domain,    # ebay domain
        "_nkw": "dodge viper",    # search query
    }

    search = EbaySearch(params)   # where data extraction happens
    results = search.get_dict()   # JSON -> Python dict

    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
            "title": title,
            "price": price,
            "domain": domain
        })

print(json.dumps(data, indent=2, ensure_ascii=False))
Output:
[
  {
    "title": "Dodge Viper Valve Cover Gen 4 Driver side Gen V",
    "price": {
      "raw": "HK$ 2,315.60",
      "extracted": 2315.6
    },
    "domain": "ebay.com.hk"
  },
  {
    "title": "2M Borde de puerta de automóvil viaje al clima Sellado Pilar B Tira de protección contra el ruido a prueba de viento (Compatible con: Dodge Viper)",
    "price": {
      "raw": "26,02 EUR",
      "extracted": 26.02
    },
    "domain": "ebay.es"
  },
  other results ...
]
If you'd like to learn more about website scraping, there is a blog post titled "13 ways to scrape any public data from any website".