Beautiful Soup automatically converts USD to TWD



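I'm trying to run a Beautiful Soup demo that scrapes prices from eBay. The prices are all in USD, but for some reason, when I scrape them, they are automatically converted to TWD (New Taiwan dollars). I'm not sure what's going on. I tried the UK site and it printed the correct currency. I also tried different links to the same site that carry a US eBay ID, but it made no difference.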

import requests
from bs4 import BeautifulSoup as bs

page = requests.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=dodge+viper&_sacat=0&_sop=20')
soup = bs(page.content, 'html.parser')  # name the parser explicitly
prices = soup.find_all('span', class_='s-item__price')


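I figured it out. It has to do with Google Colab and the way it fetches information from eBay. I ran the same code in a Jupyter Notebook on my local machine and it worked fine.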

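BeautifulSoup has nothing to do with the price conversion: when you use CSS selectors to pull out parts of the HTML, it only extracts the price exactly as it appears in the HTML it was given.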
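One way to confirm that the page already arrives in another currency is to look at the currency markers in the scraped strings. Colab runs on remote servers, so eBay presumably localizes the page for wherever the request comes from. The sketch below is only a rough heuristic: the `CURRENCY_MARKERS` list and both helper names are hypothetical, and the `NT$` prefix for TWD is an assumption about how eBay renders that currency.

```python
from collections import Counter

# Hypothetical, non-exhaustive marker list; extend it for other eBay sites.
# Order matters: prefixed dollar variants must be checked before a bare "$".
CURRENCY_MARKERS = [
    ("NT$", "TWD"),  # assumed prefix for New Taiwan dollars
    ("HK$", "HKD"),
    ("EUR", "EUR"),
    ("£", "GBP"),
    ("$", "USD"),    # a bare "$" is treated as USD, which is only a guess
]


def guess_currency(price_text):
    """Return a currency code guessed from a scraped price string, or None."""
    for marker, code in CURRENCY_MARKERS:
        if marker in price_text:
            return code
    return None


def summarize_currencies(price_texts):
    """Count how many price strings fall under each guessed currency."""
    return Counter(guess_currency(text) for text in price_texts)


# Stand-in strings; in practice pass [tag.text for tag in prices].
sample = ["$109,989.00", "NT$ 3,500.00", "HK$ 3,139.79", "0,93 EUR"]
print(summarize_currencies(sample))
```

If the same request yields mostly `TWD` on Colab and `USD` locally, the difference comes from the response eBay sends, not from the parsing step.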

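You can only change the currency of the prices by switching to a different eBay domain. You can also fetch prices from several domains at once: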

# united states, hong kong, spain
domains = ["ebay.com", "ebay.com.hk", "ebay.es"]
for domain in domains:
    page = requests.get(f"https://www.{domain}/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')

See the full code in the online IDE.

from bs4 import BeautifulSoup
import requests, lxml
import json

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

params = {
    '_nkw': 'dodge viper',       # search query
}

domains = ["ebay.com", "ebay.com.hk", "ebay.es"]
data_price = []

for domain in domains:
    page = requests.get(f"https://www.{domain}/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')

    for products in soup.select(".s-item__pl-on-bottom"):
        data_price.append({"price": products.select_one(".s-item__price").text, "domain": domain})

print(json.dumps(data_price, indent=2, ensure_ascii=False))

Example output:

[
  {
    "price": "$109,989.00",
    "domain": "ebay.com"
  },
  {
    "price": "HK$ 3,139.79",
    "domain": "ebay.com.hk"
  },
  {
    "price": "0,93 EUR",
    "domain": "ebay.es"
  },
  other results ...
]
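The price strings above mix number formats (`109,989.00` versus `0,93`), so comparing them across domains needs a normalization step. The following is a minimal sketch, not part of the original answer: `parse_price` is a hypothetical helper, and it assumes a single price per string (not a range) with either no decimals or exactly two decimal digits.

```python
import re


def parse_price(raw):
    """Split a scraped price string into (currency_label, amount).

    Returns None when the string contains no digits.
    """
    match = re.search(r"\d[\d.,]*", raw)
    if match is None:
        return None

    number = match.group().rstrip(".,")
    # Whatever surrounds the number is kept as the currency label.
    label = (raw[:match.start()] + raw[match.end():]).strip()

    # Heuristic: a separator followed by exactly two final digits
    # is the decimal mark; every other separator groups thousands.
    decimal = re.search(r"[.,](\d{2})$", number)
    if decimal:
        whole = re.sub(r"[.,]", "", number[:decimal.start()])
        amount = float(f"{whole}.{decimal.group(1)}")
    else:
        amount = float(re.sub(r"[.,]", "", number))

    return label, amount


for text in ["$109,989.00", "HK$ 3,139.79", "0,93 EUR"]:
    print(text, "->", parse_price(text))
```

The amounts are still in different currencies after parsing; this only makes them numeric, it does not convert between them.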

As an alternative, you can use the Ebay Organic Results API from SerpApi. It's a paid API with a free plan that handles blocks and parsing on its backend.

Example code:

from serpapi import EbaySearch
import json

# https://serpapi.com/ebay-domains
domains = ["ebay.com", "ebay.es", "ebay.com.hk"]
data = []                                 # created once so results from every domain are kept

for domain in domains:
    params = {
        "api_key": "...",                 # serpapi key, https://serpapi.com/manage-api-key
        "engine": "ebay",                 # search engine
        "ebay_domain": domain,            # ebay domain
        "_nkw": "dodge viper",            # search query
    }

    search = EbaySearch(params)           # where data extraction happens
    results = search.get_dict()           # JSON -> Python dict

    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
            "title": title,
            "price": price,
            "domain": domain
        })

print(json.dumps(data, indent=2, ensure_ascii=False))

Output:

[
  {
    "title": "Dodge Viper Valve Cover Gen 4 Driver side Gen V",
    "price": {
      "raw": "HK$ 2,315.60",
      "extracted": 2315.6
    },
    "domain": "ebay.com.hk"
  },
  {
    "title": "2M Borde de puerta de automóvil viaje al clima Sellado Pilar B Tira de protección contra el ruido a prueba de viento (Compatible con: Dodge Viper)",
    "price": {
      "raw": "26,02 EUR",
      "extracted": 26.02
    },
    "domain": "ebay.es"
  },
  other results ...
]

If you'd like to learn more about web scraping, there is a blog post, 13 ways to scrape any public data from any website.
