Wrong result when web scraping with Python

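I want to compare the price of a coconut on two websites. The two stores (websites) are called Laughs and Glomark.

I have two files, main.py and comparison.py. I think the problem is in the part that scrapes the Laughs price. The code runs without raising an error. My actual output and the expected output are shown after the code below.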

main.py

from compare_prices import compare_prices 
laughs_coconut = 'https://scrape-sm1.github.io/site1/COCONUT%20market1super.html'
glomark_coconut = 'https://glomark.lk/coconut/p/11624'
compare_prices(laughs_coconut,glomark_coconut)

comparison.py

import requests
import json
from bs4 import BeautifulSoup

# Imitate the Mozilla browser.
user_agent = {'User-agent': 'Mozilla/5.0'}


def compare_prices(laughs_coconut, glomark_coconut):
    # Acquire the web pages which contain the product price.
    laughs_coconut = requests.get(laughs_coconut)
    glomark_coconut = requests.get(glomark_coconut)

    # The LaughsSuper supermarket website provides the price in a span text.
    soup_laughs = BeautifulSoup(laughs_coconut.text, 'html.parser')
    price_laughs = soup_laughs.find('span', {'class': 'price'}).text

    # The Glomark supermarket website provides the data in JSON format in an inline script.
    soup_glomark = BeautifulSoup(glomark_coconut.text, 'html.parser')
    script_glomark = soup_glomark.find('script', {'type': 'application/ld+json'}).text
    data_glomark = json.loads(script_glomark)
    price_glomark = data_glomark['offers'][0]['price']

    # TODO: Parse the values as floats, and print them.
    price_laughs = price_laughs.replace("Rs.", "")
    price_laughs = float(price_laughs)
    price_glomark = float(price_glomark)
    print('Laughs   COCONUT - Item#mr-2058 Rs.: ', price_laughs)
    print('Glomark  Coconut Rs.: ', price_glomark)

    # Compare the prices and print the result.
    if price_laughs > price_glomark:
        print('Glomark is cheaper Rs.:', price_laughs - price_glomark)
    elif price_laughs < price_glomark:
        print('Laughs is cheaper Rs.:', price_glomark - price_laughs)
    else:
        print('Price is the same')

My code runs without errors, and it prints:

Laughs   COCONUT - Item#mr-2058 Rs.:  0.0
Glomark  Coconut Rs.:  110.0
Laughs is cheaper Rs.: 110.0

But the expected output is:

Laughs   COCONUT - Item#mr-2058 Rs.:  95.0
Glomark  Coconut Rs.:  110.0
Laughs is cheaper Rs.: 15.0

Note: <span class="price">Rs.95.00</span> is the element that holds the Laughs coconut price.

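The page has two elements that match 'span', {'class': 'price'}. Because find() returns only the first match, use findAll() here and take the second item. If you change the line in your code to price_laughs = soup_laughs.findAll('span',{'class': 'price'})[1].text, the problem will be solved.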
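The difference between the two calls can be reproduced offline. The following is a minimal sketch, not the real page: the markup in SAMPLE_HTML, including the first price of Rs.0.00 and the id value, is an assumption made up to match the behaviour described above. It uses find_all, which is the current spelling of findAll in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real Laughs page is assumed to contain two
# <span class="price"> elements, the first of which is not the product price.
SAMPLE_HTML = """
<div class="cart-summary"><span class="price">Rs.0.00</span></div>
<div id="product-price-2058"><span class="price">Rs.95.00</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first matching element.
first_match = soup.find("span", {"class": "price"}).text

# find_all() returns every matching element, in document order.
all_matches = [tag.text for tag in soup.find_all("span", {"class": "price"})]

print(first_match)
print(all_matches)
```

Indexing by position ([1]) works only as long as the page keeps the same number and order of price elements, so it is worth checking the length of the result before relying on it.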

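Try changing your strategy for selecting the element: there is an id that lets you select a more specific container. For example, you can use CSS selectors: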

price_laughs = soup_laughs.select_one('[id^="product-price"] .price').text
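As a rough illustration of how that selector behaves, here is a small offline sketch. The HTML snippet and the id value product-price-2058 are assumptions for demonstration only. The selector itself is the one from the answer: it matches any element whose id starts with "product-price" and then picks the .price element inside it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the second price sits inside a container whose
# id starts with "product-price". The id value is made up for illustration.
SAMPLE_HTML = """
<div class="cart-summary"><span class="price">Rs.0.00</span></div>
<div id="product-price-2058"><span class="price">Rs.95.00</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# [id^="product-price"] is an attribute prefix selector; the descendant
# combinator (the space) then narrows the match to .price inside it.
price_tag = soup.select_one('[id^="product-price"] .price')
if price_tag is None:
    raise ValueError("price element not found; the page layout may have changed")

price = float(price_tag.text.replace("Rs.", ""))
print(price)
```

select_one returns None when nothing matches, so the explicit check gives a clearer error than the AttributeError that .text would raise.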

As for the other website, you can also use its API to get the price:

requests.get('https://glomark.lk/product-page/variation-detail/11624', headers={'x-requested-with': 'XMLHttpRequest'}).json()['price']
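The one-liner above could be wrapped with a timeout and an HTTP status check. This is only a sketch: the endpoint and header come from the answer, while the function names, the timeout value, and the assumption that the response is a JSON object with a "price" key convertible to float are mine.

```python
import requests

# Endpoint and header are taken from the answer above; the response shape
# (a JSON object with a "price" key) is assumed from that same one-liner.
GLOMARK_VARIATION_URL = "https://glomark.lk/product-page/variation-detail/{product_id}"


def price_from_payload(payload):
    """Extract the price from the decoded JSON payload as a float."""
    return float(payload["price"])


def fetch_glomark_price(product_id, timeout=10):
    """Request the variation detail for one product and return its price."""
    response = requests.get(
        GLOMARK_VARIATION_URL.format(product_id=product_id),
        headers={"x-requested-with": "XMLHttpRequest"},
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return price_from_payload(response.json())
```

Calling fetch_glomark_price(11624) would then return the price as a float, provided the endpoint still responds as the answer describes.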
