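I want to compare the price of coconuts on two websites. The two stores (websites) are Laughs and Glomark.
I have two files, main.py and comparison.py. I think the problem is in the part that extracts the Laughs price, although that line runs without errors. My output and the expected output are listed below the code.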
main.py
from comparison import compare_prices
laughs_coconut = 'https://scrape-sm1.github.io/site1/COCONUT%20market1super.html'
glomark_coconut = 'https://glomark.lk/coconut/p/11624'
compare_prices(laughs_coconut, glomark_coconut)
comparison.py
import requests
import json
from bs4 import BeautifulSoup
# Imitate the Mozilla browser.
user_agent = {'User-agent': 'Mozilla/5.0'}
def compare_prices(laughs_coconut, glomark_coconut):
    # Acquire the web pages which contain the product price.
    laughs_coconut = requests.get(laughs_coconut, headers=user_agent)
    glomark_coconut = requests.get(glomark_coconut, headers=user_agent)
    # The LaughsSuper supermarket website provides the price in a span text.
    soup_laughs = BeautifulSoup(laughs_coconut.text, 'html.parser')
    price_laughs = soup_laughs.find('span', {'class': 'price'}).text
    # The Glomark supermarket website provides the data in JSON format in an inline script.
    soup_glomark = BeautifulSoup(glomark_coconut.text, 'html.parser')
    script_glomark = soup_glomark.find('script', {'type': 'application/ld+json'}).text
    data_glomark = json.loads(script_glomark)
    price_glomark = data_glomark['offers'][0]['price']
    # TODO: Parse the values as floats, and print them.
    price_laughs = price_laughs.replace("Rs.", "")
    price_laughs = float(price_laughs)
    price_glomark = float(price_glomark)
    print('Laughs COCONUT - Item#mr-2058 Rs.: ', price_laughs)
    print('Glomark Coconut Rs.: ', price_glomark)
    # Compare the prices and print the result.
    if price_laughs > price_glomark:
        print('Glomark is cheaper Rs.:', price_laughs - price_glomark)
    elif price_laughs < price_glomark:
        print('Laughs is cheaper Rs.:', price_glomark - price_laughs)
    else:
        print('Price is the same')
My code runs without errors and prints:
Laughs COCONUT - Item#mr-2058 Rs.: 0.0
Glomark Coconut Rs.: 110.0
Laughs is cheaper Rs.: 110.0
But the expected output is:
Laughs COCONUT - Item#mr-2058 Rs.: 95.0
Glomark Coconut Rs.: 110.0
Laughs is cheaper Rs.: 15.0
Note: <span class="price">Rs.95.00</span> is the element that holds the Laughs coconut price.
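This happens because two elements match 'span', {'class': 'price'}. find() returns only the first match, so in this case use find_all() (the current name for findAll()) and take the second result. Change the line in your code to
price_laughs = soup_laughs.find_all('span', {'class': 'price'})[1].text
and the problem will be solved.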
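To see the difference between the two calls without touching the network, here is a minimal sketch on a made-up HTML fragment. The markup, the `Rs.0.00` value in the first span, and the helper name `to_float` are assumptions for illustration; the only thing taken from the answer is that the page contains two `span.price` elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real Laughs page is not reproduced here.
# Only the presence of two span.price elements comes from the answer.
SAMPLE_HTML = """
<div class="cart-summary"><span class="price">Rs.0.00</span></div>
<div class="product-info"><span class="price">Rs.95.00</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first match, which here is the unrelated 0.00 value.
first_price = soup.find("span", {"class": "price"}).text

# find_all() returns every match in document order; index 1 is the second one.
second_price = soup.find_all("span", {"class": "price"})[1].text


def to_float(price_text):
    """Strip the 'Rs.' prefix and convert the remainder to a float."""
    return float(price_text.replace("Rs.", "").strip())


print(to_float(first_price))
print(to_float(second_price))
```

Indexing by position works only as long as the page keeps the same order of elements, so it is worth re-checking if the site layout changes.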
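Try a different strategy for selecting the element: the page has an id you can use to target a more specific container. For example, you can use CSS selectors:
price_laughs = soup_laughs.select_one('[id^="product-price"] .price').text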
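The selector above can be tried on a small stand-in document. This is only a sketch: the markup and the id suffix `-2058` are assumptions, and the real page may be structured differently. It also shows a `None` check, since `select_one()` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the id suffix and the surrounding tags are assumptions.
SAMPLE_HTML = """
<div class="minicart"><span class="price">Rs.0.00</span></div>
<div id="product-price-2058"><span class="price">Rs.95.00</span></div>
"""

soup_laughs = BeautifulSoup(SAMPLE_HTML, "html.parser")

# [id^="product-price"] matches any element whose id starts with "product-price";
# the descendant combinator then narrows the search to .price inside it.
price_tag = soup_laughs.select_one('[id^="product-price"] .price')

# select_one() returns None when nothing matches, so check before using .text.
if price_tag is None:
    raise ValueError("price element not found")

price_laughs = float(price_tag.text.replace("Rs.", ""))
print(price_laughs)
```

Selecting through the container does not depend on how many other `price` spans appear earlier on the page, which is the main advantage over picking a match by index.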
As for the other website, you can also use its API to get the price:
requests.get('https://glomark.lk/product-page/variation-detail/11624', headers={'x-requested-with': 'XMLHttpRequest'}).json()['price']
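The one-liner above could be wrapped in a small function, roughly as follows. This is a sketch under assumptions: the function name `fetch_glomark_price`, the `http_get` parameter, and the 10-second timeout are illustrative choices, and nothing is known about the response beyond the `price` key used in the answer.

```python
import requests

GLOMARK_DETAIL_URL = "https://glomark.lk/product-page/variation-detail/{product_id}"


def fetch_glomark_price(product_id, http_get=requests.get):
    """Return the product price as a float.

    http_get is a hypothetical injection point so the function can be
    exercised without network access; it defaults to requests.get.
    """
    response = http_get(
        GLOMARK_DETAIL_URL.format(product_id=product_id),
        # Header taken from the answer above.
        headers={"x-requested-with": "XMLHttpRequest"},
        timeout=10,
    )
    # Fail loudly on 4xx/5xx instead of trying to parse an error page.
    response.raise_for_status()
    return float(response.json()["price"])
```

Calling `fetch_glomark_price(11624)` would then replace the JSON-LD parsing in compare_prices, assuming the endpoint keeps returning a `price` field.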