JSONDecodeError when web scraping with Python 3.9 and BeautifulSoup 4



I'm trying to get some TrustPilot reviews for a brand. Here is my code:

import requests
from bs4 import BeautifulSoup
import time
import json
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"}
#def get_total_items(url):
#soup = BeautifulSoup(requests.get(url, format(0),headers).text, 'lxml')
stars = []
dates = []
results = []
with requests.Session() as s:
    for num in range(1,2):
        url = "https://www.trustpilot.com/review/www.hiwaldo.com?page={}".format(num)
        r = s.get(url, headers = headers)
        soup = BeautifulSoup(r.content, 'lxml')
        for star in soup.find_all("section", {"class":"review__content"}):
            # Get rating value
            rating = star.find("div", {"class":"star-rating star-rating--medium"}).find('img').get('alt')
            # Get date value
            date_json = json.loads(star.find('script').text)
            date = date_json['publishedDate']
            stars.append(rating)
            dates.append(date)
            data = {"Rating": rating, "Date": date}
            results.append(data)
        time.sleep(2)

print(results)

When I run python3 ~/Desktop/reviews.py, I get the following error:

Traceback (most recent call last):
  File "/Users/user/Desktop/reviews.py", line 25, in <module>
    date_json = json.loads(star.find('script').text)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
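The error message itself is a useful clue: "Expecting value: line 1 column 1 (char 0)" is what json.loads reports when it is handed an empty string. The sketch below uses only the standard library; try_parse_json is a hypothetical debugging helper, not part of the code above. It reproduces that message and echoes the offending input, so an empty or non-JSON <script> body is easy to spot.

```python
import json


def try_parse_json(text):
    """Hypothetical helper: return (data, None) on success or (None, message) on failure."""
    if text is None:
        # e.g. the tag was missing, or .string found zero or several children
        return None, "got None instead of a string"
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # Include a short preview of the input so an empty string is obvious
        return None, f"{exc} while parsing {text[:40]!r}"


# An empty string triggers the same "Expecting value" error as in the traceback.
print(try_parse_json(""))
print(try_parse_json('{"publishedDate": "2020-11-01"}'))
```

Printing the raw text of star.find('script') in the same way is a quick check of what the loop is actually passing to json.loads.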

Is there anything obviously wrong with this setup? I'm a complete Python beginner, in case that isn't already obvious.

Thanks a lot!

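To extract the JSON data from star, use the .string attribute instead of .text. The "Expecting value: line 1 column 1 (char 0)" error is what json.loads raises for an empty string, so .text most likely came back empty for this <script> tag, whereas .string returns the tag's single text child, which here is the JSON itself.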

So instead of:

date_json = json.loads(star.find('script').text)

use:

date_json = json.loads(star.find('script').string)
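A minimal sketch of the fix in context, run against a small hypothetical HTML snippet. SAMPLE_HTML and parse_reviews are illustrative names, and the markup only imitates the structure the question's selectors expect; TrustPilot's real pages may differ and change over time. It reads the JSON through .string and skips a review whose <script> tag is missing or empty instead of passing an empty string to json.loads. It uses the built-in "html.parser" so lxml is not required.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the structure the question's code expects.
SAMPLE_HTML = """
<section class="review__content">
  <div class="star-rating star-rating--medium"><img alt="5 stars: Excellent"></div>
  <script type="application/json">{"publishedDate": "2020-11-01T10:00:00.000Z"}</script>
</section>
<section class="review__content">
  <div class="star-rating star-rating--medium"><img alt="1 star: Bad"></div>
  <script></script>
</section>
"""


def parse_reviews(html):
    """Return a list of {"Rating": ..., "Date": ...} dicts, skipping unusable reviews."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for section in soup.find_all("section", class_="review__content"):
        star_div = section.find("div", class_="star-rating")
        img = star_div.find("img") if star_div is not None else None
        script = section.find("script")
        # .string is the tag's only text child, or None if it has zero or several
        raw = script.string if script is not None else None
        if img is None or not raw or not raw.strip():
            continue  # nothing to decode; json.loads("") would raise JSONDecodeError
        try:
            date = json.loads(raw)["publishedDate"]
        except (json.JSONDecodeError, KeyError):
            continue  # body was not the JSON we expected
        results.append({"Rating": img.get("alt"), "Date": date})
    return results


print(parse_reviews(SAMPLE_HTML))
```

Skipping bad entries rather than crashing keeps one malformed review from aborting the whole page; if every review is being skipped, the selectors no longer match the live markup.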
