Python JSON: object of type bytes is not JSON serializable

I'm following a tutorial to build a simple web scraper for a static site, but I get the following error: TypeError: Object of type bytes is not JSON serializable

Here is my code so far:

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []
for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text.encode('utf-8'),
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text.encode('utf-8'),
        "content": tweet.find('p', attrs= {'class': 'content'}).text.encode('utf-8'),
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text.encode('utf-8'),
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text.encode('utf-8')
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)

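My only guess is that the article was written for an earlier version of Python, but the article is recent, so that shouldn't be the case. The code runs and creates the JSON file, but the only data in it is "author":. Sorry if the answer is obvious to some of you, but I'm just starting to learn.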

Here is the full error log:

(tutorial-env) C:\Users\afaal\Desktop\python\webscraper>python webscraper.py
Traceback (most recent call last):
  File "webscraper.py", line 20, in <module>
    json.dump(tweetArr, outfile)
  File "C:\Users\afaal\AppData\Local\Programs\Python38\lib\json\__init__.py", line 179, in dump
    for chunk in iterable:
  File "C:\Users\afaal\AppData\Local\Programs\Python38\lib\json\encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "C:\Users\afaal\AppData\Local\Programs\Python38\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python38\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python38\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Users\afaal\AppData\Local\Programs\Python38\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable

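OK, so I needed to remove everything after ".text", and I should have simply searched for "JSON serialization" (I had only searched for my specific TypeError and found nothing conclusive). In Python 3, .encode('utf-8') returns a bytes object, which the json module cannot serialize, whereas .text is already a str. The corrected code is below, in case another beginner like me runs into the same problem: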

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []
for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    # .text is already a str, so no .encode('utf-8') is needed
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text,
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text,
        "content": tweet.find('p', attrs= {'class': 'content'}).text,
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text,
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)
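
For anyone who wants to see the cause without running the scraper, here is a minimal sketch of the str versus bytes difference. The sample value "Ethan" and the helper try_dump are made up for illustration; they are not part of the tutorial.

```python
import json

text = "Ethan"                  # made-up sample, stands in for tag.text (a str)
encoded = text.encode("utf-8")  # in Python 3 this is a bytes object, not a str


def try_dump(value):
    """Hypothetical helper: return the JSON string, or the error message on failure."""
    try:
        return json.dumps({"author": value})
    except TypeError as exc:
        return str(exc)


str_result = try_dump(text)       # str values serialize fine
bytes_result = try_dump(encoded)  # bytes values raise TypeError inside json.dumps

print(str_result)
print(bytes_result)
```

The first call should produce normal JSON, while the second should report the same "not JSON serializable" message as in the traceback.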

All credit goes to @juanpa.arrivilaga. Thank you very much for clearing this up!
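
One possible follow-up, if the original .encode('utf-8') calls were meant to make sure non-ASCII characters survive: in Python 3 the encoding can be chosen when the file is opened instead of on each string. This is only a sketch under that assumption; the sample rows and the temporary file path are made up, and ensure_ascii=False is an optional choice, not something the tutorial prescribes.

```python
import json
import os
import tempfile

# Made-up rows standing in for the scraped tweetArr, with non-ASCII characters.
tweets = [{"author": "José", "content": "café"}]

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "twitterData.json")

# The encoding is set on the file, not on each individual string.
with open(path, "w", encoding="utf-8") as outfile:
    # ensure_ascii=False keeps characters like "é" readable instead of \u escapes.
    json.dump(tweets, outfile, ensure_ascii=False, indent=2)

# Read the file back to confirm the data round-trips unchanged.
with open(path, encoding="utf-8") as infile:
    loaded = json.load(infile)

with open(path, encoding="utf-8") as infile:
    raw = infile.read()
```

With the default ensure_ascii=True the file would still load back to the same data; the characters would just be stored as \u escapes.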
