JSONDecodeError issue in Python



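Can anyone help me with this problem? I don't know why I'm getting this error.

I'm trying to use a Python program someone else wrote. I tried tinkering with it, but I can't figure out what's wrong.

Error: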

PS D:\Python> python .\quizlet.py
Traceback (most recent call last):
  File "D:\Python\quizlet.py", line 69, in <module>
    q = QuizletParser(website)
  File "D:\Python\quizlet.py", line 17, in QuizletParser
    data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
  File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\john\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 14 (char 13)

I'm trying to use the code I found here: https://github.com/daijro/python-quizlet

Source:

from requests_html import HTMLSession
from box import Box
import box
import json
from bs4 import BeautifulSoup
from difflib import SequenceMatcher


def FindFlashcard(flashcards: box.box_list.BoxList, match: str):
    similar = lambda a, b: SequenceMatcher(None, a, b).ratio()
    data = max(list(zip([similar(match, x.term) for x in flashcards], [x for x in range(len(flashcards))])))
    flashcard = flashcards[data[1]]
    flashcard.update({'similarity': data[0]})
    return flashcard


def QuizletParser(link: str):
    session = HTMLSession()
    data = json.loads(BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152])
    flashcards = []
    for i in list(data['termIdToTermsMap'].values()):
        i = {
            'index': i['rank'],
            'id': i['id'],
            'term': i['word'],
            'definition': i['definition'],
            'setId': i['setId'],
            'image': i['_imageUrl'],
            'termTts': 'https://quizlet.com'+i['_wordTtsUrl'],
            'termTtsSlow': 'https://quizlet.com'+i['_wordSlowTtsUrl'],
            'definitionTts': 'https://quizlet.com'+i['_definitionTtsUrl'],
            'definitionTtsSlow': 'https://quizlet.com'+i['_definitionSlowTtsUrl'],
            'lastModified': i['lastModified'],
        }
        flashcards.append(i)
    output = {
        'title': data['set']['title'],
        'flashcards': flashcards,
        'author': {
            'name': data['creator']['username'],
            'id': data['creator']['id'],
            'timestamp': data['creator']['timestamp'],
            'lastModified': data['creator']['lastModified'],
            'image': data['creator']['_imageUrl'],
            'timezone': data['creator']['timeZone'],
            'isAdmin': data['creator']['isAdmin'],
        },
        'id': data['set']['id'],
        'link': data['set']['_webUrl'],
        'thumbnail': data['set']['_thumbnailUrl'],
        'timestamp': data['set']['timestamp'],
        'lastModified': data['set']['lastModified'],
        'publishedTimestamp': data['set']['publishedTimestamp'],
        'authorsId': data['set']['creatorId'],
        'termLanguage': data['set']['wordLang'],
        'definitionLanguage': data['set']['defLang'],
        'description': data['set']['description'],
        'numTerms': data['set']['numTerms'],
        'hasImages': data['set']['hasImages'],
        'hasUploadedImage': data['hasUploadedImage'],
        'hasDiagrams': data['set']['hasDiagrams'],
        'hasImages': data['set']['hasImages'],
    }
    return Box(output)


website = 'https://quizlet.com/475389316/python-web-scraping-flash-cards/'
text = 'Two popular parsers'
q = QuizletParser(website)
flashcard = FindFlashcard(q.flashcards, match=text)  # finds the flashcard most similar to the input
print(flashcard.term + " " + flashcard.definition)  # prints the matched flashcard's term and definition

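It is hard to give a solution without seeing the data.

A few tips for debugging JSON errors:

  1. Check the input passed to the JSON decoder. You may have a trailing comma after the last key-value pair of the input dictionary (this is a very common mistake).
  2. Check the data type. If your input comes from an external source, inspect the data first.

I suggest printing the decoder input with the snippet below and, if possible, posting it here.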
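To make the first tip concrete, here is a minimal sketch that reports where decoding fails and what the text looks like around that position. The helper name `describe_json_error` and the sample strings are made up for illustration; only the standard `json` module is used. Note that "Extra data" means the decoder parsed one complete JSON value and then found more characters after it, which is what the traceback in the question reports at char 13.

```python
import json


def describe_json_error(text, context=20):
    """Return None if text is valid JSON, else a short report around the failure.

    Hypothetical helper for illustration; not part of python-quizlet.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character index where decoding stopped.
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return f"{err.msg} at char {err.pos}: {snippet!r}"
    return None


# A trailing comma after the last pair is rejected by the json module.
print(describe_json_error('{"a": 1,}'))
# Valid JSON followed by leftover text raises "Extra data".
print(describe_json_error('{"a": 1} extra'))
```

Running the same helper on the sliced script string from the question should show which characters sit around char 13, which may indicate whether the hard-coded `[44:-152]` offsets still match the page.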

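# Place this inside QuizletParser, just before the json.loads call (session and link are defined there).
input_data = BeautifulSoup(session.get(link).content, features="lxml").find_all('script')[-6].string[44:-152]
print(input_data)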
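If the printed text turns out to be a JSON object with JavaScript before or after it, one possible workaround is to let the decoder find where the object ends instead of relying on fixed slice offsets. The sketch below is an assumption about the shape of the data, not a confirmed fix for this page: the function name `extract_first_json_object` and the `sample` string are hypothetical, while `json.JSONDecoder.raw_decode` is a standard-library method.

```python
import json


def extract_first_json_object(script_text):
    """Decode the first JSON object in script_text, ignoring text around it.

    Hypothetical helper; assumes the first '{' starts the JSON payload.
    """
    start = script_text.find("{")
    if start == -1:
        raise ValueError("no '{' found in script text")
    # raw_decode parses one JSON value beginning at `start` and returns
    # (value, end_index); anything after end_index is simply not read.
    value, _end = json.JSONDecoder().raw_decode(script_text, start)
    return value


# Made-up script content for illustration; the real page may look different.
sample = 'window.pageData = {"set": {"title": "demo"}}; init();'
print(extract_first_json_object(sample))
```

This still raises `JSONDecodeError` when the first `{` belongs to JavaScript code rather than to the data, so it is worth checking the printed input first.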
