Scrapy can't scrape the API



I'm trying to use Scrapy to scrape the API from this link.

The problem is the API request. I've tried everything I can think of, but I can't load the response as JSON, so I can't go any further.

The code looks long, but that's only because of the headers and cookies. Please suggest how I can improve it and find a solution.

Here is the code I wrote:

from datetime import datetime
import json
from urllib.parse import urlencode
import scrapy
from bs4 import BeautifulSoup
from liveshare.items import AGMSpiderItems

class SubIndexSpider(scrapy.Spider):
    name = "subindexes"

    def start_requests(self):
        headers = {
            'authority': 'merolagani.com',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,ne;q=0.7,ru;q=0.6',
            'cache-control': 'no-cache',
            # 'cookie': 'ASP.NET_SessionId=bbjd1loebaad4ha2qwwxdcfp; _ga=GA1.2.810096005.1667463342; _gid=GA1.2.1263273763.1673850832; _gat=1; __atuvc=4%7C3; __atuvs=63c4efd0a14c6c9b003',
            'pragma': 'no-cache',
            'referer': 'https://merolagani.com/MarketSummary.aspx',
            'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Linux"',
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'same-origin',
            'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
            'x-requested-with': 'XMLHttpRequest',
        }
        params = {
            'type': 'market_summary',
        }
        cookies = {
            'ASP.NET_SessionId': 'bbjd1loebaad4ha2qwwxdcfp',
            '_ga': 'GA1.2.810096005.1667463342',
            '_gid': 'GA1.2.1263273763.1673850832',
            '_gat': '1',
            '__atuvc': '4%7C3',
            '__atuvs': '63c4efd0a14c6c9b003',
        }
        api_url = f'https://merolagani.com/handlers/webrequesthandler.ashx{urlencode(params)}'
        yield scrapy.Request(
            url=api_url,
            method='GET',
            headers=headers,
            cookies=cookies,
            callback=self.parse,
            dont_filter=True
        )

    def parse(self, response):
        print(response.headers)
        print(response.body)
        json_response = json.loads(response.body)
        print(json_response)

But I get a JSON decode error, and I can't figure out what the problem is.

Error traceback:
  File "C:\Users\Navar\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 3 column 1 (char 4)
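The position in the message is itself a clue. `json.loads` skips leading whitespace before parsing, so "line 3 column 1 (char 4)" is consistent with a body that starts with two CRLF line breaks (four characters) followed by something that is not JSON, such as an HTML page. A minimal sketch reproducing the exact message, assuming (this is not confirmed by the traceback) that the body is `\r\n\r\n` followed by HTML:

```python
import json


def json_error_for(body: str) -> str:
    """Return the message json.loads raises for `body`, or "" if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""


# Assumed body: two CRLF line breaks, then an HTML document instead of JSON.
print(json_error_for("\r\n\r\n<!DOCTYPE html><html></html>"))
# → Expecting value: line 3 column 1 (char 4)
```

Printing `response.text[:200]` in `parse` before decoding shows what the server actually sent.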

I tried a simplified version of the code, and it ran without errors: the JSON data came back successfully.

Code:

import json
import requests

# Note the "?" between the handler path and the query string.
url_api = "https://merolagani.com/handlers/webrequesthandler.ashx?type=market_summary"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    "Referer": "https://merolagani.com"
}
page = requests.get(url_api, headers=headers)
js_data = json.loads(page.text)
print(js_data)

You can see the JSON result on anotepad.com.

The error is probably in the response your code receives: that response is not a JSON object.
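One concrete difference between the two versions: in the spider, `api_url` is built as `f'https://merolagani.com/handlers/webrequesthandler.ashx{urlencode(params)}'`, with no `?` before the query string, so Scrapy requests `.../webrequesthandler.ashxtype=market_summary` (a different path), while the working `requests` version uses `...ashx?type=market_summary`. That is a likely cause, though not verified against the live site. A minimal stdlib-only sketch of the fix; the helper names `build_api_url` and `decode_json_body` are hypothetical, not part of Scrapy:

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://merolagani.com/handlers/webrequesthandler.ashx"


def build_api_url(base: str, params: dict) -> str:
    # The query string must be separated from the path by "?";
    # without it, urlencode's output is glued onto the path itself.
    return f"{base}?{urlencode(params)}"


def decode_json_body(body: bytes) -> object:
    # Parse a response body as JSON; if the server sent something else
    # (for example an HTML error page), fail with a short preview of it.
    text = body.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        preview = text.strip()[:80]
        raise ValueError(f"Response is not JSON ({exc}); starts with: {preview!r}") from exc


if __name__ == "__main__":
    params = {"type": "market_summary"}
    print(f"{BASE_URL}{urlencode(params)}")  # original: no "?" separator
    print(build_api_url(BASE_URL, params))   # fixed: ...ashx?type=market_summary
```

In the spider, `start_requests` would pass `url=build_api_url(BASE_URL, params)`, and `parse` could call `decode_json_body(response.body)`. Scrapy 2.2 and later also provide `response.json()` on text responses, which parses the body as JSON but without the preview.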
