Scrapy request is not going through



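I'm not sure how to phrase this question precisely. I'm a beginner at web scraping, and I'm trying to scrape a website with Python Scrapy. The site is dynamic and rendered with JavaScript, so basic XPath and CSS selectors don't retrieve any data.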

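I tried to mimic the site's API request from my spider by requesting the URL that returns the data as a JSON object. That request fails with the error "HTTP status code is not handled or not allowed". I suspect I'm calling the URL incorrectly. Calling the JSON URL directly works for me 9 times out of 10, so what should I do differently here? The request has parameters and form data items in its headers section, and the URL doesn't even look like a regular website URL: it starts with https://ih3kc909gb-dsn.algolia.net/1/indexes.... I know this is a long question, but I really need some help. How can I get a response?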

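You should use the start_requests() method instead of the start_urls attribute, because start_urls only issues GET requests. You can read more about it in the Scrapy documentation. From there, all you need to do is send a POST request.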

Code

import scrapy


class carswitch(scrapy.Spider):
    name = 'car'
    # Headers copied from the browser's network tab for the Algolia request.
    headers = {
        "Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "sec-ch-ua": '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
        "accept": "application/json",
        "sec-ch-ua-mobile": "?0",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "content-type": "application/x-www-form-urlencoded",
        "Origin": "https://carswitch.com",
        "Sec-Fetch-Site": "cross-site",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Dest": "empty",
        "Referer": "https://carswitch.com/",
        "Accept-Language": "en-US,en;q=0.9"
    }
    # The body is a JSON object whose "params" value is a URL-encoded query string.
    body = '{"params":"query=&hitsPerPage=24&page=0&numericFilters=%5B%22country_id%3D1%22%2C%22used_car%20%3D%201%22%5D&facetFilters=&typoTolerance=&tagFilters=%5B%5D&attributesToHighlight=%5B%5D&attributesToRetrieve=%5B%22make%22%2C%22make_ar%22%2C%22model%22%2C%22model_ar%22%2C%22year%22%2C%22trim%22%2C%22displayTrim%22%2C%22colorPaint%22%2C%22bodyType%22%2C%22salePrice%22%2C%22transmissionType%22%2C%22GPS%22%2C%22carID%22%2C%22inspectionID%22%2C%22inspectionStatus%22%2C%22rate%22%2C%22certified_dealer_id%22%2C%22dealer_category%22%2C%22used_car%22%2C%22new%22%2C%22top_condition%22%2C%22featured%22%2C%22photo%22%2C%22modifiedPlace%22%2C%22city%22%2C%22mileage%22%2C%22urgent_sales%22%2C%22price_dropped%22%2C%22urgent_sales_days%22%2C%22urgent_sales_end_date%22%2C%22date%22%2C%22negotiable%22%2C%22oldPrice%22%2C%22zero_downpayment%22%2C%22cashOnly%22%2C%22hasPriceGuidance%22%2C%22dealerOffer%22%2C%22maxPrice%22%2C%22fairPrice%22%2C%22pricey_deal%22%2C%22fair_deal%22%2C%22good_deal%22%2C%22great_deal%22%2C%22dealership_info%22%2C%22logo_small%22%2C%22GCCspecs%22%2C%22country%22%2C%22export%22%2C%22monthly_price%22%5D"}'

    def start_requests(self):
        # start_urls can only issue GET requests, so the POST request is built here.
        url = 'https://ih3kc909gb-dsn.algolia.net/1/indexes/All_Carswitch_Cars/query?x-algolia-agent=Algolia%20for%20JavaScript%20(3.33.0)%3B%20Browser&x-algolia-application-id=IH3KC909GB&x-algolia-api-key=493a9bbc57331df3b278fa39c1dd8f2d'
        yield scrapy.Request(url=url, method='POST', headers=self.headers, body=self.body, callback=self.parse)

    def parse(self, response):
        # The response body is the raw JSON returned by the API.
        print(response.body)
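
The hand-encoded body string in the spider is awkward to change when you want another page of results. Below is a minimal sketch of building that body programmatically and pulling the records out of the JSON response. The helper names build_body and extract_hits are hypothetical, and the sketch assumes the response carries `hits`, `page`, and `nbPages` keys, which the question does not confirm. It only reproduces the first four parameters of the original body; the remaining ones would be added the same way.

```python
import json
from urllib.parse import quote, urlencode


def build_body(page, hits_per_page=24):
    """Return the JSON request body for one page of results (hypothetical helper)."""
    params = {
        "query": "",
        "hitsPerPage": hits_per_page,
        "page": page,
        # Same filters as in the hand-encoded body, serialized as a compact JSON list.
        "numericFilters": json.dumps(
            ["country_id=1", "used_car = 1"], separators=(",", ":")
        ),
    }
    # quote_via=quote encodes spaces as %20, matching the original body string.
    return json.dumps({"params": urlencode(params, quote_via=quote)})


def extract_hits(response_text):
    """Return (records, has_more) from a response, assuming hits/page/nbPages keys."""
    data = json.loads(response_text)
    hits = data.get("hits", [])
    has_more = data.get("page", 0) + 1 < data.get("nbPages", 0)
    return hits, has_more
```

Inside parse() you could call extract_hits(response.text), yield each record, and, while has_more is true, yield another scrapy.Request with body=build_body(next_page) to fetch the following page.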
