Sending a POST request in Scrapy fails with a 403 error, but the same request works with Python requests



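I am facing an issue with a POST API in the Scrapy framework. I got the request working with Python requests, but I don't understand what the problem is in Scrapy.

Website URL and POST API URL

I just want to scrape the API data on my system so that I can get all the hotel names. I think the website is using some anti-scraping measures.

Spider: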

import scrapy


class MSpider(scrapy.Spider):
    name = 'm'
    custom_settings = {
        'COOKIES_ENABLED': False
    }
    # Browser-like request headers sent with the POST
    headers = {
        'authority': 'mapi.makemytrip.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
        'currency': 'INR',
        'language': 'eng',
        'server': 'b2c',
        'sec-ch-ua-mobile': '?0',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
        'visitor-id': '02f76bec-722c-4873-8cc0-e0cbf575a7db',
        'usr-mcid': '06489206631463030412606870076632113416',
        'region': 'in',
        'accept': 'application/json',
        'content-type': 'application/json',
        'os': 'desktop',
        'vid': '02f76bec-722c-4873-8cc0-e0cbf575a7db',
        'tid': 'avc',
        'sec-ch-ua-platform': '"macOS"',
        'origin': 'https://www.makemytrip.com',
        'sec-fetch-site': 'same-site',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': 'https://www.makemytrip.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
        'cookie': 'ccde=IN; dvid=93d81c50-7cc6-44db-8b88-ddb922b59cd1; _gcl_au=1.1.1495556954.1640321393; s_ecid=MCMID%7C06489206631463030412606870076632113416; __gads=ID=0913edca5112d1d9-22f6e64d85cf0031:T=1640321395:RT=1640321395:S=ALNI_Mb3OxDBV_sTh1OnsNOF5fiULoCRiA; mcid=02f76bec-722c-4873-8cc0-e0cbf575a7db; AMCVS_1E0D22CE527845790A490D4D%40AdobeOrg=1; _fbp=fb.1.1641210639410.208887363; lang=eng; ver=pwa_v3; bm_sz=7FCEAFEAE3ACFC31E355EA3647A40A5F~YAAQNtjIFxXnx65+AQAAoi55rw74DINmrCA7C1UtVsw5cbcwDFwsdmEWxQTOYAPWavkAjWi1Qlylr86iiWLuj/xpzPBQEtQKtIBJ+dXeJpEyp/1Cz84POhZgUXoDZhZumAtqeX3QGC4SlxvS3/UW7kLFytCGMWCrLQUw0sh8K14ZV7XD/YUMyVgxGoMBwDWjm8vUK4TTPt3vBsYzL97TmX7MsQRENizX3dHzsHecaHnJTsnWd3b8PKiBAF2PEOTRwYgfYOSa/qRUtrZ/bOR1blnNqGvM85Hz7pletKYErYcLb5iup3DU0McXWymtGsQu1uImPaKnBy+pB6deyD2q~4605237~3162947; bm_mi=ED0CE4DC02EA110DFB507D6368272720~vomkIF/Kp1HiXAh1xmqqr3EKug+SF+A1LTW5I1xGwF72Ny5HIwpcUylimTLK0+/BGH+b6UZoGEOjA82GVwkMMJqcDGBpsR2e0h3hYKyHzdDm0noYQtGP4ARe3ni50clAcXhrIPKXLriq6WPKmRo5unbXTOkUcd/rXzozircoJktrSzPEGVYdI5JA8GYCpZVg3g+NhhuBIwwQgsoTPb2uqO7gSy3sUT56tb4rZ0kdAJv5wWkS6Sqc/8jlhtb7ekCoMTUgCz3om3+aJIMrpzJCqMM1nO0d4JGvdAMgFzalyLqzAf4PIrSHoUxcr9D0d5N0; AMCV_1E0D22CE527845790A490D4D%40AdobeOrg=-1712354808%7CMCIDTS%7C19024%7CMCMID%7C06489206631463030412606870076632113416%7CMCAAMLH-1644226265%7C12%7CMCAAMB-1644226265%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1643628666s%7CNONE%7CMCAID%7CNONE%7CvVersion%7C4.3.0; ak_bmsc=8E516B43747B117CA700DC2ADC85B96A~000000000000000000000000000000~YAAQNtjIF13ox65+AQAAkk95rw7IQcMcCtYNFJj59f0eN0zvoqbBXt8yKknH9nvr95P7bzPlUKd12yzXWxQVXM/jLPmVHafQY/l2V+jvJAvVp/R6HHLTIeTwixSpkkPQAsRKOaUdJuyhR4dUgzzC1PAmmn21gZ9xZCP37BHGbpcxslOUStrI/ds2FVPVg9p/nmKqK03Jgp1tgqQC9RRk+Ou3ZX4lwEJvLWyOMku0YRnmuL0KNt2XlycKawrmNLBq750CauFaMWeD+TVq578/pf8U7uJ4Ic70M7f+JfsN+K+FQ3pBCWBT7HKQFkvtPH6ySau7+QKWoX75w4axn5tJcc0C3PbNVm3/HVL5xpXCqvejFeSPKS5/gxWNYlICCy0O3cFFks58W/uoCLPsQ8UsNNDfO3hUkXCnuSruaDpeL8turKFUVjYY0ggPikuoQe5cn4kJg5k7sb+pMuk1yTOhsk3v2gtTsr1d+Pr7vcXslK5t4jip8ddZZqX9tbU2X01vnmMms0KsGS8mDWB4+yltkvdje0RF+PWO3d8=; MMYTUUID=24617072-3124-684d-6d2b-385048702474.1643621488743832; s_pers=%20s_depth%3D1%7C1643627638958%3B%20s_vnum%3D1643653800121%2526vn%253D69%7C1643653800121%3B%20s_lv%3D1643625840223%7C1738233840223%3B%20s_lv_s%3DLess%2520than%25201%2520day%7C1643627640223%3B%20gpv_pn%3Dfunnel%253Adomestic%2520hotels%253Alisting%7C1643627640245%3B%20s_invisit%3Dtrue%7C1643627640250%3B%20s_nr30%3D1643625840261-Repeat%7C1646217840261%3B%20s_nr120%3D1643625840266-Repeat%7C1653993840266%3B%20s_nr7%3D1643625840269-Repeat%7C1644230640269%3B%20s_nr3650%3D1643625840842-Repeat%7C1958985840842%3B; bm_sv=A4EE28A4047D16C6D4A63AC4574729FF~Uobv9zbU3SKbvYGoUHFm8SlNXoE53szrn/NdtQynvT8KpkB5nxPFpBcNNnXkvRwjMerLfyOdeVYW6JvTqdhs12JU/JhdV9CODdjPJu4jAQ4+GY0lYfJcivS2ujA9C+YJFqZz4nzyT9HLbb7ScQOGVq30LpbxDdjxVXT97llidXQ=; _abck=B61CEDE14B2F03A4DDA27FC26C6F9A95~0~YAAQHdjIF0GzTHp+AQAAUA68rweqW6ZMPJ4FY/lYFEJG9/Gwek9FveoIT4nGQQ4gY/w5fbF8ErnU0QdNg4rGL5Lt2Twq6sQneA8OwcCZz66jEr3hrDGX48X368vTudZmF/uX1EiwHu+qVsMLFyaQXPQHAFcoZ3GJSJH2bM36I+NwPIVaQT09lHzPdtSHb+G3PEBj7YL0OE4KptLtnpJd8xKB6M7mm6snLThX9K9kHXyzCjgHyf+ni19gaWkTxKMS8/vxFZY5vj6gU4h2sCb3vmFEUG/2prQByLaOmkmju3nDRJ63TaSwnvFu6zcK+L1iCQpK1FyI4kEABH6KXzbPZjNDV7F2BqkfCv2ZVzkMDzlLjouJKDaFLywXsIa1EwJ9m9s=~-1~-1~-1; visitNumber={"number":60,"time":1643625846619}; s_sess=%20s_cmp_pages%3DSEM%257CD%257CDF%257CG%257CBrand%257CB_M_Makemytrip_Search_Exact%257CBrand_MMT_Exact%257CResponsive%257C544716039990%3B%20cf%3D0%3B%20s_sq%3D%3B%20s_cc%3Dtrue%3B%20tp%3D2732%3B%20s_ppv%3Dfunnel%25253Adomestic%252520hotels%25253Alisting%252C94%252C94%252C2568%3B; MMYTUUID=24617072-3124-7373-5031-3570396e2473.1643625869320836; _abck=B61CEDE14B2F03A4DDA27FC26C6F9A95~-1~YAAQv9xVuLPsa3V+AQAATnO8rwfHNEfLHm70x0ecOyPbspDMiYFm76UBTZgp5kS4fA+Elu0OVO2f21bvj6NmdmVkjx1h2b44wHoQMKAHu5mx0mGg8YP4d12p6i5JDGnnNwnKWKqGBT+e4rdDu6YEcmZ9yfZRs/voLrUXlpGbGn7lx+ElXsE2i4Qy1wkzwkTs72JumpRTTbkSmSFexJQ1h8Sr1DNVx6yGVPrVQ5aDm0trcqhLLTGO7rRQTdqf33kYwqNCYOs36jfBtOGYCCIEHqPDoeuPozUaTEFHLtF6BvFuV1Vi44sejnpti4293rym+Bo5WDKHX8qQ5iakk57FjKq2P3H5BKCfTO08jDzwcVC5sELG+f1Jrb+A1y+fzEJGEKjZILPrdTvmEwhD6w==~0~-1~-1; bm_sv=A4EE28A4047D16C6D4A63AC4574729FF~Uobv9zbU3SKbvYGoUHFm8SlNXoE53szrn/NdtQynvT8KpkB5nxPFpBcNNnXkvRwjMerLfyOdeVYW6JvTqdhs12JU/JhdV9CODdjPJu4jAQ7USoDi3pVoqf2Sw+wMTU+mstKPwvAyAW4f08P672E1TCJ7b4K1M1ePu9cGKZOMsQI='
    }
    # JSON request body, sent as a raw string
    form_data = '{"deviceDetails":{"appVersion":"97.0.4692.99","deviceId":"93d81c50-7cc6-44db-8b88-ddb922b59cd1","bookingDevice":"DESKTOP","networkType":"WiFi","deviceType":"DESKTOP"},"searchCriteria":{"checkIn":"2022-02-23","checkOut":"2022-02-24","limit":10,"roomStayCandidates":[{"adultCount":"2"}],"countryCode":"IN","cityCode":"CTBOM","locationId":"CTBOM","locationType":"city","currency":"INR","lastHotelId":"20131124133838404","lastHotelCategory":"","personalizedSearch":false,"nearBySearch":false,"totalHotelsShown":5},"requestDetails":{"visitorId":"02f76bec-722c-4873-8cc0-e0cbf575a7db","visitNumber":60,"trafficSource":null,"funnelSource":"HOTELS","idContext":"B2C","pageContext":"LISTING","channel":"B2Cweb","couponCount":2,"seoCorp":false,"loggedIn":false},"featureFlags":{"soldOut":true,"staticData":true,"extraAltAccoRequired":false,"freeCancellation":true,"coupon":true,"walletRequired":true,"poisRequiredOnMap":true,"mmtPrime":false,"reviewSummaryRequired":true,"persuasionSeg":"P1000","persuasionsRequired":true,"persuasionsEngineHit":true,"shortlistingRequired":false,"similarHotel":false,"personalizedSearch":false,"originListingMap":false},"imageDetails":{"types":["professional"],"categories":[{"type":"H","count":1,"height":162,"width":243,"imageFormat":"webp"}]},"reviewDetails":{"otas":["MMT","TA"],"tagTypes":["BASE","WHAT_GUESTS_SAY"]},"filterCriteria":[],"matchMakerDetails":{},"sortCriteria":null,"expData":"{APE:10,PAH:5,PAH5:T,WPAH:F,BNPL:T,MRS:T,PDO:PN,MCUR:T,ADDON:T,CHPC:T,AARI:T,NLP:Y,RCPN:T,PLRS:T,MMRVER:V3,BLACK:T,IAO:F,EMIDT:2,ALC:T,HIS:DEFAULT,VIDEO:0,AIP:T,APT:T,FLTRPRCBKT:T,CRF:A}","appliedBatchKeys":[]}'

    def start_requests(self):
        # raw cookie string: the same value as the 'cookie' entry in headers
        cookies_raw = self.headers['cookie']
        # cookies dictionary
        cookies = {}
        # loop over cookies string
        for cookie in cookies_raw.split(';'):
            # split on the first '=' only, because cookie values may contain '='
            key, _, val = cookie.strip().partition('=')
            # init cookies
            cookies[key] = val
        # print(json.dumps(cookies, indent=2))
        yield scrapy.Request(
            url="https://mapi.makemytrip.com/clientbackend/cg/search-hotels/DESKTOP/2?cityCode=CTBOM&language=eng&region=in&currency=INR&idContext=B2C&countryCode=IN",
            method='POST',
            headers=self.headers,
            cookies=cookies,
            body=self.form_data,
            callback=self.parse
        )

    def parse(self, response):
        print('\n\nResponse:', response.text)
        print('\n\nStatus:', response.status)

The error shown is:

2022-02-02 11:03:43 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-02-02 11:03:43 [scrapy.core.engine] DEBUG: Crawled (403) <POST https://mapi.makemytrip.com/clientbackend/cg/search-hotels/DESKTOP/2?cityCode=CTBOM&language=eng&region=in&currency=INR&idContext=B2C&countryCode=IN> (referer: https://www.makemytrip.com/)
2022-02-02 11:03:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://mapi.makemytrip.com/clientbackend/cg/search-hotels/DESKTOP/2?cityCode=CTBOM&language=eng&region=in&currency=INR&idContext=B2C&countryCode=IN>: HTTP status code is not handled or not allowed
2022-02-02 11:03:43 [scrapy.core.engine] INFO: Closing spider (finished)

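HTTP error code 403 is returned when a request is forbidden. By default, Scrapy sends the USER_AGENT Scrapy/VERSION (+https://scrapy.org) with every request. Although it is not advisable, a workaround is to set USER_AGENT to mimic a browser, for example Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0, which identifies you as a browser. You can learn how to configure Scrapy spider settings here [1].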
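As a minimal sketch of that workaround, the user agent can be set project-wide in settings.py (the same key should also work inside a spider's custom_settings dict). The value is just the example Firefox string from above; whether it is enough to get past this particular site's protection is an open question, not something this sketch guarantees.

```python
# settings.py (sketch): override Scrapy's default user agent.
# The value is only the example browser string quoted above; any
# current browser user-agent string could be used instead.
USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) "
    "Gecko/20100101 Firefox/47.0"
)

# Per-spider alternative: use the same key inside the spider class, e.g.
# custom_settings = {"USER_AGENT": USER_AGENT}
```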

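However, for your problem, this looks like something that can be done with Selenium. You can create a web driver instance to fetch the data, then create Scrapy item objects after parsing each response from Selenium.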
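A rough sketch of the Selenium approach follows. It assumes Selenium 4 with a local Chrome driver available. The helper names and the CSS selector in the usage comment are placeholders of mine, not taken from the site, so they would need adjusting to the real page markup. Plain dicts stand in for item objects here, since Scrapy accepts dicts as items.

```python
from typing import Dict, Iterable, List


def names_to_items(names: Iterable[str]) -> List[Dict[str, str]]:
    """Turn raw name strings into item dicts, skipping blank entries."""
    return [{"name": n.strip()} for n in names if n and n.strip()]


def fetch_hotel_items(url: str, selector: str) -> List[Dict[str, str]]:
    """Load the page in a real browser and collect the text of matching elements."""
    # Imported here so names_to_items can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, selector)
        return names_to_items(el.text for el in elements)
    finally:
        # Always close the browser, even if the page load fails.
        driver.quit()


# Hypothetical usage (the selector is an assumption, not verified):
# items = fetch_hotel_items("https://www.makemytrip.com/", "p.hotel-name")
```

Since the listing is rendered by JavaScript, the elements may not exist immediately after the page loads, so an explicit wait before collecting them is probably needed in practice.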
