Web scraping a page that requires a cookie to be sent with the request



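I'm trying to scrape results from this booking website. The site sets a cookie to identify the session. I tried to replicate that with requests, but my response still contains an "Invalid Session ID" error. What am I doing wrong?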

import requests

url = 'https://alilauro-tickets.certusonline.com/php/proxy.php'
s = requests.Session()
# Initial GET so that the server sets the session cookie
s.get(url)
data = {
    'msg': 'TimeTable',
    'req': '{"getAvailability":"Y","getBasicPrice":"Y","getRouteAnalysis":"Y","directOnly":"Y","legs":1,"pax":1,"origin":"BEV","destination":"FOR","tripRequest":[{"tripfrom":"BEV","tripto":"FOR","tripdate":"2020-03-21","tripleg":0}]}'
}
# The session already sends its own cookies; passing them again is redundant
r = s.post(url, data=data, cookies=s.cookies)

This is the error I get:

'sessionID': None, 'errorCode': '620', 'errorDescription': 'Invalid Session Number'

Here is the cookie information: [image: cookie info]

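Actually, the cookie is already present after you call https://alilauro-tickets.certusonline.com/php/proxy.php, but it is not valid until the page's JavaScript calls https://alilauro-tickets.certusonline.com/php/proxy.php?msg=Connect. As Dan Dev mentioned in the comments, this is a protection against CSRF.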

The following works:

import requests
import json

url = "https://alilauro-tickets.certusonline.com/php/proxy.php"
session = requests.Session()

# First call: "Connect" makes the session cookie valid on the server
r = session.post(url, data={"msg": "Connect"})

# Second call: the same session (and cookie) requests the timetable
r = session.post(url, data={
    "msg": "TimeTable",
    "req": json.dumps({
        "getAvailability": "Y",
        "getBasicPrice": "Y",
        "getRouteAnalysis": "Y",
        "directOnly": "Y",
        "legs": "1",
        "pax": 1,
        "origin": "FOR",
        "destination": "BEV",
        "tripRequest": [{
            "tripfrom": "FOR",
            "tripto": "BEV",
            "tripdate": "2020-03-20",
            "tripleg": 0
        }]
    })
})
print(r.json()["VWS_Trips_Trip"])
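
If you want the error from the question to surface as an exception instead of a KeyError on "VWS_Trips_Trip", the two-step flow above could be wrapped roughly as follows. This is only a sketch: fetch_timetable and BookingError are hypothetical names, not part of requests or of the site's API, and it assumes that a failed reply is a JSON object with 'errorCode' and 'errorDescription' keys (as in the question) while a successful reply carries no 'errorCode'.

```python
import json


class BookingError(RuntimeError):
    """Hypothetical error type for replies that carry an errorCode."""


def fetch_timetable(session, url, trip_request):
    """Send msg=Connect first, then the TimeTable query, on one session.

    `session` is expected to behave like requests.Session: .post(url, data=...)
    returns an object with .raise_for_status() and .json().
    """
    # Step 1: the Connect call is what makes the session cookie valid.
    session.post(url, data={"msg": "Connect"}).raise_for_status()

    # Step 2: the same session, and so the same cookie, sends the real query.
    response = session.post(
        url, data={"msg": "TimeTable", "req": json.dumps(trip_request)}
    )
    response.raise_for_status()
    payload = response.json()

    # Assumption: a failure looks like the reply shown in the question.
    if isinstance(payload, dict) and payload.get("errorCode"):
        raise BookingError(
            f"{payload['errorCode']}: {payload.get('errorDescription')}"
        )
    return payload
```

With requests installed, you would call it as fetch_timetable(requests.Session(), url, request_dict), where request_dict is the dictionary passed to json.dumps above, and then read ["VWS_Trips_Trip"] from the result. Taking the session as a parameter also lets you try the function with a stub object, without touching the network.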
