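I am trying to scrape results from this booking website. The site sets a cookie to identify the session. I tried to replicate this with requests, but I still get an Invalid Session ID error in my response. What am I doing wrong?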
import requests

url = 'https://alilauro-tickets.certusonline.com/php/proxy.php'
s = requests.Session()
s.get(url)
data = {
    'msg': 'TimeTable',
    'req': '{"getAvailability":"Y","getBasicPrice":"Y","getRouteAnalysis":"Y","directOnly":"Y","legs":1,"pax":1,"origin":"BEV","destination":"FOR","tripRequest":[{"tripfrom":"BEV","tripto":"FOR","tripdate":"2020-03-21","tripleg":0}]}'
}
r = s.post(url, data=data, cookies=s.cookies)
This is the error I get:
'sessionID': none, 'errorCode': '620', 'errorDescription': 'Invalid Session Number'
Here is the cookie information: (image: cookie information)
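Actually, the cookie is present as soon as you call https://alilauro-tickets.certusonline.com/php/proxy.php, but it is not valid until the JavaScript function has called https://alilauro-tickets.certusonline.com/php/proxy.php?msg=Connect. As Dan Dev mentioned in the comments, this is a protection against CSRF.
The following works: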
import json

import requests

url = "https://alilauro-tickets.certusonline.com/php/proxy.php"
session = requests.Session()

# Connect first: this call is what makes the session cookie valid.
session.post(url, data={"msg": "Connect"})

# The same session (and its cookie) can now be used for the actual query.
r = session.post(url, data={
    "msg": "TimeTable",
    "req": json.dumps({
        "getAvailability": "Y",
        "getBasicPrice": "Y",
        "getRouteAnalysis": "Y",
        "directOnly": "Y",
        "legs": "1",
        "pax": 1,
        "origin": "FOR",
        "destination": "BEV",
        "tripRequest": [{
            "tripfrom": "FOR",
            "tripto": "BEV",
            "tripdate": "2020-03-20",
            "tripleg": 0
        }]
    })
})
print(r.json()["VWS_Trips_Trip"])
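If you need to run this for several routes or dates, the Connect-then-query order can be wrapped in a small helper. The following is only a sketch under assumptions: the function names build_timetable_form and fetch_timetable are made up for illustration, the request fields are copied from the code above, and treating a missing "VWS_Trips_Trip" key as a failure is a guess based on the error shown in the question, not documented behaviour of the site.

```python
import json

URL = "https://alilauro-tickets.certusonline.com/php/proxy.php"


def build_timetable_form(origin, destination, trip_date, pax=1):
    """Build the form fields for a TimeTable request (hypothetical helper).

    Field names and values are copied from the answer above.
    """
    req = {
        "getAvailability": "Y",
        "getBasicPrice": "Y",
        "getRouteAnalysis": "Y",
        "directOnly": "Y",
        "legs": "1",
        "pax": pax,
        "origin": origin,
        "destination": destination,
        "tripRequest": [{
            "tripfrom": origin,
            "tripto": destination,
            "tripdate": trip_date,
            "tripleg": 0,
        }],
    }
    # The site expects the request as a JSON string inside a form field.
    return {"msg": "TimeTable", "req": json.dumps(req)}


def fetch_timetable(session, origin, destination, trip_date):
    """Validate the session with Connect, then request the timetable.

    `session` is expected to behave like requests.Session (hypothetical helper).
    """
    # Step 1: the Connect call is what makes the session cookie valid.
    session.post(URL, data={"msg": "Connect"}).raise_for_status()

    # Step 2: the same session now carries a valid cookie.
    form = build_timetable_form(origin, destination, trip_date)
    response = session.post(URL, data=form)
    response.raise_for_status()
    body = response.json()

    # Assumption: a reply without "VWS_Trips_Trip" is an error such as
    # 'Invalid Session Number' from the question.
    if "VWS_Trips_Trip" not in body:
        raise RuntimeError(body.get("errorDescription", "unexpected response"))
    return body["VWS_Trips_Trip"]
```

It would be called as fetch_timetable(requests.Session(), "FOR", "BEV", "2020-03-20"). Passing the session in keeps the cookie handling in one place and lets you reuse a single session for several queries.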