Python sequential requests to an .aspx page



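I am trying, and failing, to render a page after a Requests POST to an .aspx page. I have developed an alternative Selenium webdriver solution, but would like to understand why the POST with Requests fails. I issue a GET to collect the page parameters (VIEWSTATE etc.), then POST with the "Export Type" checkbox selected. The HTML shows the page reloading with the new "Data to Export" checkboxes, but when one of those checkboxes is selected, the default base URL page is rendered. Any help diagnosing why the second POST request fails would be appreciated. The purpose of the request sequence is to download the "Planned unavailability of generation" XML between predefined dates.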

import requests, csv, time, json, codecs
from datetime import datetime
from datetime import timedelta, date
from io import BytesIO,TextIOWrapper
import pandas as pd
import xml.etree.ElementTree as etree
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import subprocess
def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')

url ='http://energieinfo.tennet.org/dataexport/exporteerdatacountry.aspx'
headers={
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding':'gzip, deflate',
'Accept-Language':'en-GB,en;q=0.5',
'Content-Type':'application/x-www-form-urlencoded',
'Host':'energieinfo.tennet.org',
'Origin':'http://energieinfo.tennet.org',
'Proxy-Connection':'keep-alive',
'Referer':'http://energieinfo.tennet.org/dataexport/exporteerdatacountry.aspx',
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0'}
payload = {}
s = requests.Session()
r = s.get(url=url)
headers['set-cookie'] = r.headers['set-cookie']
print (headers)
#headers['Content-Length'] = r.headers['Content-Length']
soup = BeautifulSoup(r.text)
viewstate_tag = soup.find('input', attrs={"type" : "hidden", "name":"__VIEWSTATE"})
viewstategen_tag = soup.find('input', attrs={"type" : "hidden", "name":"__VIEWSTATEGENERATOR"})
eventvalidation_tag = soup.find('input', attrs={"type" : "hidden", "name":"__EVENTVALIDATION"})
payload[viewstate_tag['name']] = viewstate_tag['value']
payload[viewstategen_tag['name']] = viewstategen_tag['value']
payload[eventvalidation_tag['name']] = eventvalidation_tag['value']
payload['__EVENTTARGET'] =  'ctl00$MainContentPlaceHolder$ExportData$rblSelection$3'
payload['__EVENTARGUMENT'] = ''
payload['__LASTFOCUS'] = ''
payload['ctl00$MainContentPlaceHolder$ExportData$rblSelection']= '3'
payload['ctl00$MainContentPlaceHolder$ExportData$tbDateFrom']=''
payload['ctl00$MainContentPlaceHolder$ExportData$tbDateUntil']=''
data = json.dumps(payload).encode()
#First POST request to load 'Data to Export' checkboxes - this bit works
r = s.post(url=url,data=payload,headers=headers)
#headers['Content-Length'] = r.headers['Content-Length']
with open("requests_results.html", "w") as f:
        f.write(r.text)
payload = {}
soup = BeautifulSoup(r.text)
viewstate_tag = soup.find('input', attrs={"type" : "hidden", "name":"__VIEWSTATE"})
viewstategen_tag = soup.find('input', attrs={"type" : "hidden", "name":"__VIEWSTATEGENERATOR"})
eventvalidation_tag = soup.find('input', attrs={"type" : "hidden", "name":"__EVENTVALIDATION"})
payload[viewstate_tag['name']] = viewstate_tag['value']
payload[viewstategen_tag['name']] = viewstategen_tag['value']
payload[eventvalidation_tag['name']] = eventvalidation_tag['value']
payload['__EVENTTARGET'] ='ctl00$MainContentPlaceHolder$ExportData$cb_VNBProd'
payload['__EVENTARGUMENT'] = ''
payload['__LASTFOCUS'] = ''
payload['ctl00$MainContentPlaceHolder$ExportData$rblSelection']= '3'
payload['ctl00$MainContentPlaceHolder$ExportData$cb_VNBProd']='on'
# payload['ctl00$MainContentPlaceHolder$ExportData$tbDateFrom']='2012/01/01'
# payload['ctl00$MainContentPlaceHolder$ExportData$tbDateUntil']='2018/01/01'
# payload['ctl00$MainContentPlaceHolder$ExportData$btnSubmitDate']='Commit'
# for item , value in payload.items():
#     print(item,value)
data = json.dumps(payload).encode()
#Second POST request to load 'Planned unavailability of generation' checkboxes - this bit only returns the base url page
r = s.post(url=url,data=data,headers=headers)
with open("requests_results2.html", "w") as f:
        f.write(r.text)

Your use of cookies looks suspicious.

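When using requests.Session, you do not need to copy the cookies from the initial response into the following requests; they are sent automatically. That is one of the features of a session.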
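As a minimal offline sketch of that behaviour: the snippet below puts a cookie into the session's jar by hand (a stand-in for what the server's Set-Cookie would store after `s.get(url)`) and then inspects the headers the session would attach to the next request. The helper name `cookie_header_for_next_request` is made up for illustration, and the cookie value is the sample one from this answer.

```python
import requests

URL = "http://energieinfo.tennet.org/dataexport/exporteerdatacountry.aspx"


def cookie_header_for_next_request(session, url):
    """Return the Cookie header the session would attach to a request for url.

    The request is only prepared, never sent, so no network access is needed.
    """
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers.get("Cookie")


session = requests.Session()

# Assumption: this simulates the cookie a real response would have stored
# in the jar; in the real flow you would not set it yourself.
session.cookies.set(
    "ASP.NET_SessionId",
    "cbxwegjj2jq1mwgr2ybdo0jj",
    domain="energieinfo.tennet.org",
    path="/",
)

cookie_header = cookie_header_for_next_request(session, URL)
print(cookie_header)
```

No entry in `headers` is needed for this to happen; the session builds the Cookie header from its own jar on every request.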

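In any case, the cookie is not being copied correctly. Subsequent requests should present the cookie that the server gave in the initial response, but your code tries to set the cookie by sending a set-cookie header. Set-Cookie is a response header; the request should carry a Cookie header instead, e.g.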

Cookie: ASP.NET_SessionId=cbxwegjj2jq1mwgr2ybdo0jj
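If you did want to build that header by hand, a rough sketch using only the standard library could look like the following. The function name `cookie_header_from_set_cookie` is hypothetical, and the `path=/; HttpOnly` attributes in the sample value are an assumption about what the server sends. It only handles a single Set-Cookie value, not several joined together.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value):
    """Turn one Set-Cookie response value into a Cookie request value.

    Attributes such as path or HttpOnly are dropped; only name=value
    pairs belong in the Cookie request header.
    """
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in parsed.items())


# Assumed example of what r.headers['set-cookie'] might contain.
set_cookie = "ASP.NET_SessionId=cbxwegjj2jq1mwgr2ybdo0jj; path=/; HttpOnly"

cookie_value = cookie_header_from_set_cookie(set_cookie)
print("Cookie: " + cookie_value)
```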

However, as mentioned, requests.Session will take care of this for you.
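Separately from the cookies, one difference between the two POSTs in the question may be worth checking: the first passes `data=payload` (a dict, which Requests form-encodes), while the second passes `data=data`, the bytes produced by `json.dumps(payload).encode()`. The sketch below prepares both kinds of request without sending them, so the bodies can be compared. The payload is a shortened stand-in with a placeholder `__VIEWSTATE`, and `prepared_body` is a helper name invented for this example.

```python
import json

import requests

URL = "http://energieinfo.tennet.org/dataexport/exporteerdatacountry.aspx"

# Shortened stand-in for the second payload; "placeholder" is not a real
# __VIEWSTATE value.
payload = {
    "__VIEWSTATE": "placeholder",
    "__EVENTTARGET": "ctl00$MainContentPlaceHolder$ExportData$cb_VNBProd",
    "ctl00$MainContentPlaceHolder$ExportData$cb_VNBProd": "on",
}


def prepared_body(data):
    """Build the POST without sending it and return the body Requests would send."""
    return requests.Request("POST", URL, data=data).prepare().body


# Dict passed as data: encoded as key=value pairs, like the first POST.
form_body = prepared_body(payload)

# Bytes passed as data: sent unchanged, like the second POST.
json_body = prepared_body(json.dumps(payload).encode())

print(form_body)
print(json_body)
```

The second body is JSON text, yet the question's headers still declare `application/x-www-form-urlencoded`. If the server cannot find `__VIEWSTATE` and the other form fields in such a body, that could plausibly explain it rendering the default page; passing the dict itself (`data=payload`) in the second POST would be the thing to try.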
