Scraping an XML response from an AJAX request with Python



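I'm trying to get the data that is loaded into the chart on this page when the MAX (time range) button is clicked. The data is loaded through an AJAX request.

I inspected the request and tried to reproduce it with the Python requests library, but I can only retrieve the 1-year data for this chart.

Here is the code I used: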

import requests

r = requests.get("https://www.justetf.com/en/etf-profile.html?0-4.0-tabs-panel-chart-dates-ptl_max&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart&_=1576272593482")
r.content

I also tried using a session:

from requests import Session
session = Session()
session.head('http://justetf.com')
response = session.get(
url='https://www.justetf.com/en/etf-profile.html?0-4.0-tabs-panel-chart-dates-ptl_max&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart&_=1575929227619',
data = {"0-4.0-tabs-panel-chart-dates-ptl_max":"",
"groupField":"none","sortField":"ter",
"sortOrder":"asc","from":"search",
"isin":"IE00B3VWN518",
"tab":"chart",
"_":"1575929227619"
},
headers={
'Host': 'www.justetf.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
'Accept': 'application/xml, text/xml, */*; q=0.01',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Wicket-Ajax': 'true',
'Wicket-Ajax-BaseURL': 'en/etf-profile.html?0&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart',
'Wicket-FocusedElementId': 'id28',
'X-Requested-With': 'XMLHttpRequest',
'Connection': 'keep-alive',
'Referer': 'https://www.justetf.com/en/etf-profile.html?groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart',
'Cookie': 'locale_=en; _ga=GA1.2.1297456970.1574289342; cookieconsent_status=dismiss; AWSALB=QMWHJxgfcpLXJLqX0i0FgBuLn+mpVHVeLRQ6upH338LdggA4/thXHT2vVWQX7pdBd1r486usZXgpAF8RpDsGJNtf6ei8e5NHTsg0hzVHR9C+Fj89AWuQ7ue+fzV2; JSESSIONID=ABB2A35B91751CA9B2D293F5A04505BE; _gid=GA1.2.1029531470.1575928527; _gat=1',
'TE': 'Trailer'

},
cookies = {"_ga":"GA1.2.1297456970.1574289342","_gid":"GA1.2.1411779365.1574289342","AWSALB":"5v+tPMgooQC0deJBlEGl2wVeUSmwVGJdydie1D6dAZSRAK5eBsmg+DQCdBj8t25YRytC5NIi0TbU3PmDcNMjiyFPTp1xKHgwNjZcDvMRePZjTxthds5DsvelzE2I","JSESSIONID":"310F346AED94D1A345207A3489DCF83D","locale_":"en"}
)

But I get this response:

<ajax-response><redirect><![CDATA[/en/etf-profile.html?0&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart]]></redirect></ajax-response>

Why don't I get the same XML response that the browser receives when I press MAX?

OK, here is my solution for getting the data you need:

import requests

url = "https://www.justetf.com/en/etf-profile.html"
querystring = {
    # Modify this key to get the time range you want.
    # It is currently set to "max", as you can see.
    "0-1.0-tabs-panel-chart-dates-ptl_max": "",
    "groupField": "none",
    "sortField": "ter",
    "sortOrder": "asc",
    "from": "search",
    "isin": "IE00B3VWN518",
    "tab": "chart",
    "_": "1576627890798",
}
# Not all of these headers may be necessary
headers = {
    'authority': "www.justetf.com",
    'accept': "application/xml, text/xml, */*; q=0.01",
    'x-requested-with': "XMLHttpRequest",
    'wicket-ajax-baseurl': "en/etf-profile.html?0&groupField=none&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart",
    'wicket-ajax': "true",
    'wicket-focusedelementid': "id27",
    'Connection': "keep-alive",
}
session = requests.Session()
# The first request won't return what we want, but it sets the cookies
response = session.get(url, params=querystring)
# The cookies are set now, so the 2nd request can fetch the data we want
response = session.get(url, headers=headers, params=querystring)
print(response.text)
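
If you want to check whether a request came back with the real payload or with the redirect shown in the question, something like the following might help. It is only a sketch: `get_redirect_target` is a helper name I made up, and it assumes the body is an `<ajax-response>` document shaped like the one quoted in the question.

```python
import xml.etree.ElementTree as ET


def get_redirect_target(xml_text):
    """Return the redirect path from an <ajax-response> body, or None.

    Hypothetical helper: assumes the body looks like the redirect
    response quoted in the question.
    """
    root = ET.fromstring(xml_text)
    redirect = root.find("redirect")
    if redirect is None or redirect.text is None:
        return None
    return redirect.text.strip()


# The redirect body quoted in the question, used here as sample input.
sample = (
    "<ajax-response><redirect><![CDATA[/en/etf-profile.html?0&groupField=none"
    "&sortField=ter&sortOrder=asc&from=search&isin=IE00B3VWN518&tab=chart]]>"
    "</redirect></ajax-response>"
)
print(get_redirect_target(sample))
```

With the session code above you could call `get_redirect_target(response.text)` after the second request; a value other than `None` would suggest the server still wants the page loaded first.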

As a bonus, I'm including a link to a repl.it where I actually parse the data and extract each individual data point. You can find it here.

Let me know if this helps!
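
Independently of that repl.it (this is not the code from it), here is a rough sketch of a first step for inspecting the XML yourself: list every child element of `<ajax-response>` together with its attributes and text. The function name `list_payloads` and the sample document are made up for illustration; the real response may well contain different elements.

```python
import xml.etree.ElementTree as ET


def list_payloads(xml_text):
    """Return (tag, attributes, text) for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [
        (child.tag, dict(child.attrib), (child.text or "").strip())
        for child in root
    ]


# Invented sample input, NOT a real response from the site.
example = (
    '<ajax-response>'
    '<component id="id1"><![CDATA[<div>chart</div>]]></component>'
    '<evaluate><![CDATA[someFunction();]]></evaluate>'
    '</ajax-response>'
)
for tag, attrs, text in list_payloads(example):
    print(tag, attrs, text[:60])
```

Running `list_payloads(response.text)` on the real response should show which element carries the chart data, and from there you can decide how to pull out the individual points.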
