Scraping etrade with Python requests fails on a cross-domain URL

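I'm trying to scrape some basic stock information from etrade (I know they have an API, but I want to figure this out first). I can log in with the requests module as follows: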

import requests
from bs4 import BeautifulSoup

symbol = 'A'
# etradeUsername and etradePassword are assumed to be defined elsewhere
payload = {'USER': etradeUsername, 'PASSWORD': etradePassword, 'countrylangselect': 'us_english', 'TARGET': '/e/t/pfm/portfolioview'}
with requests.Session() as c:
    c.post('https://us.etrade.com/login.fcc', data=payload)
    r = c.get('https://us.etrade.com/e/t/pfm/portfolioview')
    #r = c.get('https://www.etrade.wallst.com/v1/stocks/snapshot/snapshot.asp?symbol=' + symbol + '&rsO=new')
    # Name the parser explicitly so BeautifulSoup does not have to guess one
    etradeMarkup = BeautifulSoup(r.text, 'html.parser')
    #print(r.headers)
    # Dump the markup to a file so I can see what the scraper received
    with open("etrade.html", "w", encoding="utf-8") as file1:
        file1.write(etradeMarkup.prettify())

The file write is only there so I can see what the scraper got back.

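I can see the portfolio page just fine, so I know the login works. The commented-out line is my target page. After logging in successfully with a browser I can view the www.etrade.wallst.com… page, but the scraper just gets redirected to the etrade.com login page.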

I think there is a session handoff or a cookie value passed between the two domains that my browser knows how to handle but my code does not.
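One way to check a hypothesis like this is to look at where the request was actually redirected and which cookies the session holds for each domain. The following is only a rough sketch: the helper names `redirect_chain` and `cookie_names_by_domain` are made up for illustration, and they rely only on the `history`, `status_code`, and `url` attributes of a requests response and on iterating over `session.cookies`.

```python
from collections import defaultdict


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def cookie_names_by_domain(session):
    """Group the names of the cookies held by a session by cookie domain."""
    grouped = defaultdict(list)
    for cookie in session.cookies:  # each item has .domain and .name
        grouped[cookie.domain].append(cookie.name)
    return dict(grouped)


# Usage with a real session (not executed here, it needs network access):
#   import requests
#   with requests.Session() as c:
#       r = c.get(target_url)  # target_url is a placeholder
#       print(redirect_chain(r))
#       print(cookie_names_by_domain(c))
```

If the chain ends on the login page and no cookie is listed for the target domain, that suggests some other page has to be visited first to set it.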

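I've hit a dead end with my Python and HTTP knowledge, and I'm hoping someone can point me in the right direction on how to get past this programmatically.

Thanks very much for any help you can offer. (I'm new to Python and scraping, so please be kind :)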

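I found that a different page sets the cookie that is needed. I had assumed I was being pushed to the etrade login page because the target required a cookie from the logged-in part of etrade, but I was wrong. I don't need to log in to etrade at all for this page; I only need to visit another page first to pick up the cookie. By adding a line that requests https://us.etrade.com/e/t/invest/markets?ploc=c-MainNav I was able to get the data needed to view the target page without my program being sent back to the login page.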

with requests.Session() as c:
    # Adding this line was the key: this page sets the cookie the target page needs
    c.get('https://us.etrade.com/e/t/invest/markets?ploc=c-MainNav')
    r = c.get('https://www.etrade.wallst.com/v1/stocks/snapshot/snapshot.asp?symbol=' + symbol + '&rsO=new')
    etradeMarkup = BeautifulSoup(r.text, 'html.parser')
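The same idea can be wrapped in a small function that also notices when the request ends up somewhere other than the target host. This is a minimal sketch, not a tested scraper: `fetch_snapshot` and the constant names are hypothetical, the URLs are the ones from the post, and it assumes a redirect to the login page would show up as a different host in `r.url`.

```python
from urllib.parse import urlparse

# URLs taken from the post; the constant names are illustrative
WARMUP_URL = 'https://us.etrade.com/e/t/invest/markets?ploc=c-MainNav'
SNAPSHOT_URL = 'https://www.etrade.wallst.com/v1/stocks/snapshot/snapshot.asp'
SNAPSHOT_HOST = 'www.etrade.wallst.com'


def fetch_snapshot(session, symbol):
    """Visit the cookie-setting page, then fetch the snapshot page for symbol."""
    # First request: only needed for the cookies it leaves in the session
    session.get(WARMUP_URL)
    # Second request: let requests build and escape the query string
    r = session.get(SNAPSHOT_URL, params={'symbol': symbol, 'rsO': 'new'})
    r.raise_for_status()
    final_host = urlparse(r.url).hostname
    if final_host != SNAPSHOT_HOST:
        # Assumption: being bounced to login means ending up on another host
        raise RuntimeError('redirected away from target page to ' + r.url)
    return r.text


# Usage (not executed here, it needs network access):
#   import requests
#   from bs4 import BeautifulSoup
#   with requests.Session() as c:
#       etradeMarkup = BeautifulSoup(fetch_snapshot(c, 'A'), 'html.parser')
```

Passing the query as `params` instead of concatenating strings keeps symbols with special characters correctly escaped.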
