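I have been trying to use the Python requests module to scrape a website, and I need to log in to the site to retrieve the data I want. I have looked everywhere but cannot work out why it isn't working. Here is my code so far: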
import requests
import bs4 as bs

login_url = "__withheld__"
target_url = "__withheld__"
login_data = {"username": "my_username", "password": "my_password"}

with requests.Session() as s:
    page = s.get(login_url)
    page_login = s.post(login_url, data=login_data)
    page = s.get(target_url)
    final_page = bs.BeautifulSoup(page.content, 'lxml')
    print(final_page.title)
Here is the HTML for the login fields:
<input name="username" type="text" id="username" class="metro-input" placeholder="Username" value="">
<span id="username-error" class=""></span>
<label class="ie789Only"> Password</label>
<input name="password" type="password" id="password" class="metro-input" placeholder="Password">
<input type="submit" name="button1" value="Sign in" id="button1" class="metro-button">
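I believe this may have to do with the site requiring the user to click a button, although I could not find a solution. I also tried looking for any POST form data in the developer console while logging in myself, but I could not find a clear form carrying the username and password. Any help is appreciated.
Update: in case it helps, here is a link to a site run by the same company (for privacy), with the same security features: https://ashwood-vic.compass.education/login.aspx?sessionstate=disabled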
Could you try the code below?
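import requests
import bs4 as bs

login_url = "__withheld__"  # same login URL as in the question
username = 'username of the site'
password = 'password of the site'

# Send the credentials using HTTP Basic authentication
req = requests.get(login_url, auth=(username, password))
final_page = bs.BeautifulSoup(req.content, 'lxml')
print(final_page.title)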
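Note that `auth=(username, password)` only helps if the server actually asks for HTTP Basic authentication. A server normally signals that with a 401 status and a `WWW-Authenticate: Basic ...` header. The helper below is a minimal sketch for checking this; `wants_basic_auth` is a made-up name for illustration, not part of requests.

```python
def wants_basic_auth(status_code, headers):
    """Return True if a response looks like an HTTP Basic auth challenge.

    `headers` can be any mapping, e.g. the `headers` of a requests response.
    """
    challenge = headers.get("WWW-Authenticate", "")
    return status_code == 401 and challenge.strip().lower().startswith("basic")


# Possible usage (needs network access and the requests package):
#   resp = requests.get(login_url)
#   print(wants_basic_auth(resp.status_code, resp.headers))
```

If this returns False for the login page, the site most likely expects a form POST instead, and passing `auth=` will not log you in.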
- See http://docs.python-requests.org/en/master/user/authentication/#basic-authentication for reference.
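If the login is an HTML form rather than Basic authentication, as the markup in the question suggests, the POST usually has to contain every field of the form, not just the username and password. That includes the submit button (`button1` with value `Sign in`) and any hidden inputs. ASP.NET pages like `login.aspx` often carry hidden fields such as `__VIEWSTATE`, though whether this particular site does is an assumption. Below is a rough sketch using only the standard library HTML parser; `FormFieldCollector`, `build_login_payload` and `login` are hypothetical helper names, and it will not help if the site logs in through JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of the <input> tags inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.done = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.done:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            kind = (attrs.get("type") or "").lower()
            # Browsers do not submit unchecked checkboxes or radio buttons.
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form = False
            self.done = True


def build_login_payload(html, username, password):
    """Return (form action, payload) with the credentials filled in."""
    parser = FormFieldCollector()
    parser.feed(html)
    payload = dict(parser.fields)
    # Field names taken from the HTML in the question.
    payload["username"] = username
    payload["password"] = password
    return parser.action, payload


def login(session, login_url, username, password):
    """Fetch the login page, then post the full form; `session` is a requests.Session."""
    page = session.get(login_url)
    action, payload = build_login_payload(page.text, username, password)
    post_url = urljoin(page.url, action) if action else page.url
    return session.post(post_url, data=payload)


# Possible usage (needs network access and the requests package):
#   with requests.Session() as s:
#       login(s, login_url, "my_username", "my_password")
#       page = s.get(target_url)
```

Reusing the same session for the GET, the POST and the later request to `target_url` matters, because the session keeps whatever cookies the login sets.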