How to download a file after completing a login form on a redirected link



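I want to download some .tgz files from a website with Python code. When I click a file link, it goes to another page that requires me to fill in a form (for login). After I fill in the form, it returns to the file link and the download starts. I tried requests in Python 3, but without success: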

My code:

import requests
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
payload={'username':'salvandi69@gmail.com','password':'123asdzxc'}
myurl="https://eogdata.mines.edu/wwwdata/viirs_products/dnb_composites/v10//201707/vcmslcfg/SVDNB_npp_20170701-20170731_75N060W_vcmslcfg_v10_c201708061200.tgz"
myurl2="https://eogauth.mines.edu/auth/realms/master/protocol/openid-connect/auth?response_type=code&scope=email%20openid&client_id=eogdata_oidc&state=VyIetf3UzkQbxOjX-jJ-ae5lMaM&redirect_uri=https%3A%2F%2Feogdata.mines.edu%2Feog%2Foauth2callback&nonce=DRL2KruY5oxbgo2G6HxNHX-CgiMoxfF6FdGOV-FK65o"
r = requests.post(myurl2, verify=False, data=payload, timeout=6)
print(r.text)

myurl is the file link and myurl2 is the link it redirects to. The result:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" class="login-pf">
<head>
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="robots" content="noindex, nofollow">
<meta name="viewport" content="width=device-width,initial-scale=1"/>
<title>Log in to Earth Observation Group Login</title>
<link rel="icon" href="/auth/resources/afx5f/login/eog/img/favicon.ico" />
<link href="/auth/resources/afx5f/common/keycloak/node_modules/patternfly/dist/css/patternfly.min.css" rel="stylesheet" />
<link href="/auth/resources/afx5f/common/keycloak/node_modules/patternfly/dist/css/patternfly-additions.min.css" rel="stylesheet" />
<link href="/auth/resources/afx5f/common/keycloak/lib/zocial/zocial.css" rel="stylesheet" />
<link href="/auth/resources/afx5f/login/eog/css/login.css" rel="stylesheet" />
</head>
<body class="">
<div class="login-pf-page">
<div id="kc-header" class="login-pf-page-header">
<div id="kc-header-wrapper" class=""><div class="kc-logo-text"><span>EOG</span></div></div>
</div>
<div class="card-pf ">
<header class="login-pf-header">
<div id="kc-locale">
<div id="kc-locale-wrapper" class="">
<div class="kc-dropdown" id="kc-locale-dropdown">
<a href="#" id="kc-current-locale-link">English</a>
<ul>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=de">Deutsch</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=no">Norsk</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=ru">Русский</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=sv">Svenska</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=pt-BR">Português (Brasil)</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=lt">Lietuvių</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=en">English</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=it">Italiano</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=fr">Français</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=zh-CN">中文简体</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=es">Español</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=cs">Čeština</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=ja">日本語</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=sk">Slovenčina</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=pl">Polish</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=ca">Català</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=nl">Nederlands</a></li>
<li class="kc-dropdown-item"><a href="/auth/realms/master/protocol/openid-connect/auth?kc_locale=tr">tr</a></li>
</ul>
</div>
</div>
</div>
<h1 id="kc-page-title">        We are sorry...
</h1>
</header>
<div id="kc-content">
<div id="kc-content-wrapper">

<div id="kc-error-message">
<p class="instruction">Invalid Request</p>
</div>

</div>
</div>
</div>
</div>
</body>
</html>

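The main problem: you send the POST to the URL of the login page, but the form does not have to submit to that URL. You should check <form action=...> to get the correct URL for the POST.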

I use BeautifulSoup to get this information from the HTML.
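As a rough illustration of what that lookup involves, here is a minimal sketch that runs offline and uses only the standard library (html.parser instead of BeautifulSoup). The helper names FormParser and parse_login_form are made up for this example, and SAMPLE_HTML is an invented Keycloak-style form, not the real page; the real action URL and its query parameters will differ on every visit.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormParser(HTMLParser):
    """Collect the action URL and input fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self._done:
            self._in_form = True
            self.action = attrs.get('action')
        elif tag == 'input' and self._in_form and attrs.get('name'):
            # Keep preset values (e.g. hidden fields) so they can be sent back.
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form' and self._in_form:
            self._in_form = False
            self._done = True


def parse_login_form(page_url, html):
    """Return (absolute POST URL, dict of form fields) for the first form."""
    parser = FormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError('no <form action=...> found')
    # The action may be relative, so resolve it against the page URL.
    return urljoin(page_url, parser.action), parser.fields


# Invented sample markup (assumption): only the overall shape matters here.
SAMPLE_HTML = '''
<form id="kc-form-login" action="/auth/realms/master/login-actions/authenticate?session_code=abc&amp;tab_id=xyz" method="post">
  <input name="username" type="text" />
  <input name="password" type="password" />
  <input name="credentialId" type="hidden" value="" />
</form>
'''
page_url = 'https://eogauth.mines.edu/auth/realms/master/protocol/openid-connect/auth?client_id=eogdata_oidc'
post_url, fields = parse_login_form(page_url, SAMPLE_HTML)
print(post_url)
print(fields)
```

In this sample the URL to POST to is not the URL of the page that showed the form, which is the situation described above. Collecting the input names also shows which fields the form expects, such as credentialId.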

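I don't have a username and password to test everything, but at least the POST now returns the page with the login form and the message Invalid username or password. instead of the page with Invalid Request.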

import requests
from bs4 import BeautifulSoup as BS

# A Session keeps the cookies set by the login page between requests
s = requests.Session()
#s.headers.update({'User-Agent': 'Mozilla/5.0'})

# --- use tgz to get login page -------
url_tgz = "https://eogdata.mines.edu/wwwdata/viirs_products/dnb_composites/v10//201707/vcmslcfg/SVDNB_npp_20170701-20170731_75N060W_vcmslcfg_v10_c201708061200.tgz"
r = s.get(url_tgz)
#print(r.status_code)
#print(r.history)
print('\n--- url page ---\n')
print(r.url)

# --- find url in form ---
soup = BS(r.text, 'html.parser')
item = soup.find('form')
url = item['action']
print('\n--- url form ---\n')
print(url)
print('\n--- url page == url in form ---\n')
print(r.url == url)

# --- login ---
payload = {
    'username': 'salvandi69@gmail.com',
    'password': '123asdzxc',
    'credentialId': '',
}
r = s.post(url, data=payload)
#print(r.status_code)
#print(r.history)
#print(r.url)
#print(r.text)

# --- result ---
print('\n--- login ---\n')
soup = BS(r.text, 'html.parser')
item = soup.find('span', {'class': 'kc-feedback-text'})
if item:
    print('Message:', item.text)
else:
    print("Can't see error message")
print('\n--- end ---\n')
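Once the login succeeds, the file still has to be saved to disk. The following is only a sketch of one way to do that, not something tested against this site: download_file is a hypothetical helper, and it assumes that the session object behaves like requests.Session and that a failed login shows up as an HTML response instead of the archive.

```python
def download_file(session, url, dest_path, chunk_size=1 << 16):
    """Stream `url` to `dest_path` using an already logged-in session.

    `session` is expected to behave like requests.Session. Returns the number
    of bytes written, or raises RuntimeError if the server answered with HTML,
    which usually means we were sent back to the login page.
    """
    r = session.get(url, stream=True)
    r.raise_for_status()
    content_type = r.headers.get('Content-Type', '')
    if 'text/html' in content_type.lower():
        raise RuntimeError('got HTML instead of a file; login probably failed')
    written = 0
    with open(dest_path, 'wb') as fh:
        # Write in chunks so a large .tgz is never held in memory at once.
        for chunk in r.iter_content(chunk_size=chunk_size):
            if chunk:
                fh.write(chunk)
                written += len(chunk)
    return written
```

With valid credentials, a call such as download_file(s, url_tgz, 'file.tgz') after the login step would reuse the cookies stored in s. Whether the server redirects back to the file automatically after the POST is something to verify by inspecting r.history and r.url.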
