How to log in to a website using the Python requests module



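I have been reading about the requests module and have tried a few different approaches.

However, I run into a problem when it comes to web authentication.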

Testing site: http://testing-ground.scraping.pro/login
Username: admin
Password: 12345

Here is the sample code:

>>> import requests, re
>>> url = 'http://testing-ground.scraping.pro/login'
>>> username = 'admin'
>>> password = '12345'
>>> requests.get(url)
<Response [200]>

Without authentication:

>>> print(requests.get(url).text)
<!DOCTYPE html>
<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Web Scraper Testing Ground</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="/css/normalize.css">
<link rel="stylesheet" href="/css/main.css">
<script src="/js/vendor/modernizr-2.6.1.min.js"></script>
<script src="/js/vendor/jquery-1.9.1.min.js"></script>
<script src="/js/vendor/jquery-ui-1.10.2.min.js"></script>
<script src="/js/plugins.js"></script>
<script src="/js/main.js"></script>
<link rel="stylesheet" href="/css/QapTcha.jquery.css" />
<script src="/js/QapTcha.jquery.js"></script>

<link rel="stylesheet" href="/fancy-captcha/captcha.css" />
<script src="/fancy-captcha/jquery.captcha.js"></script>
</head>
<body>
<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-4436411-8']);
_gaq.push(['_setDomainName', 'extract-web-data.com']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
<!--[if lt IE 7]>
<p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
<![endif]-->
<div id="topbar"></div>
<a href="/" style="text-decoration: none">
<div id="title">WEB SCRAPER TESTING GROUND</div>
<div id="logo"></div>
</a>
<div id="content">
<h1>LOGIN</h1>
<div id="caseinfo">Often in order to reach the desired information you need to be logged in to the website. Most of today's websites use so-called form-based authentication which implies sending user credentials using POST method, authenticating it on the server and storing user's session in a cookie.</p>
<p>This simple test shows scraper's ability to:</p>
<ol>
<li>Send user credentials via POST method</li>
<li>Receive, Keep and Return a session cookie</li>
<li>Process HTTP redirect (302)</li>
</ol>
<p>How to test:</p>
<ol>
<li>Enter <b>admin</b> and <b>12345</b> in the form below and press <b>Login</b></li>
<li>If you see <span class="success">WELCOME :)</span> then the user credentials were sent, the cookie was passed and HTTP redirect was processed</li>
<li>If you see <span class="error">ACCESS DENIED!</span> then either you entered wrong credentials or they were not sent to the server properly</li>
<li>If you see <span class="error">THE SESSION COOKIE IS MISSING OR HAS A WRONG VALUE!</span> then the user credentials were properly sent but the session cookie was not properly stored or passed</li>
<li>If you see <span class="success">REDIRECTING...</span> then the user credentials were properly sent but HTTP redirection was not processed</li>
<li>Click <b>GO BACK</b> to start again</li>
</ol>
</div>
<hr/>
<div id="case_login">
<h3>Please, login:</h3>
<form action="login?mode=login" method="POST">
<label for="usr">User name:</label>
<input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
<label for="pwd">Password:</label>
<input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
<input type="submit" value="Login">
</form>
</div>
<br/><br/><br/>
</div>
</body>
</html>
>>> 

With authentication:

>>> print(requests.get(url, auth=(username, password)).text)
<!DOCTYPE html>
<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Web Scraper Testing Ground</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="/css/normalize.css">
<link rel="stylesheet" href="/css/main.css">
<script src="/js/vendor/modernizr-2.6.1.min.js"></script>
<script src="/js/vendor/jquery-1.9.1.min.js"></script>
<script src="/js/vendor/jquery-ui-1.10.2.min.js"></script>
<script src="/js/plugins.js"></script>
<script src="/js/main.js"></script>
<link rel="stylesheet" href="/css/QapTcha.jquery.css" />
<script src="/js/QapTcha.jquery.js"></script>

<link rel="stylesheet" href="/fancy-captcha/captcha.css" />
<script src="/fancy-captcha/jquery.captcha.js"></script>
</head>
<body>
<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-4436411-8']);
_gaq.push(['_setDomainName', 'extract-web-data.com']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
<!--[if lt IE 7]>
<p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
<![endif]-->
<div id="topbar"></div>
<a href="/" style="text-decoration: none">
<div id="title">WEB SCRAPER TESTING GROUND</div>
<div id="logo"></div>
</a>
<div id="content">
<h1>LOGIN</h1>
<div id="caseinfo">Often in order to reach the desired information you need to be logged in to the website. Most of today's websites use so-called form-based authentication which implies sending user credentials using POST method, authenticating it on the server and storing user's session in a cookie.</p>
<p>This simple test shows scraper's ability to:</p>
<ol>
<li>Send user credentials via POST method</li>
<li>Receive, Keep and Return a session cookie</li>
<li>Process HTTP redirect (302)</li>
</ol>
<p>How to test:</p>
<ol>
<li>Enter <b>admin</b> and <b>12345</b> in the form below and press <b>Login</b></li>
<li>If you see <span class="success">WELCOME :)</span> then the user credentials were sent, the cookie was passed and HTTP redirect was processed</li>
<li>If you see <span class="error">ACCESS DENIED!</span> then either you entered wrong credentials or they were not sent to the server properly</li>
<li>If you see <span class="error">THE SESSION COOKIE IS MISSING OR HAS A WRONG VALUE!</span> then the user credentials were properly sent but the session cookie was not properly stored or passed</li>
<li>If you see <span class="success">REDIRECTING...</span> then the user credentials were properly sent but HTTP redirection was not processed</li>
<li>Click <b>GO BACK</b> to start again</li>
</ol>
</div>
<hr/>
<div id="case_login">
<h3>Please, login:</h3>
<form action="login?mode=login" method="POST">
<label for="usr">User name:</label>
<input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
<label for="pwd">Password:</label>
<input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
<input type="submit" value="Login">
</form>
</div>
<br/><br/><br/>
</div>
</body>
</html>
>>> 

Since the output still contains the web login form, I assume the authentication did not work as expected.

<h3>Please, login:</h3>
<form action="login?mode=login" method="POST">
<label for="usr">User name:</label>
<input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
<label for="pwd">Password:</label>
<input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
<input type="submit" value="Login">
</form>

What is going wrong here, and what can I do to fix it?

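Passing auth=(username, password) sends HTTP Basic authentication credentials, but this page uses form-based authentication, which is why the login form comes back unchanged. Instead, send a POST request to the form's action URL (login?mode=login), with the credentials in the form fields usr and pwd: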


>>> import requests
>>> url = 'http://testing-ground.scraping.pro/login?mode=login'
>>> username = 'admin'
>>> password = '12345'
>>> requests.post(url, data={'usr':username, 'pwd':password})
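To request further pages as the logged-in user, the session cookie has to be kept between calls, and requests.Session does that. Below is a minimal sketch, assuming the form action and field names shown in the page source above. The helper names login_url and login are made up for illustration, and the 10-second timeout is an arbitrary choice.

```python
from urllib.parse import urljoin

LOGIN_PAGE = "http://testing-ground.scraping.pro/login"
FORM_ACTION = "login?mode=login"  # value of the form's action attribute


def login_url(page_url=LOGIN_PAGE, action=FORM_ACTION):
    """Resolve the form's relative action against the page URL, as a browser would."""
    return urljoin(page_url, action)


def login(session, username, password):
    """POST the form fields through `session` (expected to be a requests.Session).

    A Session stores cookies set by the server and sends them back on later
    requests made through the same object, including followed redirects.
    """
    # The dictionary keys are the name attributes of the form's <input> elements.
    payload = {"usr": username, "pwd": password}
    return session.post(login_url(), data=payload, timeout=10)


# Usage (performs real network requests, so it is left commented out):
#
#   import requests
#   with requests.Session() as session:
#       response = login(session, "admin", "12345")
#       print(response.status_code)
#       print(response.text)   # look for WELCOME :) or ACCESS DENIED!
```

Resolving the action with urljoin, rather than hard-coding the full URL, keeps the sketch correct if the login page moves to a different path. Any later session.get(...) call on the same session object will send the stored cookie automatically.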
