Can this form be filled in and submitted using Python's requests module?



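I am writing a data scraping script. Its purpose is to collect data about available broadband deals from BT's website. I can't work out why my simple requests code doesn't fill in the form and proceed to the next page.

Please help me work out how to enter data into this form and save the output HTML for scraping.

I have identified the relevant tags in the form I'm interested in. I am trying to populate the UPRN field and proceed to the next page.

Link to the website: https://www.dslchecker.bt.com/#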

My Python code:

```python
import requests
url = "https://www.dslchecker.bt.com/#"
payload = {'UPRN':'10033360983'}
r = requests.post(url, data = payload)
print(r.text)
```

The form from the website:

```html
<form method="post" action="adsl/ADSLChecker.UPRNoutput">
    <input type="hidden" name="URL">
    <input value="a%20service%20provider" type="hidden" name="SP_NAME">
    <span class="subheading">UPRN:</span><br>
    <input maxlength="13" size="14" name="UPRN" autocomplete="off">
    <input value="56" type="hidden" name="VERSION">
    <input value="E" type="hidden" name="MS">
    <input value="no" type="hidden" name="CAP">
    <input value="Y" type="hidden" name="AEA"> &nbsp;
    <input class="form_button" value="submit" type="submit">
</form>
```

Please follow this link: https://www.dslchecker.bt.com/# and enter 10033346575 in the UPRN field to see the desired output.

My output when run in a Jupyter notebook:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0063)http://dslcheckerait.vade.bt.com:61065/adsl/adslchecker.welcome -->
<HTML><HEAD>
<STYLE>
.body {FONT-VARIANT: normal; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif; COLOR: #004d5f; FONT-SIZE: 11px; FONT-WEIGHT: normal; TEXT-DECORATION: none
}
.bodybold {FONT-VARIANT: normal; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif; COLOR: #333333; FONT-SIZE: 11px; FONT-WEIGHT: bold; TEXT-DECORATION: none
}
.errormessage {FONT-VARIANT: normal; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif; COLOR: #000000; FONT-SIZE: 11px; FONT-WEIGHT: bold; TEXT-DECORATION: none
}
.formDescription {FONT-VARIANT: normal; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif; COLOR: #666666; FONT-SIZE: 9px; FONT-WEIGHT: normal; TEXT-DECORATION: none
}
.form_button {BORDER-BOTTOM: #666666 1px solid; BORDER-LEFT: #666666 1px solid; BACKGROUND-COLOR: #6400AA; FONT-VARIANT: normal; FONT-FAMILY: Calibri Light, Arial, Helvetica, sans-serif; COLOR: #ffffff; FONT-SIZE: 10px; BORDER-TOP: #666666 1px solid; FONT-WEIGHT: bold; BORDER-RIGHT: #666666 1px solid; TEXT-DECORATION: none
}
.heading {FONT-VARIANT: normal; FONT-FAMILY: Arial, Helvetica, sans-serif; COLOR: #004d5f; FONT-SIZE: 14px; FONT-WEIGHT: bold; TEXT-DECORATION: none
}
.heading3 {FONT-VARIANT: normal; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif; COLOR: #333333; FONT-SIZE: 10px; FONT-WEIGHT: bold; TEXT-DECORATION: none
}
.heading4 {FONT-VARIANT: normal; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif; COLOR: #91b1b8; FONT-SIZE: 10px; FONT-WEIGHT: bold; TEXT-DECORATION: none
}
.subheading {FONT-VARIANT: normal; FONT-FAMILY: Calibri Light, Helvetica, sans-serif; COLOR: color: #333333; FONT-SIZE: 14px; FONT-WEIGHT: bold; TEXT-DECORATION: none
}
A:active {FONT-VARIANT: normal; FONT-FAMILY: Calibri Light, Arial, Helvetica, sans-serif; COLOR: #6400AA; FONT-SIZE: 12px; FONT-WEIGHT: bold; TEXT-DECORATION: underline
}
A:hover {FONT-VARIANT: normal; FONT-FAMILY: Calibri Light, Arial, Helvetica, sans-serif; COLOR: #6400AA; FONT-SIZE: 12px; FONT-WEIGHT: bold; TEXT-DECORATION: underline
}
A:link {FONT-VARIANT: normal; FONT-FAMILY: Calibri Light, Arial, Helvetica, sans-serif; COLOR: #6400AA; FONT-SIZE: 12px; FONT-WEIGHT: bold; TEXT-DECORATION: none
}
A:visited {FONT-VARIANT: normal; FONT-FAMILY: Calibri Light, Arial, Helvetica, sans-serif; COLOR: #6400AA; FONT-SIZE: 12px; FONT-WEIGHT: bold; TEXT-DECORATION: underline
}
BODY {PADDING-BOTTOM: 0px; BACKGROUND-COLOR: #ffffff; MARGIN: 10px; PADDING-LEFT: 0px; PADDING-RIGHT: 0px; PADDING-TOP: 0px
}
</STYLE>
<TITLE>BT Broadband</TITLE>
<META content="text/html; charset=utf-8" http-equiv=Content-Type><LINK
rel=stylesheet type=text/css
href="adslchecker_font.html">
<META content=text/css http-equiv=Content-Style-Type><META http-equiv="X-UA-Compatible" content="IE=5">
<SCRIPT>
<!--
function setFocus() {
    document.forms[0].elements[2].focus();
}
//-->
</SCRIPT>
<META name=GENERATOR content="MSHTML 8.00.7601.18751"></HEAD>
<BODY onload=setFocus()>
<TABLE width=500 align=center>
  <TBODY>
  <TR>
    <TD>
      <SCRIPT language=JavaScript>  var isNS = (navigator.appName == "Netscape") ? 1 : 0;var EnableRightClick = 0;if(isNS) document.captureEvents(Event.MOUSEDOWN||Event.MOUSEUP);function mischandler(){if(EnableRightClick==1){ return true;}else {return false; }}function mousehandler(e){  if(EnableRightClick==1){ return true; }  var myevent = (isNS) ? e : event;  var eventbutton = (isNS) ? myevent.which : myevent.button;  if((eventbutton==2)||(eventbutton==3)) return false;}function keyhandler(e) {var myevent = (isNS) ? e : window.event;if (myevent.keyCode==96)EnableRightClick = 1;return;}document.oncontextmenu = mischandler;document.onkeypress = keyhandler;document.onmousedown = mousehandlerdocument.onmouseup = mousehandler;</SCRIPT>
      <TABLE border=0 cellSpacing=0 cellPadding=0 width="100%"><!-- Start Header -->
        <TBODY>
        <TR><BR><BR>
          <!--<TD height=20 vAlign=top align=left><IMG border=0 alt="BT Wholesale"
            src="dsl_images/g_main_logo.gif" width=129
height=20></TD></TR>
        <TR>
          <TD class=body height=14 vAlign=top align=left><IMG alt=""
            src="dsl_images/spacer.gif" width=450 height=14></TD></TR>
        <TR>//-->
          <TD class=body vAlign=top align=left fontStyle="italic">
            <TABLE border=0 cellSpacing=0 cellPadding=0 width=450><!-- Start Page Title -->
              <TBODY>
              <TR>
                <TD height=45 vAlign=top width=600 align=left><FONT
                  style="FONT-FAMILY: Calibri Light" color=#6400AA size=6.5><B> BT BROADBAND
                  AVAILABILITY
              CHECKER</B></FONT></TD></TR><!-- End Page Title --></TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE><SPAN
      class=body><!--RESPONSE-START-->
      <P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">Welcome to the Broadband Availability checker. This
      will provide a provisional check of your ability to receive reliable
      Broadband services.</font></SPAN></P>
      <P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">Please enter your telephone number.</font></SPAN></P>
      <FORM method=post action=adsl/adslchecker.TelephoneNumberOutput><INPUT
      type=hidden name=URL> <INPUT value=a%20service%20provider type=hidden
      name=SP_NAME> <SPAN class=subheading>TELEPHONE:</SPAN><BR><INPUT
      maxLength=14 size=14 name=TelNo> <INPUT value=56 type=hidden name=VERSION>
      <INPUT value=E type=hidden name=MS> <INPUT value=no type=hidden name=CAP>
      <INPUT value=Y type=hidden name=AEA> &nbsp; <INPUT class=form_button value=submit type=submit> </FORM>
      <P><SPAN class=body>Or</SPAN></P>
      <P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">Please enter your access line id.</font></SPAN></P>
      <FORM method=post action=adsl/adslchecker.AccessLineIDOutput><INPUT type=hidden
      name=URL> <INPUT value=a%20service%20provider type=hidden name=SP_NAME>
      <SPAN class=subheading>ACCESS LINE ID:</SPAN><BR><INPUT maxLength=13
      size=14 name=ALID> <INPUT value=56 type=hidden name=VERSION> <INPUT
      value=E type=hidden name=MS> <INPUT value=no type=hidden name=CAP> <INPUT
      value=Y type=hidden name=AEA> &nbsp; <INPUT class=form_button value=submit type=submit> </FORM>
          <P><SPAN class=body>Or</SPAN></P>
      <P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">Please enter your UPRN.</font></SPAN></P>
      <FORM method=post action=adsl/ADSLChecker.UPRNoutput><INPUT type=hidden
      name=URL> <INPUT value=a%20service%20provider type=hidden name=SP_NAME>
      <SPAN class=subheading>UPRN:</SPAN><BR><INPUT maxLength=13
      size=14 name=UPRN> <INPUT value=56 type=hidden name=VERSION> <INPUT
      value=E type=hidden name=MS> <INPUT value=no type=hidden name=CAP> <INPUT
      value=Y type=hidden name=AEA> &nbsp; <INPUT class=form_button value=submit type=submit> </FORM>
      <P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">If you do not have a telephone number or access line
      id, please select the</font>
<TABLE>
  <TR>
   <FORM method=post action=adsl/adslchecker.address>
          <INPUT value="" type=hidden name=url>
          <INPUT value=a%20service%20provider type=hidden name=SP_NAME>
          <INPUT value=56 type=hidden name=VERSION>
          <INPUT value=E type=hidden name=MS>
          <INPUT value=no type=hidden name=CAP>
          <INPUT value=Y type=hidden name=AEA>
          <TD><A href=# onclick="document.forms[3].submit()">Address Checker</A></TD>
   </FORM>
          <FONT>
          <TH><P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">or the</font></SPAN></P></TH>
          </FONT>
   <FORM method=post action=adsl/adslchecker.postcode>
          <TD><A href=# onclick="document.forms[4].submit()">Postcode Checker</A></TD>
          <INPUT value="" type=hidden name=url>
          <INPUT value=a%20service%20provider type=hidden name=SP_NAME>
          <INPUT value=56 type=hidden name=VERSION>
          <INPUT value=E type=hidden name=MS>
          <INPUT value=no type=hidden name=CAP>
          <INPUT value=Y type=hidden name=AEA>
   </FORM>
          <FONT>
          <TH><P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">or the</font></SPAN></P></TH>
          </FONT>
   <FORM method=post action=adsl/adslchecker.bbeuidform>
          <TD><A href=# onclick="document.forms[5].submit()">BBEU Checker</A></TD>
          <INPUT value="" type=hidden name=url>
          <INPUT value=a%20service%20provider type=hidden name=SP_NAME>
          <INPUT value=56 type=hidden name=VERSION>
          <INPUT value=E type=hidden name=MS>
          <INPUT value=no type=hidden name=CAP>
          <INPUT value=Y type=hidden name=AEA>
   </FORM>
  </TR>
</TABLE>
<P><SPAN class=body><font size="2" font face="Calibri Light" color="#333333">By submitting a query into this checker you accept <A
      href="https://www.btwholesale.com/pages/static/terms-of-use.htm" target="_blank">Terms of Use</A> for this checker.</font>
<!--RESPONSE-END--></SPAN></P></SPAN></TD></TR></TBODY></TABLE></BODY></HTML>
```

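So:

1) You are posting to the wrong URL. In the returned HTML, the `action` of the form you want is `adsl/ADSLChecker.UPRNoutput`, so the POST has to go to that path rather than to the page URL.

2) The form contains hidden fields that you are not submitting: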

```html
<form method="post" action="adsl/ADSLChecker.UPRNoutput">
    <input type="hidden" name="URL">
    <input value="a%20service%20provider" type="hidden" name="SP_NAME">
    <span class="subheading">UPRN:</span><br>
    <input maxlength="13" size="14" name="UPRN">
    <input value="56" type="hidden" name="VERSION">
    <input value="E" type="hidden" name="MS">
    <input value="no" type="hidden" name="CAP">
    <input value="Y" type="hidden" name="AEA"> &nbsp;
    <input class="form_button" value="submit" type="submit">
</form>
```
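The relative `action` is resolved against the page URL, the same way a browser resolves it. A minimal sketch using only the standard library, assuming the page declares no `<base>` element (the HTML shown in the question has none); the variable names are illustrative:

```python
from urllib.parse import urljoin

# URL of the page that contains the form (from the question).
page_url = "https://www.dslchecker.bt.com/#"
# Relative action attribute of the UPRN form.
action = "adsl/ADSLChecker.UPRNoutput"

# urljoin drops the "#" fragment and resolves the relative path.
post_url = urljoin(page_url, action)
print(post_url)
```

The resolved address is the one the POST request should target.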

Try:

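```python
import requests

# Send the form's hidden fields together with the UPRN.
payload = {
    "UPRN": "10033360983",
    "SP_NAME": "a%20service%20provider",
    "VERSION": "56",
    "MS": "E",
    "CAP": "no",
    "AEA": "Y",
}
# POST to the form's action URL, not to the page URL.
url = "https://www.dslchecker.bt.com/adsl/ADSLChecker.UPRNoutput"
r = requests.post(url, data=payload)
```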
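Rather than copying the hidden fields by hand, they could also be read from the page itself. The following is a rough sketch with the standard-library `html.parser`, not the answerer's method; `FormFieldCollector` and `sample_html` are hypothetical names, and the sample is a trimmed copy of the forms from the question:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldCollector(HTMLParser):
    """Collect the named <input> values of the form whose action ends with a suffix."""

    def __init__(self, action_suffix):
        super().__init__()
        self.action_suffix = action_suffix.lower()
        self.in_target = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            action = attrs.get("action") or ""
            self.in_target = action.lower().endswith(self.action_suffix)
            if self.in_target:
                self.action = action
        elif tag == "input" and self.in_target and attrs.get("name"):
            # Inputs without a value attribute are submitted as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_target = False


# Trimmed sample of the page: the telephone form and the UPRN form.
sample_html = """
<form method="post" action="adsl/adslchecker.TelephoneNumberOutput">
  <input maxlength="14" size="14" name="TelNo">
</form>
<form method="post" action="adsl/ADSLChecker.UPRNoutput">
  <input type="hidden" name="URL">
  <input value="a%20service%20provider" type="hidden" name="SP_NAME">
  <input maxlength="13" size="14" name="UPRN">
  <input value="56" type="hidden" name="VERSION">
  <input value="E" type="hidden" name="MS">
  <input value="no" type="hidden" name="CAP">
  <input value="Y" type="hidden" name="AEA">
  <input class="form_button" value="submit" type="submit">
</form>
"""

collector = FormFieldCollector("UPRNoutput")
collector.feed(sample_html)

# Fill in the visible field and resolve the action against the page URL.
payload = dict(collector.fields, UPRN="10033360983")
post_url = urljoin("https://www.dslchecker.bt.com/#", collector.action)
print(post_url)
print(payload)
```

In a real run, `sample_html` would be replaced by the text of a GET response for the page, and `payload` and `post_url` would be passed to `requests.post(post_url, data=payload)`.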

You are posting to the wrong URL. I used pandas to pull out the tables, so you will need to do some cleanup, but try:

```python
import requests
import pandas as pd

url = 'https://www.dslchecker.bt.com/adsl/ADSLChecker.UPRNoutput'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
UPRN = 10033346575

# All fields of the UPRN form, including the hidden ones.
payload = {
    'URL': '',
    'SP_NAME': 'a%20service%20provider',
    'UPRN': str(UPRN),
    'VERSION': '56',
    'MS': 'E',
    'CAP': 'no',
    'AEA': 'Y',
}

# Note: params= sends the fields in the query string; data= would send them as the form body.
response = requests.post(url, headers=headers, params=payload)
# read_html returns one DataFrame per <table>; the last one holds the results.
tables = pd.read_html(response.text)
df = tables[-1]
```

Output:

```text
print(df.to_string())
          Featured Products  Downstream Line Rate(Mbps)                              Upstream Line Rate(Mbps)                           Downstream Handback Threshold(Mbps)  WBC FTTC Availability Date WBC SOGEA Availability Date Unnamed: 8_level_0
         Unnamed: 0_level_1                        High                         Low                      High                       Low                  Unnamed: 5_level_1          Unnamed: 6_level_1          Unnamed: 7_level_1 Unnamed: 8_level_1
0      VDSL Range A (Clean)                           3                         1.2                       1.2                       0.8                                 0.8                   Available                   Available                NaN
1   VDSL Range B (Impacted)                         2.8                         1.2                       1.2                       0.5                                 0.8                   Available                   Available                NaN
2         Featured Products  Downstream Line Rate(Mbps)  Downstream Line Rate(Mbps)  Upstream Line Rate(Mbps)  Upstream Line Rate(Mbps)              Downstream Range(Mbps)  WBC FTTP Availability Date                         NaN                NaN
3            FTTP on Demand                         330                         330                        30                        30                                  --                   Available                          --                NaN
4             ADSL Products  Downstream Line Rate(Mbps)  Downstream Line Rate(Mbps)  Upstream Line Rate(Mbps)  Upstream Line Rate(Mbps)              Downstream Range(Mbps)           Availability Date                         NaN                NaN
5               WBC ADSL 2+                     Up to 1                     Up to 1                        --                        --                            1 to 3.5                   Available                          --                NaN
6                  ADSL Max                     Up to 1                     Up to 1                        --                        --                         0.75 to 2.5                   Available                          --                NaN
7            WBC Fixed Rate                         0.5                         0.5                        --                        --                                  --                   Available                          --                NaN
8                Fixed Rate                         0.5                         0.5                        --                        --                                  --                   Available                          --                NaN
9           Observed Speeds                        VDSL                        VDSL                       NaN                       NaN                                 NaN                         NaN                         NaN                NaN
10          Other Offerings                         NaN                         NaN                       NaN                       NaN                                 NaN           Availability Date                         NaN                NaN
11           VDSL Multicast                          --                          --                        --                        --                                  --                   Available                          --                NaN
12           ADSL Multicast                          --                          --                        --                        --                                  --                   Available                          --                NaN
```
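As one possible first cleanup step, the two-level header in the printout could be flattened into single column names, skipping the `Unnamed: ...` placeholders pandas generates for empty header cells. This is only a sketch: `flatten_header` is a hypothetical helper, and the sample tuples are read off the printed output rather than taken from a live DataFrame.

```python
def flatten_header(columns):
    """Join the levels of each column tuple, skipping 'Unnamed: ...' placeholders."""
    flat = []
    for levels in columns:
        parts = [str(level) for level in levels
                 if not str(level).startswith("Unnamed:")]
        # A column whose levels are all placeholders gets an empty name.
        flat.append(" ".join(parts))
    return flat


# Illustrative (level 0, level 1) pairs as they appear in the printed header.
columns = [
    ("Featured Products", "Unnamed: 0_level_1"),
    ("Downstream Line Rate(Mbps)", "High"),
    ("Downstream Line Rate(Mbps)", "Low"),
    ("Unnamed: 8_level_0", "Unnamed: 8_level_1"),
]
print(flatten_header(columns))
```

Assuming `df.columns` is a two-level `MultiIndex`, as the printout suggests, it could be applied with `df.columns = flatten_header(df.columns.tolist())`. The repeated header rows inside the data (rows 2, 4 and 10) would still need separate handling.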
