Python MechanicalSoup - Web scraping - Unable to open pagination links in an .aspx form, even after modifying "__EVENTTARGET" and "__EVENTARGUMENT"



Can someone explain how to open pagination links in an .aspx form using MechanicalSoup? I updated __EVENTTARGET and __EVENTARGUMENT, but the request still returns the current page instead of the next one.

form = browser.select_form('#form1')
form["__EVENTTARGET"] = "ctl00$ContentPlaceHolder1$gvData"
form["__EVENTARGUMENT"] = "Page$2"
print(form.form.find("input", {"name": "__EVENTTARGET"}).attrs)
print(form.form.find("input", {"name": "__EVENTARGUMENT"}).attrs)
new_response = browser.submit_selected()
print(new_response.content)

This script goes from page 1 to page 9 and prints the following information:

import requests
from bs4 import BeautifulSoup

url = 'https://www.bseindia.com/corporates/List_Scrips.aspx'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

# Load the initial page and collect every <input> field of the form.
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
data = {}
for i in soup.select('input'):
    data[i['name']] = i.get('value', '')

for page in range(1, 10):  # <--- increase the number of pages here
    print('Page {}...'.format(page))
    print('-' * 80)

    # Post the collected fields back and print the table rows of the response.
    soup = BeautifulSoup(requests.post(url, headers=headers, data=data).content, 'html.parser')
    for tr in soup.select('tr.TTHeader ~ tr:not(:has(td[colspan]))'):
        print(tr.get_text(strip=True, separator=' '))

    # Rebuild the payload from the fresh response, then request the next page.
    data = {}
    for i in soup.select('input'):
        data[i['name']] = i.get('value', '')
    data['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$gvData'
    data['__EVENTARGUMENT'] = 'Page${}'.format(page + 1)
    del data['ctl00$ContentPlaceHolder1$btnSubmit']

Prints:

Page 1...
--------------------------------------------------------------------------------
500002 ABB ABB India Limited Active B 2.00 INE117A01022 Heavy Electrical Equipment Equity
500003 AEGISLOG AEGIS LOGISTICS LTD. Active A 1.00 INE208C01025 Oil Marketing & Distribution Equity
500004 TPAEC TORRENT POWER AEC LTD. Delisted B 10.00 INE424A01014 Equity
500005 AKARLAMIN AKAR LAMINATORS LTD. Delisted XD 10.00 INE984C01013 Iron & Steel Products Equity
500006 ALPHADR ALPHA DRUG INDIA LTD. Delisted B 10.00 INE256B01026 Equity
500008 AMARAJABAT AMARA RAJA BATTERIES LTD. Active A 1.00 INE885A01032 Auto Parts & Equipment Equity
500009 AMBALALSA AMBALAL SARABHAI ENTERPRISES LTD. Active X 10.00 INE432A01017 Pharmaceuticals Equity
500010 HDFC HOUSING DEVELOPMENT FINANCE CORP.LTD. Active A 2.00 INE001A01036 Housing Finance Equity
500011 AMRTMIL-BDM AMRUT INDUSTRIES LTD. Delisted Z 10.00 NA Equity
500012 ANDHRAPET ANDHRA PETROCHEMICALS LTD. Active X 10.00 INE714B01016 Commodity Chemicals Equity
500013 ANSALAPI ANSAL PROPERTIES & INFRASTRUCTURE LTD. Active T 5.00 INE436A01026 Realty Equity
500014 UTIQUE Utique Enterprises Ltd Active X 10.00 INE096A01010 Finance (including NBFCs) Equity
500015 ICICIDM ICICI LTD. Delisted B 10.00 INE005A01011 Equity
500016 ARUNAHTEL ARUNA HOTELS LTD. Active XT 10.00 INE957C01019 Hotels Equity
500018 ARPOLDM ARPOLDM Delisted B 10.00 INE035A01018 Equity
500019 BOR BANK OF RAJASTHAN LTD. Delisted B 10.00 INE320A01014 Banks Equity
500020 BOMDYEING BOMBAY DYEING & MFG.CO.LTD. Active A 2.00 INE032A01023 Textiles Equity
500021 ASINCOF ASINCOF Delisted Z 10.00 NA Equity
500023 ASIANHOTNR Asian Hotels (North) Limited Active B 10.00 INE363A01022 Hotels Equity
500024 ASSAMCO Assam Company (India) Limited Delisted T 1.00 INE442A01024 Tea & Coffee Equity
500025 ASSAMBR ASSAMBROOK LTD.-$ Delisted X 10.00 INE353C01011 Tea & Coffee Equity
500026 ATSHIND ATASH INDUSTRIES LTD. Delisted Z 10.00 NA Equity
500027 ATUL ATUL LTD. Active A 10.00 INE100A01010 Specialty Chemicals Equity
500028 ATVPR ATV PROJECTS INDIA LTD. Active XT 10.00 INE447A01015 Construction & Engineering Equity
500029 AUTOLITIND AUTOLITE (INDIA) LTD. Active B 10.00 INE448A01013 Auto Parts & Equipment Equity
Page 2...
--------------------------------------------------------------------------------
500030 AUTORIDFIN AUTORIDERS FINANCE LTD. Suspended T 10.00 INE450A01019 Finance (including NBFCs) Equity
500031 BAJAJELEC BAJAJ ELECTRICALS LTD.-$ Active A 2.00 INE193E01025 Household Appliances Equity
500032 BAJAJHIND Bajaj Hindusthan Sugar Limited Active B 1.00 INE306A01021 Sugar Equity
... and so on.
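
The payload-building step of the script above can be isolated and tried offline. The following is a minimal sketch, not part of the original answer: it uses only the standard library instead of BeautifulSoup, and the names `InputCollector`, `build_postback_data`, and `SAMPLE_HTML` are hypothetical, as is the page fragment itself. It collects every named input, sets the two event fields, and drops the submit button, mirroring the `del` line in the script. Presumably the button has to be left out so the server does not treat the request as a click on it, although that reason is an assumption.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every named <input> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:  # inputs without a name are not submitted with a form
            self.fields[name] = attributes.get("value") or ""


def build_postback_data(html, target, argument, drop=()):
    """Build the POST payload for a postback (hypothetical helper)."""
    collector = InputCollector()
    collector.feed(html)
    data = collector.fields
    data["__EVENTTARGET"] = target
    data["__EVENTARGUMENT"] = argument
    for name in drop:
        data.pop(name, None)  # tolerate pages where the field is absent
    return data


# Made-up page fragment, used only to exercise the helper without a network.
SAMPLE_HTML = """
<form id="form1">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
  <input type="submit" name="ctl00$ContentPlaceHolder1$btnSubmit" value="Submit" />
</form>
"""

payload = build_postback_data(
    SAMPLE_HTML,
    target="ctl00$ContentPlaceHolder1$gvData",
    argument="Page$2",
    drop=("ctl00$ContentPlaceHolder1$btnSubmit",),
)
print(payload)
```

Against the real site, the dictionary returned by such a helper would be passed as `data=` to `requests.post`, and it has to be rebuilt from each new response because the hidden state fields change from page to page.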
