Navigating to the nth page of a paginated table with Selenium (Python) without clicking through each page



Do you know a way to go straight to the nth page of this site (e.g. page 100) without paging through every page before it?

Here is the link to the site: https://www.sustainalytics.com/esg-ratings

(Note: this is just an example; I am not collecting or selling this data.)

If there is a way, I could also do it manually in Chrome.

Thanks

To get all the data from the pages into a pandas DataFrame, you can use the following example:

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.sustainalytics.com/sustapi/companyratings/getcompanyratings"
data = {
    "industry": "",
    "rating": "",
    "filter": "",
    "page": "1",
    "pageSize": "10",
    "resourcePackage": "Sustainalytics",
}

all_rows = []
for data["page"] in range(1, 3):  # <-- increase the range here
    # Each POST returns an HTML fragment holding one page of the ratings table
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    for s in soup.select(".company-row"):
        # Name, symbol, rating and risk text, one per line
        all_rows.append(s.get_text(strip=True, separator="\n").split("\n"))
        # The first <a> in the row carries the relative link in data-href
        all_rows[-1].append(
            "https://www.sustainalytics.com/esg-rating" + s.a["data-href"]
        )

df = pd.DataFrame(
    all_rows, columns=["Name", "Symbol", "ESG Risk Rating", "Text", "Link"]
)
print(df)
```

Prints:

```
                                     Name      Symbol ESG Risk Rating             Text                                                                                    Link
0                   1-800-Flowers.com Inc    NAS:FLWS            22.1  Medium ESG Risk              https://www.sustainalytics.com/esg-rating/1-800-flowers-com-inc/1007902331
1                                  1&1 AG     ETR:1U1            22.3  Medium ESG Risk                             https://www.sustainalytics.com/esg-rating/1-1-ag/1264207944
2                      10X Genomics, Inc.     NAS:TXG            22.6  Medium ESG Risk                   https://www.sustainalytics.com/esg-rating/10x-genomics-inc/2000259050
3                               111, Inc.      NAS:YI            28.7  Medium ESG Risk                            https://www.sustainalytics.com/esg-rating/111-inc/2006341104
4   17 Education & Technology Group, Inc.      NAS:YQ            27.0  Medium ESG Risk  https://www.sustainalytics.com/esg-rating/17-education-technology-group-inc/2008366023
5                  1Life Healthcare, Inc.    NAS:ONEM            24.6  Medium ESG Risk               https://www.sustainalytics.com/esg-rating/1life-healthcare-inc/2000165824
6                         1st Source Corp    NAS:SRCE            31.7    High ESG Risk                    https://www.sustainalytics.com/esg-rating/1st-source-corp/1008054650
7                       1stdibs.com, Inc.    NAS:DIBS            28.0  Medium ESG Risk                    https://www.sustainalytics.com/esg-rating/1stdibs-com-inc/2001638549
8                  22nd Century Group Inc    NAS:XXII            31.7    High ESG Risk             https://www.sustainalytics.com/esg-rating/22nd-century-group-inc/1032256453
9                         2i Rete Gas SpA           -            35.3    High ESG Risk                    https://www.sustainalytics.com/esg-rating/2i-rete-gas-spa/1008505575
10                               2U, Inc.    NAS:TWOU            19.8     Low ESG Risk                             https://www.sustainalytics.com/esg-rating/2u-inc/1068292762
11                     360 DigiTech, Inc.    NAS:QFIN            28.8  Medium ESG Risk                   https://www.sustainalytics.com/esg-rating/360-digitech-inc/2006584641
12          360 Security Technology, Inc.  SHG:601360            19.7     Low ESG Risk        https://www.sustainalytics.com/esg-rating/360-security-technology-inc/2005880516
13         361 Degrees International Ltd.    HKG:1361            19.1     Low ESG Risk      https://www.sustainalytics.com/esg-rating/361-degrees-international-ltd/1068063178
14                       3D Systems Corp.     NYS:DDD            25.8  Medium ESG Risk                    https://www.sustainalytics.com/esg-rating/3d-systems-corp/1008186648
15                           3i Group PLC     LON:III            11.6     Low ESG Risk                       https://www.sustainalytics.com/esg-rating/3i-group-plc/1007896757
16                                  3M Co     NYS:MMM            33.6    High ESG Risk                              https://www.sustainalytics.com/esg-rating/3m-co/1008167440
17                          3M India Ltd.  BOM:523395            23.8  Medium ESG Risk                       https://www.sustainalytics.com/esg-rating/3m-india-ltd/1008759597
18             3R Petroleum Óleo e Gás SA   BSP:RRRP3            56.0  Severe ESG Risk          https://www.sustainalytics.com/esg-rating/3r-petroleum-leo-e-g-s-sa/2008351136
19                              3SBio Inc    HKG:1530            27.1  Medium ESG Risk                          https://www.sustainalytics.com/esg-rating/3sbio-inc/1032042837
```
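Because the endpoint takes `page` and `pageSize` directly, jumping to page n is a single request rather than n clicks. The page number only has meaning relative to `pageSize`, though. The sketch below assumes that page p at size s holds the 0-based rows (p - 1) * s to p * s - 1, which is consistent with the 20 rows printed above for pages 1 and 2 at size 10 but is not documented by the site; the helper name `locate_rows` is hypothetical.

```python
def locate_rows(ui_page: int, ui_page_size: int, api_page_size: int):
    """Map a 1-based page at ui_page_size onto a request at api_page_size.

    Returns (api_page, start, stop): request page=api_page with
    pageSize=api_page_size, then slice rows[start:stop].
    Assumption: page p at size s holds 0-based rows (p - 1) * s .. p * s - 1.
    """
    if api_page_size % ui_page_size:
        # Otherwise the wanted rows could straddle two API pages
        raise ValueError("api_page_size must be a multiple of ui_page_size")
    first_row = (ui_page - 1) * ui_page_size   # 0-based index of the first wanted row
    api_page = first_row // api_page_size + 1  # 1-based API page that contains it
    start = first_row % api_page_size          # offset of that row inside the API page
    return api_page, start, start + ui_page_size


print(locate_rows(100, 10, 10))   # (100, 0, 10)
print(locate_rows(100, 10, 100))  # (10, 90, 100)
```

So, under that assumption, page 100 at pageSize 10 is rows 90 to 99 of page 10 at pageSize 100, whereas page=100 with pageSize=100 starts at row 9900.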

Edit: added a "Link" column to the DataFrame.
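Instead of guessing how far to increase the range, the loop can keep requesting pages until one comes back empty. This is a minimal sketch under two assumptions: that the endpoint returns no `.company-row` elements past the last page (not verified), and that `fetch_page` is a hypothetical callable you would implement with the loop body above. A fake `fetch_page` is used here so the sketch runs offline.

```python
from typing import Callable, Iterator, List

Row = List[str]


def iter_all_rows(fetch_page: Callable[[int], List[Row]],
                  start: int = 1, max_pages: int = 10_000) -> Iterator[Row]:
    """Yield rows page by page until fetch_page returns an empty page.

    fetch_page (hypothetical) takes a 1-based page number and returns the
    parsed rows of that page; max_pages is a safety cap on the number of requests.
    """
    for page in range(start, start + max_pages):
        rows = fetch_page(page)
        if not rows:  # assumption: an empty page means we are past the end
            return
        yield from rows


# Offline stand-in for the real POST + BeautifulSoup parsing
FAKE_PAGES = {1: [["A"], ["B"]], 2: [["C"]]}
print(list(iter_all_rows(lambda p: FAKE_PAGES.get(p, []))))  # [['A'], ['B'], ['C']]
```

Passing `start=100` begins directly at page 100 without touching the earlier pages.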

You can do this with requests/BeautifulSoup:

```python
import requests
from bs4 import BeautifulSoup

data = {
    'industry': '',
    'rating': '',
    'filter': '',
    'page': 100,  ### this is where you would select the specific page
    'pageSize': 100,  # number of rows returned per page
    'resourcePackage': 'Sustainalytics'
}
r = requests.post('https://www.sustainalytics.com/sustapi/companyratings/getcompanyratings', data=data)
soup = BeautifulSoup(r.text, 'html.parser')
# Dumps all text of the returned HTML fragment, without any table structure
print(soup.get_text(strip=True))
```

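This returns the text of the actual table. It is not a table itself; you need to further isolate and select elements from the HTML response (see Andrej's elegant answer):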

```
Sera Prognostics, Inc.NAS:SERA23.9Medium ESG RiskSerba Dinamik Holdings Bhd.KLS:527944.6Severe ESG RiskSerco Group PLCLON:SRP19.1Low ESG RiskSercomm Corp.TAI:538824.3Medium ESG RiskSeres Therapeutics IncNAS:MCRB35.9High ESG RiskSeria Co. Ltd.TKS:278221.7Medium  [....]
```
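If bs4 is not available, the row extraction can also be sketched with the standard-library `html.parser`. The markup in `SAMPLE` is hypothetical: it assumes each row is an element with class `company-row` whose first `<a>` carries a `data-href` attribute, which is what the BeautifulSoup code in the first answer (`.company-row`, `s.a["data-href"]`) implies, but the real response may be structured differently. The values are copied from the first two printed rows; `CompanyRowParser` is an illustrative name.

```python
from html.parser import HTMLParser

BASE = "https://www.sustainalytics.com/esg-rating"
# Void elements never get an end tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CompanyRowParser(HTMLParser):
    """Collect the text chunks and first <a data-href> of each .company-row element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._depth = 0  # open elements inside the current .company-row (0 = outside)
        self._texts = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth == 0:
            if "company-row" in (attrs.get("class") or "").split():
                self._texts, self._href, self._depth = [], None, 1
            return
        if tag == "a" and self._href is None:
            self._href = attrs.get("data-href")
        if tag not in VOID_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth == 0 or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:  # the .company-row element itself just closed
            link = BASE + self._href if self._href else None
            self.rows.append(self._texts + [link])

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self._texts.append(data.strip())


# Hypothetical markup modelled on the rows printed in the first answer
SAMPLE = """
<div class="company-row">
  <a data-href="/1-800-flowers-com-inc/1007902331">1-800-Flowers.com Inc</a>
  <span>NAS:FLWS</span><span>22.1</span><span>Medium ESG Risk</span>
</div>
<div class="company-row">
  <a data-href="/1-1-ag/1264207944">1&amp;1 AG</a>
  <span>ETR:1U1</span><span>22.3</span><span>Medium ESG Risk</span>
</div>
"""

parser = CompanyRowParser()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(row)
```

Each row comes out in the same shape as the first answer's `all_rows` entries (name, symbol, rating, risk text, link), so `parser.rows` could be handed to `pd.DataFrame` the same way.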

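You can do something similar with Selenium. Selenium itself is not a tool for sending POST requests, so here it executes JavaScript in the page to fetch the data and replace the table contents: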

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")

webdriver_service = Service("chromedriver/chromedriver")  ## path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'https://www.sustainalytics.com/esg-ratings'
browser.get(url)

print('sending xhr request..')
# Runs inside the page: relies on jQuery ($) being loaded there to POST to the
# ratings endpoint and swap the returned HTML into the #company_ratings element
browser.execute_script('''
$.ajax({
    type: 'POST',
    url: 'https://www.sustainalytics.com/sustapi/companyratings/getcompanyratings',
    data: 'industry=&rating=&filter=&page=100&pageSize=10&resourcePackage=Sustainalytics',
    success: function(data){
        $('#company_ratings').html(data);
    }
});
''')
# $.ajax is asynchronous: execute_script returns as soon as the request is sent,
# so the table updates a moment later, once the response arrives
print('xhr request sent, check page content once the response has returned')
```

The table on the page will then be updated with the page 100 data, and the pagination will reflect this as well.
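The hard-coded `data` string in the `$.ajax` call is simply the URL-encoded form of the dict that the requests answer passes as `data=`. A minimal sketch that builds it for any page; `company_ratings_form` is a hypothetical helper name, and the field names and order are copied from the answers above.

```python
from urllib.parse import urlencode


def company_ratings_form(page: int, page_size: int = 10) -> str:
    """Build the form body for the getcompanyratings endpoint (page is 1-based)."""
    return urlencode({
        "industry": "",
        "rating": "",
        "filter": "",
        "page": page,
        "pageSize": page_size,
        "resourcePackage": "Sustainalytics",
    })


body = company_ratings_form(100)
print(body)  # industry=&rating=&filter=&page=100&pageSize=10&resourcePackage=Sustainalytics
```

Rather than editing the JavaScript string by hand, the body can be passed in as an argument: `execute_script(script, *args)` exposes extra arguments to the script as `arguments[0]`, `arguments[1]`, ..., so the script can use `data: arguments[0]` and be called as `browser.execute_script(script, company_ratings_form(100))`.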
