I am trying to scrape data from this website:
https://www.shanghairanking.com/rankings/grsssd/2021
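Initially, pandas got me out of the gate and I could scrape the table, but I am having trouble with the dropdown menus. I want to select the options next to the Total Score box, i.e. PUB, CIT, etc. When I inspect the element, it looks like it may be JavaScript, and the usual methods for interacting with these options do not work. I have tried Beautiful Soup and, most recently, Selenium to select the dropdown manually. This works for the default table data: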
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import Select
driver = webdriver.Chrome('/Users/martinbell/Downloads/chromedriver')
driver.get('https://www.shanghairanking.com/rankings/grsssd/2021')
submit = driver.find_element_by_xpath("//input[@value='CIT']").click()
This gets me nowhere.
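Your code will not work because you must first click the dropdown to open it and then go through the options it contains. Here is the refactored code.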
Note that I used time.sleep only as a quick stopgap; for robust code and good practice, use explicit waits such as WebDriverWait.
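from selenium.webdriver.common.by import By  # needed for the By.XPATH locators below
# `driver` and `time` come from the setup code in the question.
driver.get('https://www.shanghairanking.com/rankings/grsssd/2021')
time.sleep(10)  # fixed pause while the page renders
# Click the third 'inputWrapper' element to open the dropdown.
driver.find_element(By.XPATH, "(//*[@class='inputWrapper'])[3]").click()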
# The commented-out code below loops through all the dropdown options and performs actions on each.
# opt_ele = driver.find_elements(By.XPATH, "(//*[@class='rank-select'])[2]//*[@class='options']//li")
# for ele in opt_ele:
#     print(ele.text)
#     ele.click()
#     print('perform your actions here')
#     driver.find_element(By.XPATH, "(//*[@class='inputWrapper'])[3]").click()
# If you do not want to loop but only want to select CIT, use this line:
driver.find_element(By.XPATH, "(//*[@class='rank-select'])[2]//*[@class='options']//li[text()='CIT']").click()