Python Selenium - extract all URLs in a table and iterate until the Next button disappears



I am trying to extract all the URLs and keep pressing the Next button to iterate through the pages until there is no Next button left. If possible, I would also like to open each URL. Please point me in the right direction.

The site where the search button needs to be pressed is here

Link to the table of URLs that need to be extracted

from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(executable_path=r"C:\Users\matt_\Documents\Python Scripts\Selenium\chromedriver.exe")

driver.get("https://publicaccess.aberdeencity.gov.uk/online-applications/search.do?action=monthlyList")
driver.find_element_by_xpath("/html/body/div/div/div[3]/div[3]/div/form/fieldset/div[5]/input[2]").click()
test = driver.find_elements(By.TAG_NAME,"a")

print(test)

Here is an example of what you are looking for:


from bs4 import BeautifulSoup as Soup
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://monerobenchmarks.info/")

final_list = []

def parse_table():
    # re-parse the current page source on every call, so rows added
    # after each click of the Next button are actually captured
    page = Soup(driver.page_source, features='html.parser')
    table = page.find('table')
    table_rows = table.find_all('tr')
    for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text for i in td]
        final_list.extend(row)

def next_bu():
    next_button = driver.find_element_by_xpath('//*[@id="cpu_next"]')
    next_button.click()

# put range of pages
for _ in range(1, 2):
    parse_table()
    time.sleep(2)
    next_bu()

print(final_list)

You can check whether an element exists with simple logic like this (find_elements returns an empty list when nothing matches):

if len(driver.find_elements_by_css_selector('.next')) > 0:

Try the following code:

driver.get('https://publicaccess.aberdeencity.gov.uk/online-applications/search.do?action=monthlyList')
search_btn = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.button.primary')))
search_btn.click()

condition = True
while condition:
    links = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li.searchresult a')))
    for link in links:
        print(link.get_attribute('href'))
    if len(driver.find_elements_by_css_selector('.next')) > 0:
        driver.find_element_by_css_selector('.next').click()
    else:
        condition = False
driver.quit()
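The control flow above boils down to "scrape the current page, then click Next while it still exists". A minimal pure-Python sketch of that loop, with a list of pages standing in for the site (the names here are illustrative, not Selenium API):

```python
# Each inner list plays the role of one results page;
# running out of pages plays the role of the missing Next button.
pages = [
    ["url1", "url2"],
    ["url3"],
    ["url4", "url5"],
]

def scrape_all(pages):
    collected = []
    index = 0
    while True:
        collected.extend(pages[index])     # parse the current page
        has_next = index + 1 < len(pages)  # like len(find_elements('.next')) > 0
        if not has_next:
            break
        index += 1                         # like clicking the Next button
    return collected

print(scrape_all(pages))  # ['url1', 'url2', 'url3', 'url4', 'url5']
```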

using the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

There you go:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome(executable_path=r"C:\Users\matt_\Documents\Python Scripts\Selenium\chromedriver.exe")
driver.get("https://publicaccess.aberdeencity.gov.uk/online-applications/search.do?action=monthlyList")
driver.find_element_by_css_selector("input[value='Search']").click()

def parse():
    links = driver.find_elements_by_xpath('//*[@id="searchresults"]/li/a')
    for link in links:
        print(link.text, link.get_attribute("href"))
    try:
        # recurse into the next page; stop when the Next button is gone
        driver.find_element_by_class_name('next').click()
        parse()
    except NoSuchElementException:
        print('complete')

parse()
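The question also asks to open each URL. A hedged sketch, assuming the same `li.searchresult a` and `.next` selectors used above match the results page: collect the plain href strings from every page first, and only then navigate, so the driver does not hold stale element references after leaving the results page.

```python
def collect_result_urls(driver):
    """Walk every results page and return the href strings found."""
    urls = []
    while True:
        for link in driver.find_elements_by_css_selector('li.searchresult a'):
            urls.append(link.get_attribute('href'))
        next_buttons = driver.find_elements_by_css_selector('.next')
        if not next_buttons:       # no Next button left: last page reached
            return urls
        next_buttons[0].click()    # advance to the next results page

def open_each(driver, urls):
    # visit each extracted URL in turn
    for url in urls:
        driver.get(url)
```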
