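I'm trying to scrape multiple pages, but my code only seems to work for the first page. When I scrape in a loop, for example over the first 5 pages, I get the following error: TimeoutException: Message: Stacktrace: Backtrace:
My code is below: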
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select
PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
_list=[]
for page_num in range(1,3):
    #print("----")
    url=f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)
    Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
    driver.find_element_by_class_name("agree").click()
    title=driver.find_elements_by_class_name("snize-overhidden")
    for j in title:
        Pro=j.find_element_by_class_name("snize-title").text
        Price=j.find_element_by_class_name("snize-price-list").text
        Desc=j.find_element_by_class_name("snize-description").text
        prec_item={
            "Product":Pro,
            "Price":Price,
            "Description":Desc
        }
        _list.append(prec_item)
df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")
Please advise! Thanks in advance.
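The code block
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
driver.find_element_by_class_name("agree").click()
is relevant only the first time you land on the site.
Once you have selected the year and clicked the Agree button, you can open every results page without selecting the year again. On later pages the year field never appears, so the 20-second wait for it runs out and raises the TimeoutException.
So your code could look something like this: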
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select
PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
_list=[]
for page_num in range(1,3):
    #print("----")
    url=f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)
    if page_num == 1:
        # The age check is shown only on the first page load
        Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
        driver.find_element_by_class_name("agree").click()
    else:
        # Give later pages time to load before scraping them
        time.sleep(2)
    title=driver.find_elements_by_class_name("snize-overhidden")
    for j in title:
        Pro=j.find_element_by_class_name("snize-title").text
        Price=j.find_element_by_class_name("snize-price-list").text
        Desc=j.find_element_by_class_name("snize-description").text
        prec_item={
            "Product":Pro,
            "Price":Price,
            "Description":Desc
        }
        _list.append(prec_item)
# Write all collected rows once the loop has finished
df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")
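I added a delay for every iteration after the first so that the page has time to load before its data is scraped.
It would be better to use an explicit wait with Expected Conditions there instead.
I don't know which condition fits best there; that's up to you.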