TimeoutException: Message: Stacktrace: Backtrace:



I am trying to web-scrape multiple pages, but my code only seems to work for the first page. When I loop over several pages (e.g. the first 5), I get the following error: TimeoutException: Message: Stacktrace: Backtrace:

My code is below:


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r

import time
from selenium.webdriver.support.ui import Select

PATH = "chromedriver.exe"

driver = webdriver.Chrome(PATH)
_list = []
for page_num in range(1, 3):
    # print("----")
    url = f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)

    Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
    driver.find_element_by_class_name("agree").click()

    title = driver.find_elements_by_class_name("snize-overhidden")

    for j in title:
        Pro = j.find_element_by_class_name("snize-title").text
        Price = j.find_element_by_class_name("snize-price-list").text
        Desc = j.find_element_by_class_name("snize-description").text

        prec_item = {
            "Product": Pro,
            "Price": Price,
            "Description": Desc
        }
        _list.append(prec_item)

df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")

Please advise! Thanks in advance.

The code block

Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
driver.find_element_by_class_name("agree").click()

is only relevant the first time you land on the main page.
Once you have selected the year and clicked the Agree button, you can view all the pages of results without selecting the year again.
So, your code could look like this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r

import time
from selenium.webdriver.support.ui import Select

PATH = "chromedriver.exe"

driver = webdriver.Chrome(PATH)
_list = []
for page_num in range(1, 3):
    # print("----")
    url = f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)
    if page_num == 1:
        Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
        driver.find_element_by_class_name("agree").click()
    else:
        time.sleep(2)

    title = driver.find_elements_by_class_name("snize-overhidden")

    for j in title:
        Pro = j.find_element_by_class_name("snize-title").text
        Price = j.find_element_by_class_name("snize-price-list").text
        Desc = j.find_element_by_class_name("snize-description").text

        prec_item = {
            "Product": Pro,
            "Price": Price,
            "Description": Desc
        }
        _list.append(prec_item)

df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")

I added a delay on the non-first iterations so the page has time to load before its data is scraped.
It would be better to use an explicit wait with Expected Conditions there.
I don't know which condition to use in that case; that is up to you.
