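I've written a script in Python with Selenium to parse some names from a webpage that uses lazy loading: the page reveals more of its content each time you scroll to the bottom. My script runs without errors. The one problem I can't solve is getting rid of the hardcoded delay. I can't find any idea of how to use an explicit wait instead of the hardcoded delay while keeping the logic applied in the script intact, so that it becomes more efficient. Thanks in advance for any help.
Link to the webpage
This is what I've tried so far (the working one):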
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("find_the_link_above")
last_len = len(driver.find_elements_by_class_name("listing__name--link"))
new_len = last_len

while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)  # I wish to kick out this hardcoded delay and use explicit wait in place
    items = driver.find_elements_by_class_name("listing__name--link")
    new_len = len(items)
    if last_len == new_len:
        break

for item in items:
    print(item.text)

driver.quit()
This is how you can implement an explicit wait:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.get("https://www.yellowpages.ca/search/si/1/coffee/all%20states")
# Collect the initially loaded items so `items` is defined even if the first wait times out
items = driver.find_elements_by_class_name("listing__name--link")
last_len = len(items)

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # Wait up to 3 seconds for more items to be present than before the scroll
        wait(driver, 3).until(lambda driver: len(driver.find_elements_by_class_name("listing__name--link")) > last_len)
        items = driver.find_elements_by_class_name("listing__name--link")
        last_len = len(items)
    except TimeoutException:
        # Nothing new appeared within the timeout, so stop scrolling
        break

for item in items:
    print(item.text)

driver.quit()
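This should let you scroll down and wait up to 3 seconds (increase the timeout if needed) until the number of elements increases, or break out of the while loop if the number stays the same.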
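To make the mechanism concrete, here is a minimal pure-Python sketch of the same idea: poll a condition until it becomes truthy or a timeout expires, and stop scrolling when the timeout is hit. It is not Selenium's implementation. The names wait_until, WaitTimeout, FakePage and collect_all are hypothetical stand-ins invented for illustration, and FakePage only simulates a lazy-loading page so the loop can run without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout, poll=0.05):
    """Call condition() repeatedly until it is truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout}s")
        time.sleep(poll)


class FakePage:
    """Simulated lazy-loading page: each scroll reveals one more batch
    after a short delay, until no batches are left."""

    def __init__(self, batches, load_delay=0.1):
        self._visible = list(batches[0])
        self._pending = [list(b) for b in batches[1:]]
        self._load_delay = load_delay
        self._ready_at = None

    def scroll_to_bottom(self):
        # Start "loading" the next batch, if there is one.
        if self._pending and self._ready_at is None:
            self._ready_at = time.monotonic() + self._load_delay

    def items(self):
        # The pending batch becomes visible once its delay has elapsed.
        if self._ready_at is not None and time.monotonic() >= self._ready_at:
            self._visible.extend(self._pending.pop(0))
            self._ready_at = None
        return list(self._visible)


def collect_all(page, timeout=0.5):
    """Scroll until a scroll no longer adds items within `timeout` seconds."""
    items = page.items()
    while True:
        last_len = len(items)
        page.scroll_to_bottom()
        try:
            wait_until(lambda: len(page.items()) > last_len, timeout)
        except WaitTimeout:
            break  # count stayed the same: assume the end of the list
        items = page.items()
    return items


page = FakePage([["a", "b"], ["c"], ["d", "e"]])
names = collect_all(page)
print(names)
```

The point of the design is that each wait returns as soon as new items show up, so the full timeout is only paid once, on the final scroll that loads nothing.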
To parse the names from the webpage, you can use the following code block:
- Code block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
driver.get('https://www.yellowpages.ca/search/si/1/coffee/all%20states')

items = driver.find_elements_by_css_selector("h3[itemprop='name']>a.listing__name--link")
# window.scrollTo returns nothing, so execute_script() yields None here and the
# loop body does not run as written; only the initially loaded names are printed.
while driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"):
    # extend (not append) keeps `items` a flat list of elements
    items.extend(driver.find_elements_by_css_selector("h3[itemprop='name']>a.listing__name--link"))

for item in items:
    print(item.text)
- Console output:
Tim Hortons Downtown Expresso Café Tim Hortons Tim Hortons Tim Hortons Starbucks Tim Hortons Tim Hortons Tim Hortons Tim Hortons Tim Hortons Tim Hortons Tim Hortons Starbucks Tim Hortons Tim Hortons Budokan Anchor Cafe House Starbucks Tim Hortons Tim Hortons Starbucks Tim Hortons Starbucks Tim Hortons Tim Hortons Colonial Coffee Co Ltd Personal Service Coffee Tim Hortons Suzie's Grill Cafe Inc Loaves N Fishes Catering & Cafe Tim Hortons Tim Hortons Tim Hortons Tim Hortons Elizabeth Houte Coiffure The Grind House Cafe Tim Hortons Black Bench Coffee Roasters Tim Hortons