Handling a slow-loading webpage: removing the hardcoded delay from my script



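I've written a script in Python with Selenium to parse some names from a webpage that uses lazy loading: the page reveals more of its content each time you scroll to the bottom. My script runs without errors. The one problem I can't solve is getting rid of the hardcoded delay. I can't find any idea of how to use an explicit wait instead of the hardcoded delay while keeping the same logic, so that the script becomes more efficient. Thanks in advance for any help.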

Link to the webpage

This is what I've tried so far (a working one):

import time
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("find_the_link_above")
last_len = len(driver.find_elements_by_class_name("listing__name--link"))
new_len = last_len
while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)  # I wish to kick out this hardcoded delay and use an explicit wait in its place
    items = driver.find_elements_by_class_name("listing__name--link")
    new_len = len(items)
    if last_len == new_len:
        break
for item in items:
    print(item.text)
driver.quit()

Here is how you can implement an explicit wait:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
driver.get("https://www.yellowpages.ca/search/si/1/coffee/all%20states")
# Collect the initially loaded listings so that `items` exists even if nothing new ever loads
items = driver.find_elements_by_class_name("listing__name--link")
last_len = len(items)
while True:
    # Scroll to the bottom to trigger loading of the next batch
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # Wait up to 3 seconds for the number of listings to grow
        wait(driver, 3).until(lambda driver: len(driver.find_elements_by_class_name("listing__name--link")) > last_len)
        items = driver.find_elements_by_class_name("listing__name--link")
        last_len = len(items)
    except TimeoutException:
        # No new listings appeared within the timeout, so stop scrolling
        break
for item in items:
    print(item.text)
driver.quit()

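This should let you scroll down and wait up to 3 seconds (increase the timeout if needed) for the number of elements to increase on each pass through the loop. If the number stays the same, the wait times out and the while loop breaks.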
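To make the logic easier to see without a browser, here is a minimal pure-Python sketch of the same scroll-then-poll idea. It is not Selenium's implementation: `poll_until`, `scroll_until_no_new_items`, and `FakePage` are hypothetical names made up for illustration, and the polling loop only approximates what `WebDriverWait.until` does (as far as I know, it re-checks the condition at a fixed interval, 0.5 s by default, and raises `TimeoutException` when the timeout expires).

```python
import time


def poll_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s s" % timeout)
        time.sleep(poll_interval)


def scroll_until_no_new_items(count_items, scroll, timeout=3, poll_interval=0.5):
    """Scroll until the item count stops growing; return the final count."""
    last_len = count_items()
    while True:
        scroll()
        try:
            # Returns as soon as the count grows, so a fast page costs almost no waiting
            poll_until(lambda: count_items() > last_len, timeout, poll_interval)
        except TimeoutError:
            # Nothing new appeared within the timeout: assume everything is loaded
            return last_len
        last_len = count_items()


class FakePage:
    """Hypothetical stand-in for a lazy-loading page: each scroll loads one more batch."""

    def __init__(self, total, batch):
        self.total = total
        self.batch = batch
        self.loaded = min(batch, total)

    def scroll(self):
        self.loaded = min(self.loaded + self.batch, self.total)

    def count(self):
        return self.loaded


page = FakePage(total=100, batch=40)
final_count = scroll_until_no_new_items(page.count, page.scroll, timeout=0.2, poll_interval=0.05)
print(final_count)
```

The difference from `time.sleep(3)` is that the full timeout is only paid once, on the last scroll, when no new items arrive. Every earlier scroll continues as soon as the count increases.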

To parse the names from the webpage, you can use the following code block:

  • Code block

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    items = []
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    options.add_argument("--no-sandbox")
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
    driver.get('https://www.yellowpages.ca/search/si/1/coffee/all%20states')
    items = driver.find_elements_by_css_selector("h3[itemprop='name']>a.listing__name--link")
    # Note: execute_script returns None here because the script has no return statement,
    # so the loop body never runs and only the listings found above are printed.
    while driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"):
        # extend (rather than append) keeps items a flat list of elements
        items.extend(driver.find_elements_by_css_selector("h3[itemprop='name']>a.listing__name--link"))
    for item in items:
        print(item.text)
    
  • Console output

    Tim Hortons
    Downtown Expresso Café
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Starbucks
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Starbucks
    Tim Hortons
    Tim Hortons
    Budokan
    Anchor Cafe House
    Starbucks
    Tim Hortons
    Tim Hortons
    Starbucks
    Tim Hortons
    Starbucks
    Tim Hortons
    Tim Hortons
    Colonial Coffee Co Ltd
    Personal Service Coffee
    Tim Hortons
    Suzie's Grill Cafe Inc
    Loaves N Fishes Catering & Cafe
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Tim Hortons
    Elizabeth Houte Coiffure
    The Grind House Cafe
    Tim Hortons
    Black Bench Coffee Roasters
    Tim Hortons
    
