Selenium: scrape a page until all products are loaded



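I am new to Selenium and am working on a project that requires scraping URLs from a page.

Source: https://www.autofurnish.com/audi-car-accessories

I want to scrape the URLs of these products. I got the scraping itself to work, but I am stuck on the scrolling part. I need the URLs of all the products on this page. It is a huge page with a lot of results.

What I have tried: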

1.

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

I tried this code, but it just scrolls straight to the bottom and not all of the products get loaded.

2.

data = driver.find_elements(By.XPATH,"//h2[@class='product-title']//a")
for i in data:
    driver.execute_script("arguments[0].scrollIntoView();", i)

3.

items = []
last_height = driver.execute_script("return document.body.scrollHeight")
item_targetcount = 1000
while item_targetcount > len(items):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(2)  # give the site time to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

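I tried to get help from these questions: "How to scroll down gradually in Python Selenium" and "Scroll to element using webdriver?". I also watched some YouTube videos, but I still could not fix the problem.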

My main code for collecting the other details is:

prod_details = []
for i in models:
    driver.find_element(By.XPATH,"//span[@aria-labelledby='select2-brand-container']").click()
    time.sleep(2)
    driver.find_element(By.XPATH,"//input[@class='select2-search__field']").send_keys(i)
    driver.find_element(By.XPATH,"//input[@class='select2-search__field']").send_keys(Keys.ENTER)
    driver.find_element(By.XPATH,"//div[@class='btnred sbv-link sbv-inactive']").click()
    time.sleep(3)
    prod = driver.find_elements(By.XPATH,"//h2[@class='product-title']//a")
    for i in prod:
        prod_details.append(i.get_attribute("href"))
    driver.get('https://www.autofurnish.com/')
    time.sleep(2)

I still cannot get the page to load completely and return all of the results.

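This was a really tricky one... I ran into several unexpected problems while trying to get it to work.

The main problem was waiting for the loading spinner and keeping it on screen. I initially tried scrolling to the bottom of the page as you did, but that put the page into an infinite loop of loading new product sections, because the footer is so large that the loading spinner ended up above the visible part of the page (at least for me). I solved this by scrolling to the last visible product, which is low enough to trigger loading of the next section but not so low that the page goes into the infinite loading mode.

In most cases where a loading spinner is involved, you want to wait for it to become visible and then to become invisible again. This prevents timing errors and is the most reliable way to wait for the new products to load.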
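
The two-phase wait can be sketched without any Selenium dependency, which makes the timing logic easier to see. The names `wait_until`, `wait_for_spinner_cycle`, and `is_visible`, as well as the default timeouts, are hypothetical and chosen only for illustration. In a real script `is_visible` would wrap a check on the spinner element, or you would use `WebDriverWait` with `EC.visibility_of_element_located` followed by `EC.invisibility_of_element_located`.

```python
import time


def wait_until(predicate, timeout, poll=0.1):
    """Poll predicate() until it returns True; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def wait_for_spinner_cycle(is_visible, appear_timeout=5.0, vanish_timeout=60.0, poll=0.1):
    """Wait for a spinner to show up and then go away again.

    is_visible is a hypothetical callable that reports whether the
    spinner is currently shown.

    Returns False if the spinner never appeared (nothing started loading)
    and True if it appeared and then disappeared. Raises TimeoutError if
    it appeared but was still visible after vanish_timeout seconds.
    """
    # phase 1: the spinner has to become visible first
    if not wait_until(is_visible, appear_timeout, poll):
        return False
    # phase 2: only then wait for it to disappear
    if not wait_until(lambda: not is_visible(), vanish_timeout, poll):
        raise TimeoutError("spinner is still visible")
    return True
```

Waiting for the spinner to appear first is what avoids the timing error: checking only for "not visible" could succeed immediately, before the page has even started loading the next batch.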

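The basic flow is

  1. Load the page
  2. Start a loop
    1. Grab all the product A tags
    2. Use JS to scroll the page down to the last A tag
    3. Wait for the loading spinner to become visible and then invisible
    4. If no more products were loaded, or some maximum number of products has been reached, exit the loop
  3. Write out the total number of products
  4. Write out the product URLs

Code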

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
# may need to adjust the timeout based on your experience... the site is really slow for me
wait = WebDriverWait(driver, 60)
new_count = 0
old_count = 0
while True:
    old_count = new_count
    products = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h2.product-title > a")))
    new_count = len(products)
    # scroll down to last product to trigger the loading spinner
    driver.execute_script("arguments[0].scrollIntoView();", products[-1])
    # wait for loading spinner to appear and then disappear
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.infinite-scroll-loader")))
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "div.infinite-scroll-loader")))
    # if the count didn't change, we've loaded all products on the page
    # I put a max of 50 products to load as a demo. You can adjust higher as needed but you should put something reasonably sized here to prevent the script from running for an hour
    if new_count == old_count or new_count > 50:
        break
# print results
print(len(products))
for product in products:
    print(product.get_attribute("href"))
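
One case the loop above may not handle gracefully: if the spinner does not appear after the last batch, the first `wait.until(...)` runs for the full timeout and then raises `TimeoutException`. Below is a minimal sketch of the same loop with that case treated as "everything is loaded". It is written against three hypothetical callables (`get_links`, `scroll_to_last`, `wait_for_load`) instead of a real driver, and `FakePage` is a made-up stand-in so the logic can be run without a browser. None of these names come from Selenium or from the site.

```python
def collect_until_stable(get_links, scroll_to_last, wait_for_load, max_items=50):
    """Scroll until no new links appear or max_items is reached.

    All three callables are hypothetical hooks supplied by the caller:
      get_links()      returns the product links currently on the page
      scroll_to_last() scrolls the last product into view
      wait_for_load()  waits for the next batch; returns False if the
                       spinner never appeared (nothing started loading)
    """
    seen = 0
    while True:
        links = get_links()
        # stop when the count did not grow or the cap has been reached
        if len(links) == seen or len(links) >= max_items:
            return links
        seen = len(links)
        scroll_to_last()
        if not wait_for_load():
            # no spinner: treat the page as fully loaded
            return get_links()


class FakePage:
    """Stand-in for a browser page so the loop can run without Selenium."""

    def __init__(self, total, batch):
        self.total = total
        self.batch = batch
        self.loaded = min(batch, total)
        self.pending = False

    def get_links(self):
        return [f"https://example.com/product-{i}" for i in range(self.loaded)]

    def scroll_to_last(self):
        # scrolling only triggers a load while products remain
        self.pending = self.loaded < self.total

    def wait_for_load(self):
        if not self.pending:
            return False
        self.loaded = min(self.total, self.loaded + self.batch)
        self.pending = False
        return True


page = FakePage(total=10, batch=4)
links = collect_until_stable(
    page.get_links, page.scroll_to_last, page.wait_for_load, max_items=100
)
print(len(links))  # prints 10
```

With Selenium, `get_links` could wrap the `find_elements` call, `scroll_to_last` the `scrollIntoView` script, and `wait_for_load` the two `wait.until(...)` calls inside a `try`/`except TimeoutException` (from `selenium.common.exceptions`) that returns `False` when the spinner never shows up.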

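To extract the values of the href attributes from the elements, you can use a list comprehension with either of the following locator strategies:

  • Using CSS_SELECTOR: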

    driver.get('https://www.autofurnish.com/audi-car-accessories#/pageSize=32&viewMode=grid&orderBy=0')
    print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.CSS_SELECTOR, "h2.product-title a")])
    driver.quit()
    
  • Using XPATH:

    driver.get('https://www.autofurnish.com/audi-car-accessories#/pageSize=32&viewMode=grid&orderBy=0')
    print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.XPATH, "//h2[@class='product-title']//a")])
    driver.quit()
    
  • Console output:

    ['https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6841-back-cushion-hecta-6851-each-set-of-two-beige', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6840-back-cushion-hecta-6850-each-set-of-two-black', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6843-back-cushion-hecta-6853-each-set-of-two-coffee', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6842-back-cushion-hecta-6852-each-set-of-two-tan', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-beige', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-black', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-coffee', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-tan', 'https://www.autofurnish.com/autofurnish-3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-brown', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-tan', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-beige', 
'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-tan', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-tan', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-beige', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-black', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-coffee', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-tan', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-beige', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-black', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-coffee', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-tan']
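
If the URLs are collected over several passes (for example once per model, as in the question's main loop), the same product link may end up in the list more than once. A minimal sketch of an order-preserving clean-up follows; `unique_urls` is a hypothetical helper name, not part of Selenium.

```python
def unique_urls(hrefs):
    """Drop missing values and duplicates while keeping first-seen order."""
    # get_attribute("href") can return None when the attribute is absent
    present = [h for h in hrefs if h]
    # dict keys keep insertion order (Python 3.7+), so this dedupes in order
    return list(dict.fromkeys(present))
```

It could be applied to the result of either list comprehension above, or to `prod_details` once the loop over the models has finished.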
    
