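I am new to Selenium and am working on a project that needs to scrape URLs from a page.
Source: https://www.autofurnish.com/audi-car-accessories
I want to scrape the URLs of the products listed there. I can collect them, but I am stuck on the scrolling part. I need the URLs of all the products on this page, which is a huge page with a lot of results.
What I have tried: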
1.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
I tried this code, but it just scrolls straight to the bottom and not all of the products get loaded.
2.
data = driver.find_elements(By.XPATH,"//h2[@class='product-title']//a")
for i in data:
    driver.execute_script("arguments[0].scrollIntoView();", i)
3.
items = []
last_height = driver.execute_script("return document.body.scrollHeight")
item_targetcount = 1000
while item_targetcount > len(items):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the site time to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
I tried to get help from: "How to scroll down step by step in Python Selenium" and "Scroll to element using webdriver?". I also tried watching some YouTube videos, but I still could not fix this.
My main code, which collects the other details, is:
prod_details = []
for model in models:
    # open the brand dropdown and search for the model
    driver.find_element(By.XPATH,"//span[@aria-labelledby='select2-brand-container']").click()
    time.sleep(2)
    driver.find_element(By.XPATH,"//input[@class='select2-search__field']").send_keys(model)
    driver.find_element(By.XPATH,"//input[@class='select2-search__field']").send_keys(Keys.ENTER)
    driver.find_element(By.XPATH,"//div[@class='btnred sbv-link sbv-inactive']").click()
    time.sleep(3)
    # collect the product links that are currently loaded
    prod = driver.find_elements(By.XPATH,"//h2[@class='product-title']//a")
    for link in prod:
        prod_details.append(link.get_attribute("href"))
    # go back to the home page for the next model
    driver.get('https://www.autofurnish.com/')
    time.sleep(2)
I still cannot get the page to load completely and return all of the output.
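This was a really tricky one... I ran into several unexpected issues while trying to get it to work.
The main issue was waiting for the loading spinner and keeping it on the screen. I initially tried scrolling to the bottom of the page like you did, but that put the page into an infinite loop of loading new product sections, because the footer is so large that the loading spinner ended up above the visible part of the page (at least for me). I solved this by scrolling to the last visible product instead, which is low enough to trigger loading of the next section but not so low that the page goes into infinite loading mode.
In most cases where a loading spinner is involved, you want to wait for it to become visible and then to become invisible. This avoids timing problems and is the most reliable way to wait for the new products to load.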
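The two-phase wait can be sketched without a browser to show why both phases matter: waiting only for "invisible" would return immediately if the check runs before the spinner has appeared. This is a minimal sketch, not part of the original answer; `poll_until`, `wait_for_spinner_cycle` and the `is_visible` callable are hypothetical names standing in for what `WebDriverWait` and the visibility conditions do in Selenium.

```python
import time
from typing import Callable


def poll_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.05) -> bool:
    """Return True once condition() is truthy, or False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False


def wait_for_spinner_cycle(is_visible: Callable[[], bool], timeout: float = 10.0) -> bool:
    """Wait for the spinner to appear, then to disappear.

    `is_visible` is a hypothetical callable that reports whether the spinner
    is currently shown. Returns False if either phase times out.
    """
    # Phase 1: the spinner must show up first, otherwise "not visible"
    # is trivially true and we would move on before anything has loaded.
    if not poll_until(is_visible, timeout):
        return False
    # Phase 2: the spinner going away signals that the new batch has loaded.
    return poll_until(lambda: not is_visible(), timeout)
```

A `False` result from the first phase is a useful signal in its own right: if the spinner never appears after a scroll, there is probably nothing left to load.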
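The basic flow is
- Load the page
- Start a loop
- Grab all of the product A tags
- Use JS to scroll the page down to the last A tag
- Wait for the loading spinner to become visible and then invisible
- Exit the loop if no more products were loaded or some maximum number of products has been reached
- Write the total number of products
- Write the product URLs
Code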
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
# may need to adjust the timeout based on your experience... the site is really slow for me
wait = WebDriverWait(driver, 60)
new_count = 0
old_count = 0
while True:
    old_count = new_count
    products = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h2.product-title > a")))
    new_count = len(products)
    # scroll down to last product to trigger the loading spinner
    driver.execute_script("arguments[0].scrollIntoView();", products[-1])
    # wait for loading spinner to appear and then disappear
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.infinite-scroll-loader")))
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "div.infinite-scroll-loader")))
    # if the count didn't change, we've loaded all products on the page
    # I put a max of 50 products to load as a demo. You can adjust higher as needed but you should
    # put something reasonably sized here to prevent the script from running for an hour
    if new_count == old_count or new_count > 50:
        break
# print results
print(len(products))
for product in products:
    print(product.get_attribute("href"))
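One edge case in the loop above: the spinner waits run before the count check, so if the spinner does not appear after the final scroll, `wait.until` may raise a timeout before the loop can exit. Below is a sketch of a variant that checks the count first. It is written against any object with Selenium-style `find_elements()` and `execute_script()` methods so that it can be tried without a browser; `collect_product_links` and the `spinner_cycle` callable are hypothetical names, not part of the original answer.

```python
from typing import Callable, List

# Plain-string value of selenium's By.CSS_SELECTOR, so this sketch
# does not need to import selenium itself.
CSS_SELECTOR = "css selector"
PRODUCT_LINKS = "h2.product-title > a"


def collect_product_links(driver, spinner_cycle: Callable[[], None],
                          max_products: int = 50) -> List[str]:
    """Scroll to the last product until nothing new loads; return the hrefs.

    `spinner_cycle` (hypothetical) should wait for the loading spinner to
    appear and disappear, and should swallow its own timeout.
    """
    old_count = 0
    while True:
        products = driver.find_elements(CSS_SELECTOR, PRODUCT_LINKS)
        count = len(products)
        # Stop when the last scroll added nothing (this also covers an empty
        # page, since 0 == 0) or when the demo cap has been reached.
        if count == old_count or count >= max_products:
            break
        old_count = count
        # Scroll to the last product, not the page bottom, to trigger loading.
        driver.execute_script("arguments[0].scrollIntoView();", products[-1])
        spinner_cycle()
    return [link.get_attribute("href") for link in products]
```

With Selenium, `spinner_cycle` could be a small function that runs the two `wait.until(...)` calls from the loop above inside `try`/`except TimeoutException` (from `selenium.common.exceptions`), so that a missing spinner ends the loop on the next count check instead of raising.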
To extract the value of the href attribute from the elements, you can use a list comprehension together with either of the following locator strategies:
- Using CSS_SELECTOR:
driver.get('https://www.autofurnish.com/audi-car-accessories#/pageSize=32&viewMode=grid&orderBy=0')
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.CSS_SELECTOR, "h2.product-title a")])
driver.quit()
- Using XPATH:
driver.get('https://www.autofurnish.com/audi-car-accessories#/pageSize=32&viewMode=grid&orderBy=0')
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.XPATH, "//h2[@class='product-title']//a")])
driver.quit()
- Console output:
['https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6841-back-cushion-hecta-6851-each-set-of-two-beige', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6840-back-cushion-hecta-6850-each-set-of-two-black', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6843-back-cushion-hecta-6853-each-set-of-two-coffee', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6842-back-cushion-hecta-6852-each-set-of-two-tan', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-beige', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-black', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-coffee', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-tan', 'https://www.autofurnish.com/autofurnish-3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-brown', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-tan', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-beige', 
'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-tan', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-tan', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-beige', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-black', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-coffee', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-tan', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-beige', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-black', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-coffee', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-tan']
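When links are collected over several searches, as in the question's loop over models, the same product may show up more than once, and `get_attribute("href")` returns `None` for an anchor that has no such attribute. The following is a small sketch of cleaning up and saving such a list; `unique_links`, `save_links` and the output file name are hypothetical and not part of the answers above.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def unique_links(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Drop empty values and duplicates while keeping first-seen order."""
    # dict preserves insertion order, so fromkeys() works as an ordered set
    return list(dict.fromkeys(h for h in hrefs if h))


def save_links(hrefs: Iterable[Optional[str]], path: str) -> int:
    """Write one unique link per line to `path`; return how many were written."""
    links = unique_links(hrefs)
    text = "\n".join(links) + ("\n" if links else "")
    Path(path).write_text(text, encoding="utf-8")
    return len(links)
```

For example, `save_links(prod_details, "audi_links.txt")` would write the de-duplicated contents of the question's `prod_details` list to a text file.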