How to scrape data after scrolling down a page that initially loads only 10 items and adds more as you scroll



I am trying to scrape data from https://www.fynd.com/brands/ and grab everything inside the title div tags, but more title tags appear as you scroll down. When the page first loads it only shows a few brands, then it keeps adding brands as you scroll down manually. Below is the code I am using:

from bs4 import BeautifulSoup
from selenium import webdriver
import requests
import pandas as pd
import time
from selenium.common.exceptions import ElementClickInterceptedException

url = "https://www.fynd.com/brands/"
driver = webdriver.Chrome(executable_path=r"D:\chromedriver_win32chromedriver.exe")
driver.get(url)
driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
soup = BeautifulSoup(driver.page_source, "html.parser")
title = soup.find_all('span', class_="ukt-title clrWhite")
all_titles = list()
for jelly in title:
    all_titles.append(jelly.text.strip())

print(all_titles)

Screenshot of the titles I want
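For context, the pattern the question is missing — keep scrolling until the document height stops growing — can be sketched independently of Selenium. The function and parameter names below are illustrative, not from any library; in real code `get_height` and `scroll_to_bottom` would wrap the usual `driver.execute_script` calls:

```python
def scroll_until_stable(get_height, scroll_to_bottom, max_rounds=50):
    """Scroll repeatedly until the document height stops growing.

    get_height / scroll_to_bottom stand in for
    driver.execute_script("return document.body.scrollHeight") and
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);");
    max_rounds guards against pages that grow forever.
    """
    last_height = get_height()
    for _ in range(max_rounds):
        scroll_to_bottom()
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new loaded: we reached the real bottom
        last_height = new_height
    return last_height

# Simulated page that grows by 1000px per scroll, capped at 3000px:
state = {"h": 1000}
final = scroll_until_stable(lambda: state["h"],
                            lambda: state.update(h=min(state["h"] + 1000, 3000)))
print(final)  # 3000
```

Only after such a loop finishes does `driver.page_source` contain all the lazily loaded items.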

Try the following code:

It is an endless while loop over the list of webelements. Make sure the indentation is consistent:

from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep

# assumes an existing driver, e.g. driver = webdriver.Chrome()
driver.get("https://www.fynd.com/brands/")
while True:
    for item in driver.find_elements(By.XPATH, "//div[@data-cardtype='BRANDS']"):
        ActionChains(driver).move_to_element(item).perform()
        sleep(0.1)
        print(item.text)

You can speed this script up by removing the sleep(0.1); I only added it for the visual effect.
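One caveat with the `while True` version: each pass over `find_elements` re-prints cards that were already handled on earlier passes. A small hypothetical helper (`collect_new` is my name, not a Selenium API) can filter each pass down to the newly loaded items:

```python
def collect_new(items, seen):
    """Return only the items not seen on earlier passes, updating `seen`.

    Inside the scroll loop you would call this with the .text of the
    freshly found cards, e.g. collect_new([e.text for e in elements], seen).
    """
    fresh = [item for item in items if item not in seen]
    seen.update(fresh)
    return fresh

seen = set()
print(collect_new(["Nike", "Adidas"], seen))  # ['Nike', 'Adidas']
print(collect_new(["Adidas", "Puma"], seen))  # ['Puma']
```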

A bit different, but it works:

from selenium import webdriver
import chromedriver_autoinstaller
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.common.action_chains import ActionChains

chromedriver_autoinstaller.install()
driver = webdriver.Chrome()
i = 0
f = open('yeshivaLinks.txt', 'w')
driver.get("https://www.yeshiva.org.il/ask/filter")
print(len(driver.find_elements(By.XPATH, '//*[@id="myLessonsScroller"]/ul/content-preview/a')))
for a in driver.find_elements(By.XPATH, '//*[@id="myLessonsScroller"]/ul/content-preview/a'):
    print(a.get_attribute('href'))
while True:
    for a in driver.find_elements(By.XPATH, '//*[@id="myLessonsScroller"]/ul/content-preview/a'):
        ActionChains(driver).move_to_element(a).perform()
        print(a.get_attribute('href'))
        f.write(a.get_attribute('href') + '\n')
        i = i + 1
        if i == (len(driver.find_elements(By.XPATH, '//*[@id="myLessonsScroller"]/ul/content-preview/a')[i:]) - 15):
            # use a throwaway loop variable so the counter i is not clobbered
            for _ in range(10):
                lastHeight = driver.execute_script("return document.body.scrollHeight")
                print(lastHeight)
                driver.execute_script('window.scrollTo(0, document.body.scrollHeight-50);')
                time.sleep(1)

I think you should try this.

from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import time

# assumes an existing driver, e.g. driver = webdriver.Chrome()
driver.get("https://www.fynd.com/brands/")
while True:
    for item in driver.find_elements(By.XPATH, "//div[@data-cardtype='BRANDS']"):
        ActionChains(driver).move_to_element(item).perform()
        time.sleep(0.5)
        print(item.text)

And here is a height-based variant for another infinite-scroll site:

driver.get("https://www.makemytrip.com/hotels")
last_height = driver.execute_script("return document.body.scrollHeight")
itemTargetCount = 300
cost = []
while itemTargetCount > len(cost):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
    elements = driver.find_elements(By.ID, 'hlistpg_hotel_shown_price')
    textElements = []
    for i in range(len(elements)):
        textElements.append(elements[i].text)
    cost = textElements
print(cost)
print(len(cost))
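Once any of these scroll loops has finished, pulling the titles out of `driver.page_source` is plain HTML parsing. As a standard-library-only sketch of the extraction the question does with BeautifulSoup (the class name `ukt-title clrWhite` is taken from the question; the sample HTML here is made up):

```python
from html.parser import HTMLParser

class TitleCollector(HTMLParser):
    """Collects the text inside <span class="ukt-title clrWhite"> tags."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "ukt-title clrWhite":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())

# Stand-in for driver.page_source after scrolling has finished:
sample = ('<span class="ukt-title clrWhite"> Nike </span>'
          '<span class="other">x</span>'
          '<span class="ukt-title clrWhite">Adidas</span>')
collector = TitleCollector()
collector.feed(sample)
print(collector.titles)  # ['Nike', 'Adidas']
```

In a real run you would call `collector.feed(driver.page_source)` instead of feeding the sample string.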
