Unable to scrape data because element identifiers change dynamically



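Hey guys, I'm trying to scrape the Zomato restaurants that have a rating above 4 from https://www.zomato.com/pune/order-food-online?delivery_subzone=1165, but the class names (and pretty much everything else) change after the first few elements.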

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import json

driver = webdriver.Chrome(executable_path='./chromedriver.exe')
driver.get('https://www.zomato.com/pune/order-food-online?delivery_subzone=1165')
rating = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//p[@class="sc-1hez2tp-0 sc-lhdg1m-2 hDJwRc"]'))
)
for item in rating:
    stars = item.text
    if stars > '4.0':
        title = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//p[@class='sc-1hez2tp-0 sc-izFuNb jbErXF']"))
        )
time.sleep(10)
driver.close()

Please help, everyone. I am using Selenium.

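  1. Go to the page.
  2. Use the filter provided at the top of the page to show only restaurants rated 4.0+. Locate it with the XPath //div[contains(text(),'Rating: 4.0+')] and call click() on it.
  3. Every restaurant card has an image whose alt text is Restaurant Card, so you can use the CSS selector img[alt='Restaurant Card'] to get all the cards that appear after filtering and keep their number in a count variable.
  4. As you keep scrolling, you need to keep adding to this count variable.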
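
To illustrate why step 3 sidesteps the changing class names, here is a minimal sketch that counts cards by their alt text only. It uses the standard library's html.parser instead of a browser so it can run anywhere. The SAMPLE_HTML markup and the CardCounter and count_cards names are made up for illustration and are not taken from the real Zomato page.

```python
from html.parser import HTMLParser


class CardCounter(HTMLParser):
    """Counts <img> tags whose alt text marks a restaurant card."""

    def __init__(self, alt_text="Restaurant Card"):
        super().__init__()
        self.alt_text = alt_text
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class names are ignored on purpose
        if tag == "img" and dict(attrs).get("alt") == self.alt_text:
            self.count += 1


def count_cards(html):
    """Return how many restaurant-card images appear in the given HTML."""
    parser = CardCounter()
    parser.feed(html)
    return parser.count


# Hypothetical markup: the generated class names differ from card to card,
# but the alt text stays the same.
SAMPLE_HTML = """
<div class="sc-1hez2tp-0 jbErXF"><img alt="Restaurant Card" src="a.jpg"></div>
<div class="sc-izFuNb hDJwRc"><img alt="Restaurant Card" src="b.jpg"></div>
<div class="sc-abc123 kXyZ"><img alt="Promo banner" src="c.jpg"></div>
"""

print(count_cards(SAMPLE_HTML))
```

With Selenium, the same idea would be a lookup by the CSS selector img[alt='Restaurant Card'] rather than by any of the sc-... class names.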

Edit: Here is the whole script for you. It gives a count of 117 restaurants.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

##### Web scraper for an infinite-scrolling page #####
# executable_path is the Selenium 3 style; newer Selenium versions take the driver path through a Service object
driver = webdriver.Chrome(executable_path=r"path_to-chromedriver")
driver.get("https://www.zomato.com/pune/delivery-in-budhwar-peth")
time.sleep(10)  # Allow 10 seconds for the web page to open
# Apply the "Rating: 4.0+" filter
driver.find_element(By.XPATH, "//div[contains(text(),'Rating: 4.0+')]").click()
scroll_pause_time = 1  # You can set your own pause time. My laptop is a bit slow so I use 1 sec
screen_height = driver.execute_script("return window.screen.height;")  # get the screen height
i = 1
count = 0
while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update the scroll height after each scroll, as it can change once more cards have loaded
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if screen_height * i > scroll_height:
        break

# Parse the fully loaded page and count the restaurant cards by their alt text
soup = BeautifulSoup(driver.page_source, "html.parser")
for img in soup.find_all('img', alt='Restaurant Card'):
    count += 1
print('Count of all rests is', count)
driver.quit()
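
A side note on the check stars > '4.0' in the question: it compares strings, not numbers, so any non-numeric label that sorts after '4' would pass. If you ever filter in Python instead of using the page's filter, converting the text to a float first is safer. Below is a minimal sketch. The parse_rating and is_highly_rated helpers are hypothetical names, and the 'NEW' label is only an assumed example of non-numeric rating text.

```python
def parse_rating(text):
    """Return the rating as a float, or None if the text is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def is_highly_rated(text, threshold=4.0):
    """True only when the text is numeric and strictly above the threshold."""
    rating = parse_rating(text)
    return rating is not None and rating > threshold


# 'NEW' is an assumed example of a non-numeric label.
for sample in ["4.3", "3.9", "4.0", "NEW"]:
    print(sample, is_highly_rated(sample), sample > "4.0")
```

The last column shows what the plain string comparison would have returned for the same text.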
