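Sorry for the newbie question, but I'm trying to learn web scraping by working on Yelp's Eat24.com site. I am able to 1) get the driver to eat24.com, 2) choose pickup, 3) search for a location, 4) click the first menu, and 5) collect the menu items. However, I can't get back to the original restaurant list and select the next menu in the list. Here is my code: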
from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
#go to eat24, type in zip code 10007, choose pickup and click search
driver.get("https://new-york.eat24hours.com/restaurants/index.php")
search_area = driver.find_element_by_name("address_auto_complete")
search_area.send_keys("10007")
pickup_element = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[2]")
pickup_element.click()
search_button = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[3]/button")
search_button.click()
#scroll up and down on page to load more of 'infinity' list
for i in range(0,3):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    driver.execute_script("window.scrollTo(0,0);")
    time.sleep(1)
#find menu buttons
menus_elements = driver.find_elements_by_xpath('//*[@title="View Menu"]')
#menus_element = driver.find_element_by_xpath('//*[@title="View Menu"]')
#menus_element.click()
#Problem area: Trying to iterate over menu buttons and collect menu items + prices from each. It goes to the first menu and pulls the prices/menu items, but then when it goes back to first page it says 'stale element reference' and doesn't click the next menu item
for i in range(0, len(menus_elements)):
    if menus_elements[i].is_displayed():
        menus_elements[i].click()
        #find menu items
        menu_items = driver.find_elements_by_class_name("cpa")
        menus = [x.text for x in menu_items]
        #find menu prices
        menu_prices = driver.find_elements_by_class_name('item_price')
        menu_prices = [x.text for x in menu_prices]
        #pair menu items and prices
        for menu, menu_price in zip(menus, menu_prices):
            print(menu + ': ' + menu_price)
        driver.execute_script("window.history.go(-1)")
        driver.implicitly_wait(20)
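The problem is at the end: it goes to the first menu and scrapes the items and prices, but when it returns to the results page it doesn't select the second menu and do the same thing. Why? Thanks for any suggestions!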
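The elements stored in menus_elements belong to the results page as it was first loaded; once the browser navigates to a menu and back, those references are stale. Instead of clicking each "View Menu" button, scraping the menu page, and going back to the results page, you can collect the list of links first and then scrape each menu page one by one: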
menu_urls = [page.get_attribute('href') for page in driver.find_elements_by_xpath('//*[@title="View Menu"]')]
for url in menu_urls:
    driver.get(url)
    menu_items = driver.find_elements_by_class_name("cpa")
    menus = [x.text for x in menu_items]
    ...
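For completeness, here is a rough sketch of how the whole loop might look once the prices from the question are paired back in. The helper names `collect_menu_urls` and `scrape_menu_pages` are made up for illustration. The sketch assumes that every "View Menu" element really carries an `href`, that items and prices line up one-to-one (as the `zip` in the question already assumes), and that you are on the same Selenium 3 style `find_elements_by_*` locators used in the question. Newer Selenium releases spell these as `find_elements(By.CLASS_NAME, ...)` and `find_elements(By.XPATH, ...)` instead.

```python
def collect_menu_urls(driver):
    """Read the href of every "View Menu" link on the results page.

    Hypothetical helper: assumes the driver is already on the results page.
    """
    links = driver.find_elements_by_xpath('//*[@title="View Menu"]')
    hrefs = [link.get_attribute("href") for link in links]
    # get_attribute returns None when the attribute is missing; skip those.
    return [href for href in hrefs if href]


def scrape_menu_pages(driver, menu_urls):
    """Visit each menu URL and return {url: [(item, price), ...]}.

    Hypothetical helper: the class names "cpa" and "item_price" are the
    ones used in the question, not verified against the live site.
    """
    results = {}
    for url in menu_urls:
        # Loading the URL directly means no element reference has to
        # survive a page change, so nothing can go stale.
        driver.get(url)
        # Read the text right away, while the elements belong to this page.
        items = [el.text for el in driver.find_elements_by_class_name("cpa")]
        prices = [el.text for el in driver.find_elements_by_class_name("item_price")]
        results[url] = list(zip(items, prices))
    return results
```

With the driver sitting on the results page, something like `scrape_menu_pages(driver, collect_menu_urls(driver))` should give you one list of (item, price) pairs per restaurant. The URLs are plain strings, so unlike the element references they stay valid however many pages you visit.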