StaleElementReferenceException after clicking a button with Selenium + Python



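I am using Selenium for a web scraping project, in which I am trying to scrape product links from multiple pages on Amazon. For example, when I type "laptop" into Amazon's search bar, many products are listed, spread across multiple pages. I want to scrape the product links from all of those pages and store them in a list.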

Here is my code so far:

def scrape_pages_selenium(product, total_pages):
    driver = webdriver.Chrome('./chromedriver')
    url = f'https://www.amazon.com/s?k={product}&page=1&ref=nb_sb_noss'
    driver.get(url)
    links = driver.find_elements_by_class_name("a-size-mini")
    product_links = []
    for page in range(1, total_pages+1):
        for link in links:
            product_links.append(link.find_element_by_css_selector('a').get_attribute('href'))
        print(len(product_links))
        try:
            next_page_button = driver.find_element_by_class_name("a-last")
            next_page_button.click()
        except:
            continue
    return product_links

product_links = scrape_pages_selenium('laptop', 7)

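This code works correctly on the first page, and next_page_button is used to go to the next page. But when the code tries to scrape links from the second page, I get the following error: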

StaleElementReferenceException            Traceback (most recent call last)
<ipython-input-50-09cc65b63734> in <module>
23     return product_links
24 
---> 25 product_links = scrape_pages_selenium('gatorade', 7)
26 
<ipython-input-50-09cc65b63734> in scrape_pages_selenium(product, total_pages)
12 
13         for link in links:
---> 14             product_links.append(link.find_element_by_css_selector('a').get_attribute('href'))
15 
16         print(len(product_links))
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in find_element_by_css_selector(self, css_selector)
428             element = element.find_element_by_css_selector('#foo')
429         """
--> 430         return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
431 
432     def find_elements_by_css_selector(self, css_selector):
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in find_element(self, by, value)
657 
658         return self._execute(Command.FIND_CHILD_ELEMENT,
--> 659                              {"using": by, "value": value})['value']
660 
661     def find_elements(self, by=By.ID, value=None):
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in _execute(self, command, params)
631             params = {}
632         params['id'] = self._id
--> 633         return self._parent.execute(command, params)
634 
635     def find_element(self, by=By.ID, value=None):
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
319         response = self.command_executor.execute(driver_command, params)
320         if response:
--> 321             self.error_handler.check_response(response)
322             response['value'] = self._unwrap_value(
323                 response.get('value', None))
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
240                 alert_text = value['alert'].get('text')
241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
243 
244     def _value_or_default(self, obj, key, default):
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=83.0.4103.61)

I'm not sure where I'm going wrong.

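Move links = driver.find_elements_by_class_name("a-size-mini") inside the for page loop, so that the elements are looked up again on every page. As written, links is collected once, before the loop, and that collection is no longer valid once you navigate to the next page.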

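find_elements_by_class_name gives you a snapshot of the elements that exist on the current page. When you move to the next page, the DOM elements in that snapshot are no longer attached to the document, so any call on them raises StaleElementReferenceException.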
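The change described above could look roughly like the sketch below. It keeps the Selenium 3 style calls and class names from the question. Two things are assumptions made for illustration and are not part of the original code: the function takes an already opened driver as a parameter instead of creating one, and it uses the plural find_elements_by_class_name for the "next" button so that a missing button yields an empty list rather than an exception.

```python
def scrape_pages_selenium(driver, total_pages):
    """Collect product hrefs page by page, re-locating elements after each navigation.

    `driver` is assumed to be a Selenium 3 style WebDriver that has already
    loaded the first results page (this signature is a change from the question).
    """
    product_links = []
    for page in range(1, total_pages + 1):
        # Re-query on every iteration: references found on the previous
        # page become stale as soon as the browser navigates away.
        links = driver.find_elements_by_class_name("a-size-mini")
        for link in links:
            # Store the href string immediately; plain strings stay valid
            # after navigation, unlike element references.
            href = link.find_element_by_css_selector("a").get_attribute("href")
            product_links.append(href)
        print(len(product_links))

        # Stop after the last requested page or when there is no "next" button.
        next_buttons = driver.find_elements_by_class_name("a-last")
        if page == total_pages or not next_buttons:
            break
        next_buttons[0].click()
    return product_links
```

With the setup from the question, this would be called after driver = webdriver.Chrome('./chromedriver') and driver.get(url), for example product_links = scrape_pages_selenium(driver, 7). On a real site the next page may not have finished loading right after click(), so a wait before the next lookup may also be needed; that is not shown here.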
