How to resolve StaleElementReferenceException when looping through elements (Selenium)



I'm trying to scrape a website for information about football matches, so I'm using the Selenium library in Python.

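I store the clickable HTML elements for all the matches I need in a list named "completed_matches". I wrote a for loop that iterates over these clickable HTML elements. Inside the loop, I click the current HTML element and print the new URL. The code looks like this: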

from selenium import webdriver
import selenium
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome(r"C:\Users\Mart\Downloads\chromedriver_win32_2\chromedriver.exe")
url = "https://footystats.org/spain/la-liga/matches"
driver.get(url)
completed_matches = driver.find_elements_by_xpath("""//*[@id="matches-list"]/div[@class='full-matches-table mt2e ' or @class='full-matches-table mt1e ']/div/div[2]/table[@class='matches-table inactive-matches']/tbody/tr[*]/td[3]/a[1]/span""")
print(len(completed_matches))
for match in completed_matches:
    match.click()
    print("Current driver URL: " + driver.current_url)

The output looks like this:

159
Current driver URL: https://footystats.org/spain/fc-barcelona-vs-real-club-deportivo-mallorca-h2h-stats#632514
---------------------------------------------------------------------------
StaleElementReferenceException            Traceback (most recent call last)
<ipython-input-3-da5851d767a8> in <module>
      4 print(len(completed_matches))
      5 for match in completed_matches:
----> 6         match.click()
      7         print("Current driver URL: " + driver.current_url)

~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in click(self)
     78     def click(self):
     79         """Clicks the element."""
---> 80         self._execute(Command.CLICK_ELEMENT)
     81
     82     def submit(self):

~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in _execute(self, command, params)
    631             params = {}
    632         params['id'] = self._id
--> 633         return self._parent.execute(command, params)
    634
    635     def find_element(self, by=By.ID, value=None):

~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    319         response = self.command_executor.execute(driver_command, params)
    320         if response:
--> 321             self.error_handler.check_response(response)
    322             response['value'] = self._unwrap_value(
    323                 response.get('value', None))

~\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243
    244     def _value_or_default(self, obj, key, default):

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=79.0.3945.79)
  (Driver info: chromedriver=72.0.3626.7 (efcef9a3ecda02b2132af215116a03852d08b9cb),platform=Windows NT 10.0.18362 x86_64)

The completed_matches list contains 159 HTML elements, but the for loop only prints the URL for the first clicked link and then throws a StaleElementReferenceException...

Does anyone know how to solve this?

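The URL you are looking for is already in the link you are clicking, so you can select the parent element of the span you click and read its href instead. The StaleElementReferenceException occurs because the page changes after you click the link, which makes every element after the first clicked one stale.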

from selenium import webdriver
import selenium
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome(r"C:\Users\Mart\Downloads\chromedriver_win32_2\chromedriver.exe")
url = "https://footystats.org/spain/la-liga/matches"
driver.get(url)
completed_matches = driver.find_elements_by_xpath("""//*[@id="matches-list"]/div[@class='full-matches-table mt2e ' or @class='full-matches-table mt1e ']/div/div[2]/table[@class='matches-table inactive-matches']/tbody/tr[*]/td[3]/a[1]/span""")
print(len(completed_matches))
for match in completed_matches:
    #match.click()
    #print("Current driver URL: " + driver.current_url)
    match_parent = match.find_element_by_xpath("..")
    href = match_parent.get_attribute("href")
    print("href: ", href)
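The `find_elements_by_xpath` helpers used above were removed in Selenium 4.3, so on a newer install the same idea could be sketched with `find_elements`. This is only an illustration, not the answerer's code: `collect_hrefs`, `visit_all` and `run` are hypothetical names, the XPath is the question's XPath with the trailing `/span` dropped (an assumption about the page), and `"xpath"` is the string value of `By.XPATH`. The point is that every href is read before any navigation happens, so no element reference is used after the page changes.

```python
def collect_hrefs(driver, xpath):
    """Read the href of every matching link before any navigation happens."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    links = driver.find_elements("xpath", xpath)
    # Plain strings stay valid after the page changes; WebElement references do not.
    return [link.get_attribute("href") for link in links]


def visit_all(driver, hrefs):
    """Open each URL directly and return the URLs the driver ended up on."""
    visited = []
    for href in hrefs:
        driver.get(href)
        visited.append(driver.current_url)
    return visited


def run():
    # Imported here so the helpers above can be exercised without a browser.
    from selenium import webdriver

    # Assumption: the question's XPath without the trailing /span,
    # so that it matches the <a> elements themselves.
    xpath = (
        "//*[@id='matches-list']"
        "/div[@class='full-matches-table mt2e ' or @class='full-matches-table mt1e ']"
        "/div/div[2]/table[@class='matches-table inactive-matches']"
        "/tbody/tr[*]/td[3]/a[1]"
    )
    # Assumes a matching chromedriver can be found (Selenium Manager or PATH).
    driver = webdriver.Chrome()
    try:
        driver.get("https://footystats.org/spain/la-liga/matches")
        for url in visit_all(driver, collect_hrefs(driver, xpath)):
            print("Current driver URL: " + url)
    finally:
        driver.quit()
```

Calling `run()` would start the browser; the two helpers only need an object with `find_elements`, `get` and `current_url`, so they can be tried against a stub first.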

After the click, the DOM is refreshed, which is why the StaleElementReferenceException occurs. So rebuild the completed_matches elements inside the for loop, and pick each one by index, because the loop variable from the original list would still point to a stale reference.

xpath = """//*[@id="matches-list"]/div[@class='full-matches-table mt2e ' or @class='full-matches-table mt1e ']/div/div[2]/table[@class='matches-table inactive-matches']/tbody/tr[*]/td[3]/a[1]/span"""
count = len(driver.find_elements_by_xpath(xpath))
print(count)
for i in range(count):
    # Look the elements up again on every pass; references from before the click are stale.
    completed_matches = driver.find_elements_by_xpath(xpath)
    completed_matches[i].click()
    print("Current driver URL: " + driver.current_url)
    # Assumes the click navigated in the same tab, so go back to the match list.
    driver.back()
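Stale means old, decayed, no longer fresh. A stale element is an old element, or one that is no longer available. Suppose an element is found on a web page and is referenced as a WebElement in WebDriver. If the DOM changes, that WebElement goes stale.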
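One way to observe this from code is to call any method on the held reference and see whether the driver rejects it. The helper below is a hypothetical sketch (`is_stale` is not part of Selenium), similar in spirit to the `staleness_of` condition in `selenium.webdriver.support.expected_conditions`. The `stale_error` parameter is an assumption added so the helper can be tried without a browser; by default it uses the exception from the traceback above.

```python
def is_stale(element, stale_error=None):
    """Return True if the driver no longer recognises this element reference."""
    if stale_error is None:
        # Imported lazily; this is the exception shown in the question's traceback.
        from selenium.common.exceptions import StaleElementReferenceException
        stale_error = StaleElementReferenceException
    try:
        # Any call that round-trips to the browser fails once the
        # element is no longer attached to the page document.
        element.is_enabled()
    except stale_error:
        return True
    return False
```

In the question's loop, calling `is_stale(completed_matches[1])` right after the first `match.click()` would be expected to return True, since the page has changed by then.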

So this means the page you are working with changes after you click the element. Here is my suggestion for fixing it:

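import time

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

while True:
    try:
        # The XPath is cut off in the original answer; it is only closed here so that it parses.
        completed_match = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='matches-list']/div[@class='full-matches-table mt2e ']")))
    except TimeoutException:
        # Nothing clickable appeared within 10 seconds, so stop.
        break
    completed_match.click()
    time.sleep(2)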

So just iterate over the elements and look each one up again every time; that way it is certain to be in the page's DOM.
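The "look it up again every time" idea can also be wrapped in a small retry helper. This is a sketch under assumptions, not the answerer's code: `click_fresh` and its parameters are hypothetical names, `locate` is any zero-argument callable that performs a fresh lookup, and `stale_error` exists only so the helper can be exercised without a browser.

```python
def click_fresh(locate, attempts=3, stale_error=None):
    """Locate an element and click it, looking it up again if it went stale.

    Returns the number of attempts that were needed.
    """
    if stale_error is None:
        # Imported lazily; defaults to Selenium's own exception.
        from selenium.common.exceptions import StaleElementReferenceException
        stale_error = StaleElementReferenceException
    for attempt in range(attempts):
        element = locate()  # fresh lookup on every attempt
        try:
            element.click()
            return attempt + 1
        except stale_error:
            # Give up only after the last attempt; otherwise locate again.
            if attempt == attempts - 1:
                raise
```

With a real driver, `locate` could be something like `lambda: driver.find_elements("xpath", xpath)[i]`, where `xpath` and `i` are whatever locator and index the loop is currently working on.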

You can see a web scraper for Tripadvisor, with the complete code, here:

https://github.com/alirezaznz/Tripadvisor-Webscraper
