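I'm trying to access the text of elements with Selenium in Python. I can access the elements themselves just fine, but when I try to get their text, it doesn't work.
Here is my code: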
from selenium import webdriver
driver = webdriver.Chrome() # I removed the path for my post, but there is one that works in my actual code
URL = "https://www.costco.com/laptops.html"
driver.get(URL)
prices = driver.find_elements_by_class_name("price")
print([price.text for price in prices])
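If I run this code, I get: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
However, if I print the elements themselves, there is no problem. I've read some earlier posts about the stale element exception, but I don't understand why it applies in this case. Why would the DOM change when I try to access the text? Why is this happening?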
It turns out you just need to wait:
from selenium import webdriver
import time
driver = webdriver.Chrome() # I removed the path for my post, but there is one that works in my actual code
URL = "https://www.costco.com/laptops.html"
driver.get(URL)
time.sleep(3)
prices = driver.find_elements_by_class_name("price")
print([price.text for price in prices])
Output:
['$1,999.99', '$2,299.99', '', '', '$769.99', '', '$799.99', '$1,449.99', '$1,199.99', '$1,199.99', '$1,999.99', '$1,599.99', '$1,299.99', '$2,299.99', '$1,549.99', '$1,499.99', '$599.99', '$1,699.99', '$1,079.99', '$2,999.99', '$1,649.99', '$1,499.99', '$2,399.99', '$1,499.97', '$1,199.99', '$1,649.99', '$849.99', '']
The proper way is to use WebDriverWait, which polls for a condition instead of sleeping for a fixed amount of time.
Old answer:
I'm not entirely sure why this happens, but I suggest you try BeautifulSoup:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome() # I removed the path for my post, but there is one that works in my actual code
URL = "https://www.costco.com/laptops.html"
driver.get(URL)
soup = BeautifulSoup(driver.page_source, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
divs = soup.find_all("div",{"class":"price"})
# Strip the tabs and newlines that surround each price in the markup
[div.text.replace("\t", "").replace("\n", "") for div in divs]
Output:
['$1,099.99',
'$399.99',
'$1,199.99',
'$599.99',
'$1,049.99',
'$799.99',
'$699.99',
'$949.99',
'$699.99',
'$1,999.99',
'$449.99',
'$2,699.99',
'$1,149.99',
'$1,599.99',
'$1,049.99',
'$1,249.99',
'$299.99',
'$1,799.99',
'$749.99',
'$849.99',
'$2,299.99',
'$999.99',
'$649.99',
'$799.99']