I don't understand why the list I'm trying to extract text from comes back empty when I'm sure I'm using the correct XPath. Here is my code:
driver = webdriver.Firefox()
driver.get("https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001")
betweenLugs = driver.find_elements(By.XPATH, "/html/body/div[2]/main/div[3]/div/div/div[2]/div/div[2]/div[3]/div/ul/li[1]")
print(betweenLugs.text)
This should get the first list item and its measurement:
Between lugs: 20 mm
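I've tried other approaches as well, but the fact that the XPath isn't matching tells me something is wrong, and whatever I do I can't extract the text from the list. Does anyone know what I'm doing wrong? This is the first time I've run into this problem.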
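The XPath is wrong. It fails at /div[2], which doesn't match anything. This is an example of why you shouldn't use absolute paths.
The section has an id attribute, so use that: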
betweenLugs = driver.find_elements(By.XPATH, "//*[@id='product-info-data-5bea7fa7406d7']/ul/li[1]")[0]
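To see offline why a position-based absolute path is brittle while an id-anchored one is not, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The markup, the id value `specs`, and the helper name `first_text` are made up for illustration and are not the real Omega page. ElementTree supports only a subset of XPath, which is enough for this comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (not the real page): a spec list inside a div with an id.
ORIGINAL = """
<html><body>
  <div><main>
    <div id="specs"><ul><li>Between lugs: 20 mm</li><li>Another spec</li></ul></div>
  </main></div>
</body></html>
"""

# The same content after one extra div is added at the top of <body>.
CHANGED = ORIGINAL.replace("<body>", "<body><div>banner</div>")

ABSOLUTE = "./body/div[1]/main/div/ul/li[1]"  # depends on every position along the way
ANCHORED = ".//*[@id='specs']/ul/li[1]"       # depends only on the id and the list below it


def first_text(markup, path):
    """Return the text of the first match for path, or None if nothing matches."""
    root = ET.fromstring(markup.strip())
    node = root.find(path)
    return None if node is None else node.text


for label, markup in (("original", ORIGINAL), ("changed", CHANGED)):
    print(label, "absolute:", first_text(markup, ABSOLUTE))
    print(label, "anchored:", first_text(markup, ANCHORED))
```

The absolute path stops matching as soon as one sibling is inserted ahead of it, whereas the anchored path keeps working as long as the id and the list under it stay the same.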
You may also want to add a wait so the element has time to load:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
betweenLugs = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.XPATH, "//*[@id='product-info-data-5bea7fa7406d7']/ul/li[1]")))
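For intuition about what an explicit wait does, here is a rough sketch of the polling idea in plain Python: call a condition repeatedly until it returns something truthy or a deadline passes. This is not Selenium's implementation. `wait_until`, `fake_lookup`, and the use of `LookupError` as a stand-in for "element not found yet" are hypothetical names chosen for the illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not present yet": swallow it and retry.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the lookup fails twice before the content "renders".
attempts = []


def fake_lookup():
    attempts.append(1)
    if len(attempts) < 3:
        raise LookupError("element not rendered yet")
    return "Between lugs: 20 mm"


print(wait_until(fake_lookup, timeout=2.0, poll=0.01))
```

The point of waiting on a condition rather than sleeping for a fixed time is that the call returns as soon as the element is ready and only fails after the full timeout.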
Well, try this and see if it fixes the problem:
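from selenium.webdriver.common.by import By
# The find_element_by_* helpers were removed in Selenium 4.3; find_element(By.XPATH, ...) is the current form.
between_lugs = driver.find_element(By.XPATH, "//*[contains(text(), 'Between lugs')]").get_attribute("innerHTML")
between_lugs_value = driver.find_element(By.XPATH, "//*[contains(text(), 'Between lugs')]/../span").get_attribute("innerHTML")
final_text = between_lugs + " " + between_lugs_value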
jQuery is already loaded on that page, so you can do:
driver.execute_script(r"return jQuery('li:contains(Between lugs)').text().trim().replace(/\s+/g, ' ')")
You can experiment with the selector in Chrome's developer tools, which makes this easier.
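The list item's text tends to come back with line breaks and indentation from the markup, which is why the cleanup step collapses runs of whitespace (the regex for that is `\s+`, with the backslash). A small Python-side sketch of the same normalization follows. The `raw` string is a made-up stand-in for what the page might return, and `normalize_ws` is a hypothetical helper name.

```python
import re


def normalize_ws(text):
    """Trim both ends and collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text.strip())


# Made-up example of text as it might come out of nested markup.
raw = "\n      Between lugs:\n        20 mm\n   "

print(normalize_ws(raw))
# str.split() with no arguments splits on any whitespace run, so this is equivalent:
print(" ".join(raw.split()))
```

Doing the cleanup in Python rather than in the injected script keeps the JavaScript string free of backslashes, which avoids escaping mistakes.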
Another, possibly simpler, approach is the following:
from contextlib import closing
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import ui

url = "https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001"
with closing(webdriver.Chrome()) as wd:
    wait = ui.WebDriverWait(wd, 10)
    wd.get(url)
    # Wait for the first <li> inside the technical-data block, then read its raw text.
    item = wait.until(lambda wd: wd.find_element(By.XPATH, "//*[contains(@class,'technical-data')]//li")).get_attribute('textContent')
    # Collapse newlines and indentation into single spaces.
    print(' '.join(item.split()))
Output:
Between lugs: 20 mm
Scroll down and use a wait with a CSS selector that targets the parent li:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
driver = webdriver.Chrome() #Firefox()
driver.get("https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001")
driver.execute_script("window.scrollTo(0, 2000)")
betweenLugs = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, "#product-info-data-5beaf5497d916 > ul > li:nth-child(1)")))
print(betweenLugs.text)
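One thing worth noticing: the id used here (`product-info-data-5beaf5497d916`) differs from the one used earlier in the thread (`product-info-data-5bea7fa7406d7`). That suggests, though nothing in the thread confirms it, that the suffix is regenerated and a hard-coded id may stop matching. If so, matching on the stable prefix may be more robust, for example the CSS selector `[id^='product-info-data-'] > ul > li:nth-child(1)` or the XPath `//*[starts-with(@id, 'product-info-data-')]/ul/li[1]`. The sketch below shows the prefix idea offline with the standard library. The markup is made up and `first_item_by_id_prefix` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; only the id pattern is taken from the thread.
PAGE = """
<main>
  <div id="product-info-data-5beaf5497d916">
    <ul><li>Between lugs: 20 mm</li><li>Another spec</li></ul>
  </div>
</main>
"""


def first_item_by_id_prefix(markup, prefix):
    """Return the text of the first <li> under the first element whose id starts with prefix."""
    root = ET.fromstring(markup.strip())
    for element in root.iter():
        if element.get("id", "").startswith(prefix):
            item = element.find("./ul/li[1]")
            return None if item is None else item.text
    return None


print(first_item_by_id_prefix(PAGE, "product-info-data-"))
```

The lookup still succeeds if the suffix after `product-info-data-` changes, which is the property the prefix selectors above would give in Selenium.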