Getting the text of an href link inside a span tag with Selenium



How can I extract the text that is displayed as part of a link inside a span tag?

<span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span> <span class="w-100">Largest Allocation</span>

Output:

XLY

I have tried several approaches, including

elems = driver.find_elements_by_class_name("span.pull-left.w-100.font30.medium_blue_type.mb10")
elems = driver.find_element_by_xpath('.//span[@class = "pull-left w-100 font30 medium_blue_type mb10"]')

but could not get either to work. The website is https://www.etf.com/stock/TSLA.

Edit: Is it possible to do this without opening a browser window (e.g. with the "headless" option)?

op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(CHROME_DRIVER_PATH, options=op)

If you prefer a text-based locator, you can use the following:

//span[text()='Largest Allocation']/../span
  1. You should first click the cookie "I Understand" button
  2. Use an explicit wait

So your working code would be:

driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.etf.com/stock/TSLA")
try:
    wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "I Understand"))).click()
    print("Clicked on I understand button")
except Exception:
    # The cookie banner may not appear; carry on without it.
    pass
txt = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[text()='Largest Allocation']/../span"))).text
print(txt)

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Output:

Clicked on I understand button
XLY
Process finished with exit code 0
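
To see why the text-based locator above ends up at XLY, the same path can be checked offline against the markup from the question. This is only a sketch: the surrounding `<body>`/`<div>` wrapper is an assumption about the page, and the standard-library ElementTree supports only a subset of XPath, so `[.='...']` stands in for `[text()='...']`.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in hypothetical <body>/<div> parents
# (the real structure around these spans is an assumption).
HTML = (
    "<body><div>"
    '<span class="pull-left w-100 font30 medium_blue_type mb10">'
    '<a href="/XLY">XLY</a></span> '
    '<span class="w-100">Largest Allocation</span>'
    "</div></body>"
)

root = ET.fromstring(HTML)

# Find the label span, step up to its parent, then select the parent's spans.
# ElementTree has no text() test, so [.='...'] is used in its place.
matches = root.findall(".//span[.='Largest Allocation']/../span")

# Collect the visible text of every matching span, in document order.
texts = ["".join(span.itertext()) for span in matches]
print(texts)
```

The path matches both sibling spans, and a single-element lookup takes the first match in document order, which is the span that wraps the link.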

If you are looking for a locator that is not text-based, use the following line of code:

txt = wait.until(EC.visibility_of_element_located((By.XPATH, "(//span[contains(@class,'medium_blue_type')]//a)[2]"))).text

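There are several possible problems here:

  1. You may be missing a delay
  2. The locator you are using may not be unique
  3. As far as I can see, you are extracting an attribute value from the returned web element
  4. The web element may be inside an iframe, etc.

Based on the information shared so far, you can try adding a wait and extracting the web element's value as follows: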
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
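# Single quotes inside the XPath avoid clashing with the Python string quotes.
# The href attribute belongs to the <a> child of the span, so the locator targets that link.
href = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[@class='pull-left w-100 font30 medium_blue_type mb10']/a"))).get_attribute("href")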
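
On the locator point: the first attempt in the question passes a CSS-selector-style string (`span.pull-left...`) to a class-name lookup, which expects a single class name. A minimal sketch of building the compound selector from the class attribute, so that it could be passed to `By.CSS_SELECTOR` instead. The helper name `css_selector_for` is hypothetical, and the resulting selector may still match more than one element on the page.

```python
def css_selector_for(tag: str, class_attr: str) -> str:
    """Build a compound CSS selector such as 'span.a.b' from a class attribute."""
    classes = class_attr.split()
    return tag + "".join("." + name for name in classes)


# Class attribute copied from the span in the question.
CLASS_ATTR = "pull-left w-100 font30 medium_blue_type mb10"

selector = css_selector_for("span", CLASS_ATTR)
print(selector)

# Intended use (not run here, it needs a live driver):
# driver.find_elements(By.CSS_SELECTOR, selector)
```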

Use the following XPath to identify the link.

//div[./span[text()='Largest Allocation']]//a

You need to allow some delay before the element becomes available. Use WebDriverWait() and wait for the element to be visible.
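
The idea behind such an explicit wait is a polling loop: keep checking a condition until it succeeds or a timeout expires. The sketch below is a simplified illustration of that idea only, not Selenium's implementation; `wait_until` and `fake_lookup` are hypothetical names, and the lookup merely simulates an element that appears late.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for an element that only "appears" on the third poll.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "XLY" if attempts["count"] >= 3 else None


text = wait_until(fake_lookup, timeout=2.0, poll=0.01)
print(text)
```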

To get the text:

print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//div[./span[text()='Largest Allocation']]//a"))).text)

To get the href:

print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//div[./span[text()='Largest Allocation']]//a"))).get_attribute("href"))

You need the following imports.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
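
One thing to be aware of when reading the href: as far as I know, get_attribute("href") generally returns the link resolved against the page URL rather than the literal `/XLY` from the markup. A small sketch of that resolution using only the standard library; the constant names are illustrative.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.etf.com/stock/TSLA"  # page from the question
RAW_HREF = "/XLY"                            # href as written in the markup

# A root-relative href replaces the whole path of the page URL.
resolved = urljoin(PAGE_URL, RAW_HREF)
print(resolved)
```

If only the ticker is needed, the element's text is the simpler thing to read.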
