Using Selenium to get the text of an href link inside a span tag

How can I extract the text of a link that sits inside a span tag?

<span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span> <span class="w-100">Largest Allocation</span>

Expected output:

XLY

I have tried several approaches, including

elems = driver.find_elements_by_class_name("span.pull-left.w-100.font30.medium_blue_type.mb10")
elems = driver.find_element_by_xpath('.//span[@class = "pull-left w-100 font30 medium_blue_type mb10"]')

but could not get any of them to work. The site is https://www.etf.com/stock/TSLA.

Edit: Is it possible to do this without opening a browser window, e.g. with the "headless" option?

op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(CHROME_DRIVER_PATH, options=op)

If you prefer a text-based locator, you can use the following:

//span[text()='Largest Allocation']/../span
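To see which element this path resolves to, here is a rough offline sketch that runs an equivalent query against the snippet from the question using the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath, so text()='...' is written as .='...' here. The wrapping <div> is an assumption about the page structure, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Snippet from the question, wrapped in a <div> (assumed parent) so it parses as XML.
SNIPPET = """
<div>
  <span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span>
  <span class="w-100">Largest Allocation</span>
</div>
""".strip()


def first_sibling_span_text(markup):
    """Return the text of the first span sharing a parent with the label span."""
    root = ET.fromstring(markup)
    # ElementTree has no text(); [.='...'] compares the element's full text instead.
    spans = root.findall(".//span[.='Largest Allocation']/../span")
    # Like Selenium's find_element, take the first match in document order.
    return "".join(spans[0].itertext()).strip()


print(first_sibling_span_text(SNIPPET))
```

The path matches both spans under the parent; the first one in document order is the span that wraps the link, which is why reading its text yields the ticker.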

First click the "I Understand" button on the cookie banner.
Use explicit waits.

So your working code would be:

driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.etf.com/stock/TSLA")
try:
    # Dismiss the cookie banner if it is shown
    wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "I Understand"))).click()
    print("Clicked on I understand button")
except Exception:
    # The banner did not appear within the timeout; carry on
    pass
# The first span next to "Largest Allocation" wraps the link
txt = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[text()='Largest Allocation']/../span"))).text
print(txt)

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Output:

Clicked on I understand button
XLY
Process finished with exit code 0

If you are looking for a locator that is not text-based, use the following line:

txt = wait.until(EC.visibility_of_element_located((By.XPATH, "(//span[contains(@class,'medium_blue_type')]//a)[2]"))).text

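There are several possible problems here:

You may be missing a wait.
The locator you are using may not be unique.
I can see that you are extracting an attribute value from the returned web element.
The web element may be inside an iframe, etc.
Based on the information shared so far, you can try adding a wait and reading the value from the web element as follows: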

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
# The href attribute lives on the <a> inside the span, so the path ends in /a
href = wait.until(EC.visibility_of_element_located((By.XPATH, '//span[@class = "pull-left w-100 font30 medium_blue_type mb10"]/a'))).get_attribute("href")
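On the locator point above: the string passed to find_elements_by_class_name in the question is written in CSS selector syntax, whereas a class-name lookup expects a single class name. A small sketch of building such a selector from a class attribute follows; the helper name is made up for illustration, and the commented-out driver call assumes a live session.

```python
def css_selector_for(tag, class_attr):
    """Build a CSS selector matching `tag` elements that carry every listed class."""
    classes = class_attr.split()
    return tag + "".join("." + name for name in classes)


selector = css_selector_for("span", "pull-left w-100 font30 medium_blue_type mb10")
print(selector)

# With a live driver, the selector could be used like this (not executed here):
#   driver.find_element(By.CSS_SELECTOR, selector + " a").text
```

Unlike an exact @class comparison in XPath, a selector of this form does not depend on the order of the classes in the attribute.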

Use the following XPath to locate the link.

//div[./span[text()='Largest Allocation']]//a

You need to allow some time for the element to appear. Use WebDriverWait() and wait for the element to become visible.
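Conceptually, an explicit wait polls a condition until it succeeds or a timeout expires. The sketch below is a simplified stand-in written in plain Python, not Selenium's implementation; wait_until and FakePage are hypothetical names used only to show the idea.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


class FakePage:
    """Hypothetical stand-in for a page whose link only appears on the third check."""

    def __init__(self):
        self.checks = 0

    def find_link_text(self):
        self.checks += 1
        return "XLY" if self.checks >= 3 else None


page = FakePage()
print(wait_until(page.find_link_text, timeout=2.0, poll=0.01))
```

A fixed time.sleep() would either waste time or be too short; polling returns as soon as the element is ready and fails loudly if it never is.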

To get the text:

print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//div[./span[text()='Largest Allocation']]//a"))).text)

To get the href:

print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//div[./span[text()='Largest Allocation']]//a"))).get_attribute("href"))

The Selenium snippets above need the following imports.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
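A note on the difference between the two calls: .text gives the link's visible text, while the href in the markup is the relative path /XLY. A browser resolves that path against the page URL, so get_attribute("href") typically returns an absolute URL. The sketch below reproduces this offline with the standard library; the <body>/<div> wrapper is an assumption about the page structure, and ElementTree's [span='...'] stands in for the XPath predicate used above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

PAGE_URL = "https://www.etf.com/stock/TSLA"

# Snippet from the question inside an assumed <body>/<div> wrapper.
MARKUP = """
<body>
  <div>
    <span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span>
    <span class="w-100">Largest Allocation</span>
  </div>
</body>
""".strip()

root = ET.fromstring(MARKUP)
# Find the div that has a child span with the label text, then the link below it.
link = root.find(".//div[span='Largest Allocation']//a")

link_text = link.text                        # visible text of the link
raw_href = link.get("href")                  # attribute exactly as written in the markup
absolute_href = urljoin(PAGE_URL, raw_href)  # what a browser would resolve it to

print(link_text, raw_href, absolute_href)
```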
