How do I extract the text of a link that sits inside a span tag?
<span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span> <span class="w-100">Largest Allocation</span>
Expected output:
XLY
I tried several approaches, including:
elems = driver.find_elements_by_class_name("span.pull-left.w-100.font30.medium_blue_type.mb10")
elems = driver.find_element_by_xpath('.//span[@class = "pull-left w-100 font30 medium_blue_type mb10"]')
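but I couldn't get either of them to work. The site is https://www.etf.com/stock/TSLA
Edit: is it possible to do this without opening a browser window, e.g. with the "headless" option?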
op = webdriver.ChromeOptions()
op.add_argument('headless')
driver = webdriver.Chrome(CHROME_DRIVER_PATH, options=op)
If you prefer a text-based locator, you can use the following:
//span[text()='Largest Allocation']/../span
- You should first click the cookie banner's "I Understand" button
- Use an explicit wait
So your working code is:
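# driver_path: path to your chromedriver executable (Selenium 4 takes it via Service)
driver = webdriver.Chrome(service=Service(driver_path))
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.etf.com/stock/TSLA")
try:
    # Dismiss the cookie banner if it shows up
    wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "I Understand"))).click()
    print("Clicked on I understand button")
except TimeoutException:
    # No banner appeared within the timeout; carry on
    pass
txt = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[text()='Largest Allocation']/../span"))).text
print(txt)
Imports:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException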
Output:
Clicked on I understand button
XLY
Process finished with exit code 0
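The text-based locator can be checked offline against the snippet from the question. ElementTree's XPath subset has no `text()`, so `[.='Largest Allocation']` (the element's full text content) stands in for it, and the wrapping `<div>` is an assumption because the question shows only the two spans. The check also shows that `/../span` matches both sibling spans; `visibility_of_element_located` uses `find_element`, which returns the first match in document order, and that is the span holding the link.

```python
import xml.etree.ElementTree as ET

# The question's snippet; the wrapping <div> is an assumption.
SNIPPET = """<div>
  <span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span>
  <span class="w-100">Largest Allocation</span>
</div>"""

root = ET.fromstring(SNIPPET)

# ElementTree spelling of //span[text()='Largest Allocation']/../span
matches = root.findall(".//span[.='Largest Allocation']/../span")

# Selenium's .text includes descendant text, so join itertext() to mimic it.
texts = ["".join(span.itertext()).strip() for span in matches]
print(texts)  # ['XLY', 'Largest Allocation']
```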
If you want a locator that does not depend on text, use this line instead:
txt = wait.until(EC.visibility_of_element_located((By.XPATH, "(//span[contains(@class,'medium_blue_type')]//a)[2]"))).text
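In `(//span[contains(@class,'medium_blue_type')]//a)[2]` the parentheses matter: XPath positions are 1-based, and once the expression is grouped, `[2]` picks the second link out of the whole result set in document order. The sketch below illustrates this on hypothetical markup (the first span and its TSLA link are invented for the example, not taken from etf.com). ElementTree has no `contains()`, so that step is emulated with Python's substring test, which behaves the same way on the class string.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two spans carry the medium_blue_type class.
PAGE = """<div>
  <div>
    <span class="font30 medium_blue_type"><a href='/TSLA'>TSLA</a></span>
  </div>
  <div>
    <span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span>
    <span class="w-100">Largest Allocation</span>
  </div>
</div>"""

root = ET.fromstring(PAGE)

# Step 1: //span[contains(@class,'medium_blue_type')], in document order
spans = [s for s in root.iter("span") if "medium_blue_type" in s.get("class", "")]
# Step 2: //a under each of those spans, collected into one list
links = [a for s in spans for a in s.iter("a")]
# (...)[2]: XPath position 2 is Python index 1 of the combined list
second = links[1]
print(second.text)  # XLY

# Without parentheses, [2] binds to the last step only: an <a> that is the
# second <a> child of its own parent. Each span has one link, so nothing matches.
print(root.findall(".//span//a[2]"))  # []
```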
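There are several possible issues here:
- You may be missing a wait
- The locator you are using may not be unique
- As far as I can see, you are extracting an attribute value from the returned web element
- The web element may be inside an iframe, etc.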
Based on the information shared so far, you can try adding a wait and extracting the web element's value as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
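# Assumes `driver` is an already-created WebDriver that has loaded the page
wait = WebDriverWait(driver, 20)
# Target the <a> inside the span: the span itself has no href
link = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[@class='pull-left w-100 font30 medium_blue_type mb10']/a")))
print(link.text)  # the link text, e.g. XLY
href = link.get_attribute("href")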
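Part of the problem with calling `get_attribute("href")` on the span is that the `href` attribute lives on the `<a>` element, so on the span it comes back as `None`. An offline check with ElementTree on the question's snippet (wrapping `<div>` assumed) makes this visible. The raw attribute is the relative `/XLY`; Selenium's `get_attribute("href")` typically reports the resolved absolute URL instead, which `urljoin` reproduces here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The question's snippet; the wrapping <div> is an assumption.
SNIPPET = """<div>
  <span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span>
  <span class="w-100">Largest Allocation</span>
</div>"""
PAGE_URL = "https://www.etf.com/stock/TSLA"

root = ET.fromstring(SNIPPET)
# @class='...' compares the entire attribute value, including the order of classes.
span = root.find(".//span[@class='pull-left w-100 font30 medium_blue_type mb10']")
link = span.find("a")  # the <a> child that actually carries the href

print(span.get("href"))  # None
print(link.text)  # XLY
print(link.get("href"))  # /XLY
print(urljoin(PAGE_URL, link.get("href")))  # https://www.etf.com/XLY
```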
Use the following XPath to identify the href link:
//div[./span[text()='Largest Allocation']]//a
You need to introduce a delay before the element can be found: use WebDriverWait() and wait for the element to become visible.
To get the text:
print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//div[./span[text()='Largest Allocation']]//a"))).text)
To get the href:
print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//div[./span[text()='Largest Allocation']]//a"))).get_attribute("href"))
You need the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
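Anchoring on the visible label rather than on CSS classes means the locator survives a restyling of the page. The sketch below checks that offline with ElementTree, whose `[span='...']` predicate corresponds to `[./span[text()='...']]` for these simple spans. The `<div>` around the two spans is what this XPath assumes, the `<body>` wrapper just gives `.//div` something to search under, and the `RESTYLED` markup with a renamed class is purely hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's markup inside the <div> this answer's XPath assumes,
# plus a hypothetical variant whose CSS class was renamed.
ORIGINAL = """<div>
  <span class="pull-left w-100 font30 medium_blue_type mb10"><a href='/XLY'>XLY</a></span>
  <span class="w-100">Largest Allocation</span>
</div>"""
RESTYLED = ORIGINAL.replace("medium_blue_type", "dark_red_type")

# ElementTree spelling of //div[./span[text()='Largest Allocation']]//a
LABEL_XPATH = ".//div[span='Largest Allocation']//a"
# Class-based alternative for comparison (exact @class match)
CLASS_XPATH = ".//span[@class='pull-left w-100 font30 medium_blue_type mb10']/a"


def first_link(html: str, xpath: str):
    """Return the text of the first <a> matched by xpath, or None."""
    root = ET.fromstring(f"<body>{html}</body>")
    link = root.find(xpath)
    return None if link is None else link.text


for xpath in (LABEL_XPATH, CLASS_XPATH):
    print(first_link(ORIGINAL, xpath), first_link(RESTYLED, xpath))
# XLY XLY
# XLY None
```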