I'm using Selenium for web scraping, and I have the following HTML:
<a data-field="---" class="---" target="---" href="https://www.example.com/0000/">
<div class="display-flex align-items-center">
<span class="mr1 hoverable-link-text t-bold">
How can I access the href link and click it? I tried the following without success, and it would be great to understand why:
browser.find_element(By.PARTIAL_LINK_TEXT, "https://www.example.com")
browser.find_element(By.XPATH, "//a[contains(text(),'https://www.example.com')]")
Thanks!
Edit: The page I'm working on is the LinkedIn interests page (companies I follow). You can find it at: https://www.linkedin.com/in/yourusername/details/interests/?detailScreenTabIndex=1
For each company I follow, there is HTML like this:
<a data-field="active_tab_companies_interests" class="optional-action-target-wrapper
display-flex flex-column full-width" target="_self" href="https://www.linkedin.com/company/1016/">
<div class="display-flex align-items-center">
<span class="mr1 hoverable-link-text t-bold">
<span aria-hidden="true"><!---->GE Healthcare<!----></span><span class="visually-hidden"><!---->GE Healthcare<!----></span>
</span>
<!----><!----><!----> </div>
<!----> <span class="t-14 t-normal t-black--light">
<span aria-hidden="true"><!---->1,851,945 followers<!----></span><span class="visually-hidden"><!---->1,851,945 followers<!----></span>
</span>
<!----> </a>
I want to find the href, which in my example is "https://www.linkedin.com/company/1016/".
The code I wrote (with help from the comments):
# log in
browser.get("https://www.linkedin.com")
username = browser.find_element(By.ID,"session_key")
username.send_keys("youremail")
password = browser.find_element(By.ID,"session_password")
password.send_keys("yourpassword")
login_button = browser.find_element(By.CLASS_NAME, "sign-in-form__submit-button")
login_button.click()
# companies I follow on Linkedin
browser.get("https://www.linkedin.com/in/username/details/interests/?detailScreenTabIndex=1")
# find all company links
wait = WebDriverWait(browser, 20)
company_page = browser.find_elements(By.XPATH,"//a[contains(@href,'https://www.linkedin.com/company/')]")
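for x in range(len(company_page)):
    print(company_page[x].text)
The output for "GE Healthcare" (from the HTML snippet) is: GE HealthcareGE Healthcare1852718 followers1852718 followers
rather than the href link I'm looking for. I don't understand why it finds this text and not the link. Thanks.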
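https://www.example.com/0000/ is not the text content of the element. It is the value of the href attribute. PARTIAL_LINK_TEXT and text() both match against the link's visible text, which is why both of your locators fail.
Try this instead: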
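To make the distinction between link text and the href attribute concrete without a browser, here is a small sketch that uses only Python's standard `html.parser`. The `SNIPPET` string is a simplified, assumed version of the anchor from the question, and `LinkInspector` is a hypothetical helper name used only for illustration.

```python
from html.parser import HTMLParser

# Simplified stand-in for the <a> element in the question (an assumption).
SNIPPET = '<a href="https://www.example.com/0000/"><span>GE Healthcare</span></a>'


class LinkInspector(HTMLParser):
    """Records the href attribute and the visible text of an <a> element."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.text_parts = []
        self._inside_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_link = True
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_link = False

    def handle_data(self, data):
        # Only text nodes inside the link count as link text.
        if self._inside_link and data.strip():
            self.text_parts.append(data.strip())


inspector = LinkInspector()
inspector.feed(SNIPPET)
inspector.close()

link_text = " ".join(inspector.text_parts)
print(link_text)       # the visible text that text-based locators match against
print(inspector.href)  # the URL, which exists only as an attribute value
```

The URL never appears in the link text, so a locator that searches the text for it has nothing to match.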
browser.find_element(By.XPATH,"//a[contains(@href,'https://www.example.com')]")
Appending .click() should then click that element, as follows:
browser.find_element(By.XPATH,"//a[contains(@href,'https://www.example.com')]").click()
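You may need to add a delay so that the element has time to become clickable. In that case, WebDriverWait with expected conditions is the right approach, as follows: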
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(browser, 20)
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@href,'https://www.example.com')]"))).click()
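Regarding the follow-up in the question's edit: `.text` returns an element's visible text, which is why the loop prints the company name and follower count. The URL is stored in the `href` attribute, which Selenium's `WebElement` exposes through `get_attribute("href")`. A minimal sketch, assuming the `company_page` list from the question; `collect_hrefs` is a hypothetical helper name, not part of Selenium:

```python
def collect_hrefs(elements):
    """Return the href of each element, skipping elements that have none.

    Works with any object that offers get_attribute(name), such as a
    Selenium WebElement.
    """
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs


# Usage with the objects from the question (not executed here, since it
# needs a live browser session):
#
# company_page = browser.find_elements(
#     By.XPATH, "//a[contains(@href,'https://www.linkedin.com/company/')]")
# for href in collect_hrefs(company_page):
#     print(href)
```

If the list comes back empty, the links may not have loaded yet. Waiting with the `wait` object the question already creates, before calling `find_elements`, is one option.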