Python Selenium: selecting all hrefs from the main <div>


I am currently trying to get the hrefs from the following web page structure:

<div style="something"> # THIS IS THE MAIN DIV I CAN GET
<div class="aegieogji"> # First ROW sub-div under the main div
<div class="aegegaegeg"> # SUB-SUB-DIV
<a class="egaiegeigaegeigaegge" href="link_I_need">Text</a> # First HREF
<div class="eagegeg"> # SUB-SUB-DIV
<a class="egaegegaegaeg" href="link_I_need">Text</a> # Second HREF
<div class="agaeheahrhrahrhr"> # SUB-SUB-DIV
<a class="arhrharhrahrah" href="link_I_need">Text</a> # Third HREF
<div class="argagragragaw"> # Second ROW sub-div under the main div
<div class="aarhrahrah"> # SUB-SUB-DIV
<a class="arhahrhahr" href="link_I_need">Text</a> # First HREF
<div class="ahrrahrae"> # SUB-SUB-DIV
<a class="eagregargreg" href="link_I_need">Text</a> # Second HREF
<div class="ergrgegaegr"> # SUB-SUB-DIV
<a class="aegaegregrege" href="link_I_need">Text</a> # Third HREF
...
...
</div>

Using Python Selenium and ChromeDriver, I can read the main div "something":

main_elem = browser.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]")

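Now, from here, I am struggling to find the right Selenium call to get all the links held in the href attributes.

Do you know how I can get these easily? Thank you.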

PS: I can see that the first sub-sub-div has the following XPath:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[1]

and then the second one:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[1]/div[2]

and so on, while the sub-sub-div in the second row has the XPath:

/html/body/div[2]/div/div/div/div[1]/div/div/div/div[1]/div[1]/div[2]/section/main/article/div[2]/div/div[2]/div[1]

so it has div[2] instead of div[1], and so on.

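Once you have the main (parent) element, you can get all of its descendant elements that carry an href attribute and read their values as follows: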

# ".//" keeps the search relative to main_elem; "@href" tests for the attribute
children = main_elem.find_elements(By.XPATH, ".//a[@href]")
for child in children:
    href = child.get_attribute("href")
    print(href)
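
The predicate matters here: in XPath, `[@href]` tests for an attribute, while `[href]` tests for a child element named href. As a minimal sketch, this can be checked without a browser using the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. The markup below is a hypothetical, well-formed stand-in for the page, with made-up class names and links.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page structure in the question.
SAMPLE = """
<div style="something">
  <div class="row1">
    <div class="cell"><a class="x" href="/p/one">Text</a></div>
    <div class="cell"><a class="y" href="/p/two">Text</a></div>
  </div>
  <div class="row2">
    <div class="cell"><a class="z" href="/p/three">Text</a></div>
    <div class="cell"><a class="no-link">Text</a></div>
  </div>
</div>
"""

main_elem = ET.fromstring(SAMPLE)

# [@href] matches <a> elements that have an href attribute.
with_attr = [a.get("href") for a in main_elem.findall(".//a[@href]")]

# [href] matches <a> elements that have a child element named <href>;
# there are none in this markup.
with_child = main_elem.findall(".//a[href]")

print(with_attr)
print(with_child)
```

The same distinction applies to the XPath string passed to Selenium's `find_elements`.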

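To extract the values of all the href attributes, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR: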

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[style='something'] div div>a")))])
    
  • Using XPATH:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@style='something']//div//div/a")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

Thank you very much for your help. I combined the two answers and found a solution for my case:

# read the main DIV with XPATH
...
# read all the sub-divs
link_elems = element.find_elements(By.XPATH, './/div//div//div/a')
# retrieve the href
for link_elem in link_elems:
    # note: an XPath starting with '//' searches the whole document, not just link_elem
    sub_div = link_elem.find_elements(By.XPATH, '//a[starts-with(@href, "/p/")]')
    for sub in sub_div:
        post_href = sub.get_attribute("href")
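
The "/p/" filter can also be applied in plain Python after the hrefs have been collected once, which avoids re-running a document-wide query for every link. The following is only a sketch: `filter_post_links` and the sample URLs are hypothetical, and it assumes the collected values may be absolute URLs (Selenium typically returns the resolved URL for href), so the prefix is compared against the path component.

```python
from urllib.parse import urlsplit


def filter_post_links(hrefs, prefix="/p/"):
    """Keep hrefs whose URL path starts with prefix, dropping empties and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            # get_attribute can return None when the attribute is missing
            continue
        if urlsplit(href).path.startswith(prefix) and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# Hypothetical values standing in for link_elem.get_attribute("href") results.
sample = [
    "https://example.com/p/abc/",
    "/p/def/",
    "https://example.com/about",
    None,
    "/p/def/",
]
print(filter_post_links(sample))
```

With Selenium, the input list would be something like `[e.get_attribute("href") for e in link_elems]`.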
