How to get all the hyperlinks from the child elements of a specific div container with multiple pages (pagination) using selenium-python



I am trying to scrape the links inside the href attribute of the child elements under the parent with id="search-properties" on this website. I first tried to locate the elements with find_elements_by_id and then the links with find_elements_by_css_selector, but I always got AttributeError: 'list' object has no attribute 'find_elements_by_css_selector' when doing this. So I tried find_elements_by_tag_name and find_elements_by_xpath, but instead of the links I want it scraped links to internal details that are of no use to me. After some searching I finally found this code:

import time
import csv
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException

PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe"  # always keep chromedriver.exe inside Scripts to save hours of debugging
driver = webdriver.Chrome(PATH)  # pretty important part
driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
driver.implicitly_wait(10)
house = driver.find_elements_by_tag_name("a")
# traverse the list and use get_attribute() to get every href
for lnk in house:
    print(lnk.get_attribute('href'))

The problem with this code is that it scrapes every link on the page, which means it also includes absolutely unnecessary ones, such as the unwanted javascript:void links shown in the image. Finally, for the pagination I tried to follow this answer, but I ended up with an infinite loop, so I had to remove the pagination code. In short, I am trying to get the links inside the div with id="search-properties" across multiple pages.
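
For what it's worth, the AttributeError above comes from calling find_elements_by_css_selector on the list returned by find_elements_by_id. Locating the single container first (find_element_by_id, singular) and searching only inside it also avoids the unrelated javascript:void links. A minimal sketch, assuming the container id is search-properties and the driver is set up as in the snippet above:

# Locate the one container element, then search only inside it.
container = driver.find_element_by_id("search-properties")
for a in container.find_elements_by_css_selector("a"):
    href = a.get_attribute("href")
    # Filter out javascript:void(0) and empty hrefs.
    if href and href.startswith("http"):
        print(href)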

I have tried pagination this way:

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path="path")
driver.implicitly_wait(10)
driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
page = 2
while True:
    nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
    driver.execute_script("arguments[0].scrollIntoView(true);", nextoption)
    driver.execute_script("window.scrollBy(0,-300)")
    time.sleep(5)
    try:
        driver.find_element_by_link_text(str(page)).click()
        page += 1
        time.sleep(3)
    except Exception as e:
        print(e)
        break
driver.quit()
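
The infinite loop can be avoided by not touching the '>>' element at all and instead stopping when the next page-number link never becomes clickable. A hedged sketch using WebDriverWait, assuming the page-number links are plain anchors whose link text is the number, as in the snippets above:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome(executable_path="path")  # placeholder path
driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")

page = 2
while True:
    try:
        # Wait until the link for the next page number is clickable;
        # a timeout means there is no such page, so we are done.
        next_link = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.LINK_TEXT, str(page))))
    except TimeoutException:
        break
    driver.execute_script("arguments[0].scrollIntoView(true);", next_link)
    next_link.click()
    page += 1
driver.quit()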

And this is how I have tried to get the links from each page:

from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path="path")
driver.implicitly_wait(10)
driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
page = 2
pagelinks = []
# links of the 1st page
links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
for ele in links:
    pagelinks.append(ele.get_attribute('href'))
while True:
    nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
    driver.execute_script("arguments[0].scrollIntoView(true);", nextoption)
    driver.execute_script("window.scrollBy(0,-300)")
    time.sleep(5)
    try:
        driver.find_element_by_link_text(str(page)).click()
        page += 1
        links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
        for ele in links:
            pagelinks.append(ele.get_attribute('href'))
        time.sleep(3)
    except Exception as e:
        print(e)
        break
print(len(pagelinks))
for link in pagelinks:
    print(link)
driver.quit()
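
A likely issue with the loop above is that the links are re-read immediately after the click, so the old page may still be in the DOM and get scraped twice. A hedged sketch of a hypothetical helper, goto_next_page, that waits for the old results to go stale before returning (the locators are assumed from the snippets above):

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def goto_next_page(driver, page, timeout=10):
    # goto_next_page is a hypothetical helper, not part of Selenium.
    # Keep a handle on one anchor of the current page so we can detect
    # when it is detached from the DOM.
    old_first = driver.find_element_by_xpath("//div[@id='search-properties']/a")
    driver.find_element_by_link_text(str(page)).click()
    # Once the old anchor goes stale, the new results have replaced it.
    WebDriverWait(driver, timeout).until(EC.staleness_of(old_first))

Calling goto_next_page(driver, page) inside the try block, before re-reading the links, should avoid collecting the same page twice.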

Try this:

links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
for ele in links:
    print(ele.get_attribute('href'))
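
And to collect the links from every page, the same container-scoped XPath can be combined with the pagination loop from the question. A rough end-to-end sketch (the chromedriver path is a placeholder, and the fixed sleep is a crude wait you may want to replace with an explicit one):

import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome(executable_path="path/to/chromedriver")  # placeholder
driver.implicitly_wait(10)
driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")

pagelinks = []
page = 2
while True:
    # Collect the hrefs from the current page before paginating.
    for a in driver.find_elements_by_xpath("//div[@id = 'search-properties']/a"):
        href = a.get_attribute('href')
        if href:
            pagelinks.append(href)
    try:
        driver.find_element_by_link_text(str(page)).click()
    except NoSuchElementException:
        break  # no next page number: we are on the last page
    page += 1
    time.sleep(3)  # crude wait for the next page to render

print(len(pagelinks))
for link in pagelinks:
    print(link)
driver.quit()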
