Scraping all markers from an iframe using Python and Selenium



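I am trying to scrape the company names and links from the map embedded in this website: https://www.elitedynamics.co.uk/customers

The code I have written so far opens the page and scrolls down until it finds the first button (each marker is a button). It then clicks the button, the info window appears and the details are read, the window is closed, and the driver moves on to the next result. This is very messy, because the driver does not follow any order and repeats elements. Is there a better way?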
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver_path = 'chromedriver'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)
driver.get("https://www.elitedynamics.co.uk/customers")

# Each map marker is rendered as a div with role="button"
property_bubble = driver.find_element(By.XPATH, '//div[@role="button"]')
actions = ActionChains(driver)
actions.move_to_element(property_bubble).click(property_bubble).perform()
all_properties = driver.find_elements(By.XPATH, '//div[@role="button"]')
names_list = []
links_list = []
for prop in all_properties:
    # Open the marker's info window and wait for its description to load
    actions.move_to_element(prop).click(prop).perform()
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'wpgmza_infowindow_description')))
    property_name = driver.find_element(By.XPATH, '//div[@class="wpgmza_infowindow_description"]/h4')
    names_list.append(property_name.text)
    print(property_name.text)
    try:
        property_link = driver.find_element(By.XPATH, '//div[@class="wpgmza_infowindow_description"]/h4/a')
        links_list.append(property_link.get_attribute('href'))
        print(property_link.get_attribute('href'))
    except NoSuchElementException:
        # Some info windows wrap the link in a <p> inside the heading
        try:
            property_link = driver.find_element(By.XPATH, '//div[@class="wpgmza_infowindow_description"]/h4/p/a')
            links_list.append(property_link.get_attribute('href'))
            print(property_link.get_attribute('href'))
        except NoSuchElementException:
            pass
    time.sleep(2)
    driver.find_element(By.XPATH, '//button[@title="Close"]').click()
print(names_list)
print(links_list)

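Actually, you don't need Selenium to scrape this site, because almost all of the required data is loaded as JSON from a separate endpoint.

Here is a working solution: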

import requests
import pandas as pd

# Query parameters sent to the map's JSON endpoint
params = {
    'filter': '{"map_id":"4","mashupIDs":[],"customFields":[]}'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        elit = []
        r = req.get(url, params=params)
        # Every marker on the map is one entry in the "markers" list
        for item in r.json()['markers']:
            elit.append([item['title'], item['icon']['url']])
        df = pd.DataFrame(elit, columns=["Title", "Url"])
        print(df)


main('https://www.elitedynamics.co.uk/wp-json/wpgmza/v1/features/')

Output:

                              Title                                                Url
0              Landal Darwin Forest  //www.elitedynamics.co.uk/wp-content/uploads/2...  
1                 Landal Sandybrook  //www.elitedynamics.co.uk/wp-content/uploads/2...  
2             Pinewood Holiday Park  //www.elitedynamics.co.uk/wp-content/uploads/2...  
3           Peppermint Holiday Park  //www.elitedynamics.co.uk/wp-content/uploads/2...  
4          Riviera Bay Holiday Park  //www.elitedynamics.co.uk/wp-content/uploads/2...  
..                              ...                                                ...  
250        Hedley Wood Holiday Park  //www.elitedynamics.co.uk/wp-content/uploads/2...  
251  Ashbourne Heights Holiday Park  //www.elitedynamics.co.uk/wp-content/uploads/2...  
252    Sand le Mere Holiday Village  //www.elitedynamics.co.uk/wp-content/uploads/2...  
253                    Bowland Fell  //www.elitedynamics.co.uk/wp-content/uploads/2...  
254       Silver Sands Holiday Park  //www.elitedynamics.co.uk/wp-content/uploads/2...  
[255 rows x 2 columns]
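
The values in the Url column are protocol-relative (they start with `//`), so they need a scheme before they can be requested directly. A minimal sketch of one way to do that with the standard library follows. The `absolutize` helper, the `BASE_URL` constant, and the sample path are hypothetical names introduced for illustration, not part of the answer's code.

```python
from urllib.parse import urljoin

# Assumption: the page that embeds the map serves as the base for resolving URLs.
BASE_URL = "https://www.elitedynamics.co.uk/customers"


def absolutize(url, base=BASE_URL):
    """Resolve a protocol-relative or relative URL against the base page."""
    return urljoin(base, url)


# Hypothetical sample value, shaped like the truncated entries in the output above.
sample = "//www.elitedynamics.co.uk/wp-content/uploads/example.png"
print(absolutize(sample))
```

If this fits your case, it could be applied to the collected data with something like `df["Url"] = df["Url"].map(absolutize)` before printing or saving. To see the full URLs instead of the truncated ones, `pd.set_option("display.max_colwidth", None)` should also help.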
