Selenium Python: clicking a button on an interactive map for web scraping doesn't work



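I'm trying to scrape data from an interactive map using Selenium in Python. I'm having trouble getting my code to click certain buttons to get to the data. The first click works fine, but the second one doesn't. I've tried adjusting the wait times, but it made no difference. I want to be able to do the second click. Any help would be appreciated, thanks. Here is my code: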

!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
from selenium.webdriver.common.keys import Keys  
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
import os
from bs4 import BeautifulSoup
import re
import pandas as pd
import sys
import re
import csv
import time
import shutil
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options) 
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
import time
wd.maximize_window()
wd.get("https://hazards.geoplatform.gov/portal/apps/MapSeries/index.html?appid=ddf915a24fb24dc8863eed96bc3345f8")
wd.execute_script("arguments[0].click();", WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="nav-bar"]/div/div[1]/ul/li[4]/a'))))
wd.execute_script("arguments[0].click();", WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="widgets_About_Widget_34"]/div/div/div/div[10]/font/a'))))

The error occurs on the last line. Here is the full error:

TimeoutException                          Traceback (most recent call last)
in <module>()
      7 wd.get("https://hazards.geoplatform.gov/portal/apps/MapSeries/index.html?appid=ddf915a24fb24dc8863eed96bc3345f8")
      8 wd.execute_script("arguments[0].click();", WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="nav-bar"]/div/div[1]/ul/li[4]/a'))))
----> 9 wd.execute_script("arguments[0].click();", WebDriverWait(wd, 60).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="widgets_About_Widget_34"]/div/div/div/div[10]/font/a'))))
     10 
     11 

/usr/local/lib/python3.7/dist-packages/selenium/webdriver/support/wait.py in until(self, method, message)
     78             if time.time() > end_time:
     79                 break
---> 80         raise TimeoutException(message, screen, stacktrace)
     81 
     82     def until_not(self, method, message=''):

TimeoutException: Message:
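The traceback ends inside `WebDriverWait.until`, which re-evaluates a condition until it returns something truthy or the timeout expires. The simplified pure-Python sketch below shows why raising the timeout from 20 to 60 seconds cannot help when the element lives in a different document, such as an iframe. The condition is re-checked in the same document every time, so it never becomes true. This is not Selenium's actual source, and the names `poll_until` and `PollTimeout` are hypothetical.

```python
import time


class PollTimeout(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


def poll_until(condition, timeout, interval=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Simplified sketch of the loop in selenium/webdriver/support/wait.py. The real
    WebDriverWait also passes the driver to the condition and, by default,
    ignores NoSuchElementException while polling.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            raise PollTimeout(f"condition not met within {timeout} s")
        time.sleep(interval)


if __name__ == "__main__":
    # A condition that can never succeed fails no matter how long the timeout is.
    # Looking for an element inside an iframe while still in the top-level page
    # behaves the same way.
    try:
        poll_until(lambda: None, timeout=0.2, interval=0.05)
    except PollTimeout as exc:
        print(exc)
```

A longer timeout only helps when the element is merely slow to appear. It cannot help when the element is being searched for in the wrong document.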

This is what I want to click: the "Census Tract Tables" link.

There are a few problems with your code.

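  1. The main reason you can't reach the second element is that it is inside an iframe. To access it, you first have to switch to that iframe.
  2. Don't click elements through JavaScript unless you have no other choice.
  3. In headless mode, you have to define the window size.
  4. wd.maximize_window() is for normal mode, not headless mode.

Your code should look like this: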
import sys
sys.path.insert(0, '/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1920,1080')  # headless mode has no screen, so set the window size explicitly
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome(options=chrome_options)  # chromedriver was copied to /usr/bin, so it is found on PATH
wait = WebDriverWait(wd, 20)

# wd.maximize_window()  # only meaningful in normal (non-headless) mode
wd.get("https://hazards.geoplatform.gov/portal/apps/MapSeries/index.html?appid=ddf915a24fb24dc8863eed96bc3345f8")

# Open the "Data Download Tool" tab with a normal click instead of a JavaScript click
wait.until(EC.visibility_of_element_located((By.XPATH, "//li[contains(@class,'entry  visible')]//a[contains(text(),'Data Download Tool')]"))).click()

# The tab's content is loaded in an iframe: wait until it is available, then switch into it
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "div[class='mainMediaContainer active']>iframe")))

# Elements inside the iframe can now be located like any other element
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div[title='Download']"))).click()
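Point 1 has a second half that is easy to miss. Once `wd` has switched into the iframe, every lookup happens inside that iframe. The navigation bar in the top-level page is therefore unreachable until you switch back with `switch_to.default_content()`. The small context manager below keeps the switch and the switch-back together. It is a sketch, and the name `inside_frame` is hypothetical, not part of Selenium. It relies only on `driver.switch_to.frame()` and `driver.switch_to.default_content()`, which Selenium's WebDriver provides.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Work inside one iframe, then return to the top-level page.

    `frame_reference` is anything driver.switch_to.frame() accepts:
    a 0-based index, the frame's name or id, or the iframe WebElement.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Runs even if a lookup inside the frame raises (e.g. a timeout),
        # so later code does not silently keep searching inside the iframe.
        driver.switch_to.default_content()
```

For example, with `wd`, `wait`, `By` and `EC` as in the corrected code, you could look up the iframe element with `EC.presence_of_element_located` instead of switching immediately. Then, inside `with inside_frame(wd, iframe):`, you would click the Download button. After the `with` block ends, the nav-bar links in the top-level page can be clicked again.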

The Download button is in the 4th iframe. The code below does click the Download icon.

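from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(options=chrome_options)  # chrome_options as defined in the question
driver.implicitly_wait(50)  # every find_element call retries for up to 50 s
driver.get("https://hazards.geoplatform.gov/portal/apps/MapSeries/index.html?appid=ddf915a24fb24dc8863eed96bc3345f8")
driver.find_element(By.XPATH, "//div[@id='nav-bar']/div/div/ul/li[4]/a").click()
driver.switch_to.frame(3)  # frame indexes are 0-based, so 3 is the 4th iframe
driver.find_element(By.XPATH, "//div[@settingid='widgets_Query_Widget_32']").click()
driver.find_element(By.XPATH, "//div[text()='Find census tracts']").click()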
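Hard-coding `switch_to.frame(3)` works only as long as the page keeps loading its iframes in the same order. Rather than guessing the position, you can try each top-level iframe in turn and check whether the target element exists there. The sketch below does that with `find_elements()`, which returns an empty list instead of raising when nothing matches. The helper name `find_frame_containing` is hypothetical. Locator strategies are written as plain strings, because in Selenium `By.CSS_SELECTOR` is `"css selector"` and `By.XPATH` is `"xpath"`, so `By` constants can be passed in directly. One caveat: with `implicitly_wait(50)` active, every iframe that does not contain the element costs up to 50 seconds. Lower the implicit wait before calling it.

```python
IFRAME_LOCATOR = ("css selector", "iframe")  # same as (By.CSS_SELECTOR, "iframe")


def find_frame_containing(driver, by, value):
    """Find the top-level <iframe> whose document contains the element (by, value).

    Returns (index, iframe_element), where index is the 0-based position among
    the page's <iframe> elements, or None if no iframe matches. Only iframes of
    the top-level page are checked, not iframes nested inside them. The driver
    is always left in the top-level page.
    """
    driver.switch_to.default_content()
    for index, iframe in enumerate(driver.find_elements(*IFRAME_LOCATOR)):
        driver.switch_to.frame(iframe)
        try:
            found = bool(driver.find_elements(by, value))
        finally:
            driver.switch_to.default_content()
        if found:
            return index, iframe
    return None
```

For example, `find_frame_containing(driver, By.XPATH, "//div[@settingid='widgets_Query_Widget_32']")` should report index 3 if the observation above holds. Passing the returned element to `driver.switch_to.frame()` avoids relying on a numeric index at all.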