Selenium times out waiting for JavaScript



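I am trying to scrape the following site: https://wiki.openstreetmap.org/wiki/Key:office, specifically the table containing all the tags, i.e. the contents of:

<table class="wikitable taginfo-taglist">...</table>

Since

<div class="taglist" ...> ... </div>

(the table's parent) is generated by JavaScript, I thought this code would work: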

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")
caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
driver = webdriver.Firefox(options=options, capabilities=caps, executable_path='../statics/geckodriver')


def get_tag_soup(url):
    driver.get(url)
    try:
        table = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "wikitable taginfo-taglist")))
        soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')
    except Exception as e:
        soup = e

    return soup


get_tag_soup('https://wiki.openstreetmap.org/wiki/Key:office')

But when I run this code, all I get is a selenium.common.exceptions.TimeoutException('', None, None). What is even more frustrating is that if I instead wait on the parent of "wikitable taginfo-taglist", using EC.presence_of_element_located((By.CLASS_NAME, "taglist")), it works.

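To extract the table containing all the tags, you have to induce WebDriverWait for visibility_of_element_located() instead of presence_of_element_located(). Note also that By.CLASS_NAME expects a single class name, so the compound value "wikitable taginfo-taglist" is unlikely to ever match; a CSS selector or an XPath can express both classes. You can use either of the following locator strategies: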

  • Using CSS_SELECTOR:

    driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.wikitable.taginfo-taglist"))).text)
    
  • Using XPATH:

    driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='wikitable taginfo-taglist']"))).text)
    
  • Console Output:

    Key Value Element Description Map rendering Image Count
    office accountant An office for an accountant.
    6 895
    1 967
    14
    office advertising_agency A service-based business dedicated to creating, planning, and handling advertising.
    3 916
    580
    3
    office architect An office for an architect or group of architects.
    5 715
    1 239
    12
    office association An office of a non-profit organisation, society, e.g. student, sport, consumer, automobile, bike association, etc.
    13 054
    3 286
    50
    office charity An office of a charitable organization
    696
    384
    7
    office company An office of a private company
    129 801
    36 951
    608
    office consulting An office for a consulting firm, providing expert professional advice to other companies or organisations.
    1 341
    162
    4
    office coworking An office where people can go to work (might require a fee); not limited to a single employer
    1 297
    320
    7
    office diplomatic
    6 634
    4 065
    95
    office educational_institution An office for an educational institution.
    14 172
    8 563
    175
    office employment_agency An office for an employment service.
    7 300
    1 771
    43
    office energy_supplier An office for a energy supplier.
    2 237
    1 112
    19
    office engineer An office for an engineer or group of engineers.
    454
    98
    2
    office estate_agent A place where you can rent or buy a house.
    44 813
    8 042
    39
    office financial An office of a company in the financial sector
    4 891
    1 588
    24
    office forestry A forestry office
    523
    741
    9
    office foundation An office of a foundation
    1 757
    542
    10
    office government An office of a (supra)national, regional or local government agency or department
    98 289
    70 569
    2 300
    office guide An office for tour guides, mountain guides, dive guides, etc.
    587
    168
    1
    office insurance An office at which you can take out insurance policies.
    34 693
    6 475
    91
    office it An office for an IT specialist.
    9 486
    2 039
    51
    office lawyer An office for a lawyer.
    22 881
    4 841
    22
    office logistics An office for a forwarder / hauler.
    2 796
    677
    8
    office moving_company An office which offers a relocation service.
    605
    252
    4
    office newspaper An office of a newspaper
    3 511
    1 450
    27
    office ngo An office for a non-profit, non-governmental organisation (NGO).
    12 693
    3 565
    58
    office notary An office for a notary public (common law)
    3 860
    548
    9
    office political_party An office of a political party
    3 354
    1 017
    8
    office property_management Office of a company, which manages a real estate property.
    796
    162
    2
    office quango An office of a quasi-autonomous non-governmental organisation.
    366
    233
    4
    office religion office of a community of faith
    5 807
    2 172
    43
    office research An office for research and development
    3 667
    4 545
    348
    office surveyor An office of a person doing surveys, this can be risk and damage evaluations of properties and equipment, opinion surveys or statistics.
    451
    109
    1
    office tax_advisor An office for a financial expert specially trained in tax law
    5 053
    823
    4
    office telecommunication An office for a telecommunication company
    16 968
    4 335
    77
    office visa An office of an organisation or business which offers visa assistance
    95
    1
    0
    office water_utility The office for a water utility company or water board.
    743
    908
    20
    office yes Generic tag for unspecified office type.
    27 434
    36 155
    420
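
To get back to the BeautifulSoup object the question asked for, the wait shown above can be wrapped in a function. The following is a minimal sketch under a few assumptions rather than a tested drop-in: css_for_classes is a hypothetical helper introduced here to turn a space-separated class attribute into a CSS selector, get_tag_soup mirrors the question's function but takes the driver as a parameter, and the 20-second default timeout is simply carried over from the snippets above.

```python
def css_for_classes(tag, class_attr):
    """Build a CSS selector that matches `tag` carrying every class in `class_attr`.

    Hypothetical helper (not part of Selenium): "wikitable taginfo-taglist"
    becomes "table.wikitable.taginfo-taglist" for tag="table".
    """
    return tag + "".join("." + name for name in class_attr.split())


def get_tag_soup(driver, url, timeout=20):
    """Load `url`, wait until the tag table is visible, and return it as soup."""
    # Imported lazily so css_for_classes can be used without Selenium installed.
    from bs4 import BeautifulSoup
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    locator = (By.CSS_SELECTOR, css_for_classes("table", "wikitable taginfo-taglist"))
    table = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located(locator)
    )
    # outerHTML keeps the <table> element itself; innerHTML would only
    # return its children.
    return BeautifulSoup(table.get_attribute("outerHTML"), "lxml")
```

With the driver from the question, this would be called as get_tag_soup(driver, 'https://wiki.openstreetmap.org/wiki/Key:office'). Unlike the question's version, a TimeoutException propagates to the caller instead of being returned in place of the soup.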
    

Note: Make sure you have maximized the browser viewport, as follows:

options.add_argument("start-maximized")
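
start-maximized is a Chromium command-line switch, and the answer does not confirm that Firefox honors it, particularly in headless mode. If it turns out to have no effect, one alternative is to set the window size explicitly through the WebDriver API. The sketch below is an assumption-laden illustration: ensure_large_viewport is a hypothetical helper name, and 1920x1080 is an arbitrary size chosen here, not a value from the answer.

```python
def ensure_large_viewport(driver, width=1920, height=1080):
    """Resize the browser window and return the size the driver reports.

    Hypothetical helper; set_window_size and get_window_size are standard
    Selenium WebDriver methods. The default size is an arbitrary choice.
    """
    driver.set_window_size(width, height)
    # get_window_size returns a dict with 'width' and 'height' keys.
    return driver.get_window_size()
```

Calling ensure_large_viewport(driver) before driver.get(...) gives the page a desktop-sized window to lay out in, which can matter when a visibility wait depends on the element actually being rendered.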
