Selenium loads the page but doesn't print all of the HTML



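I'm trying to use Python and Selenium to scrape dynamically loaded data from a website. The problem is that only about half of the data is reported as present, when all of it should be there. Neither pausing before printing the page content nor simply searching for the elements class by class seems to solve it. The URL of the website is https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909. As you can see, there are 13 main sections, but I can only retrieve data for the first four games. To show the problem as clearly as possible, I'm attaching the code I use to print the text of the whole page, which shows the difference between the data that loaded and the data that didn't.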

from selenium import webdriver

url = "https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909"
driver = webdriver.Chrome()
driver.get(url)
# Print the rendered text of the whole page
print(driver.execute_script("return document.documentElement.innerText;"))

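Edit: The problem is not the wait time, because I'm running the script line by line and waiting for the page to load completely. The problem seems to come down to Selenium not picking up all of the JS-loaded text on the page, as the console output in the answer below shows.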
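To make "only the first four games have data" measurable instead of eyeballing the dump, a minimal sketch like the following could parse the innerText. It assumes, based only on the console output in the answer below, that each game starts with an "X vs Y" line and that a game whose data rendered contains a "Current Line" label; summarize_games is a hypothetical helper name.

```python
import re

# Assumption (from the console output in this thread): each game block starts
# with a header line such as "Pittsburgh vs Cleveland".
GAME_HEADER = re.compile(r"^\S.* vs \S.*$")
# Assumption: this label appears only in games whose data actually rendered.
DATA_LABEL = "Current Line"


def summarize_games(inner_text):
    """Return a list of (game, has_data) tuples parsed from page innerText."""
    games = []
    for raw_line in inner_text.splitlines():
        line = raw_line.strip()
        if GAME_HEADER.match(line):
            games.append([line, False])      # new game, no data seen yet
        elif line == DATA_LABEL and games:
            games[-1][1] = True              # data belongs to the latest game
    return [(name, has_data) for name, has_data in games]


if __name__ == "__main__":
    # Two game blocks copied from the console output in this thread.
    sample = "\n".join([
        "01:00 PM", "/", "Buffalo vs Baltimore", "459", "Buffalo",
        "460", "Baltimore", "Current Line", "+7½-103", "-7½-107",
        "01:00 PM", "/", "Jacksonville vs N.Y. Giants", "461",
        "Jacksonville", "462", "N.Y. Giants",
    ])
    for name, has_data in summarize_games(sample):
        print(f"{name}: {'data' if has_data else 'missing'}")
```

Pointed at the real page, e.g. summarize_games(driver.execute_script("return document.documentElement.innerText;")), this would list which games came back without data, which makes it easy to compare runs with and without waits.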

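@sudonym's analysis is heading in the right direction. You need to induce a WebDriverWait for the desired elements to become visible and then extract them through the execute_script() method, as follows: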

  • Code block:

    # -*- coding: UTF-8 -*-
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    url = "https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909"
    driver = webdriver.Chrome()
    driver.get(url)
    WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2[contains(.,'USA - National Football League')]//following::section//span[3]")))
    print(driver.execute_script("return document.documentElement.innerText;"))
    
  • Console output:

    SPORTSBOOK REVIEW
    Home
    Best Sportsbooks
    Rating Guide
    Blacklist
    Bonuses
    BETTING ODDS
    FREE PICKS
    Sports Picks
    NFL
    College Football
    NBA
    NCAAB
    MLB
    NHL
    More Sports
    How to Bet
    Tools
    FORUM
    Home
    Players Talk
    Sportsbooks & Industry
    Newbie Forum
    Handicapper Think Tank
    David Malinsky's Point Blank
    Service Plays
    Bitcoin Sports Betting
    NBA Betting
    NFL Betting
    NCAAF Betting
    MLB Betting
    NHL Betting
    CONTESTS
    EARN BETPOINTS
    What Are Betpoints?
    SBR Sportsbook
    SBR Casino
    SBR Racebook
    SBR Poker
    SBR Store
    Today
    NFL
    NBA
    NHL
    MLB
    College Football
    NCAA Basketball
    Soccer
    Soccer Odds
    Major League Soccer
    UEFA Champions League
    UEFA Nations League
    UEFA Europa League
    English Premier League
    World Cup 2022
    Tennis
    Tennis Odds
    ATP
    WTA
    UFC
    Boxing
    More Sports
    CFL
    WNBA
    AFL
    Betting Odds/NFL Odds/Consensus
    TODAY
    |
    YESTERDAY
    |
    DATE
    ?
    Login
    ?
    Settings
    ?
    Bet Tracker
    ?
    Bet Card
    ?
    Favorites
    NFL Consensus for Sep 09, 2018
    USA - National Football League
    Sunday Sep 09, 2018
    01:00 PM
    /
    Pittsburgh vs Cleveland
    453
    Pittsburgh
    454
    Cleveland
    Current Line
    -3½+105
    +3½-115
    Wagers Placed
    10040
    54.07%
    8530
    45.93%
    Amount Wagered
    $381,520.00
    56.10%
    $298,550.00
    43.90%
    Average Bet Size
    $38.00
    $35.00
    SBR Contest Best Bets
    22
    9
    01:00 PM
    /
    San Francisco vs Minnesota
    455
    San Francisco
    456
    Minnesota
    Current Line
    +6-102
    -6-108
    Wagers Placed
    6250
    41.25%
    8900
    58.75%
    Amount Wagered
    $175,000.00
    29.50%
    $418,300.00
    70.50%
    Average Bet Size
    $28.00
    $47.00
    SBR Contest Best Bets
    5
    19
    01:00 PM
    /
    Cincinnati vs Indianapolis
    457
    Cincinnati
    458
    Indianapolis
    Current Line
    -1-104
    +1-106
    Wagers Placed
    11640
    66.36%
    5900
    33.64%
    Amount Wagered
    $1,338,600.00
    85.65%
    $224,200.00
    14.35%
    Average Bet Size
    $115.00
    $38.00
    SBR Contest Best Bets
    23
    12
    01:00 PM
    /
    Buffalo vs Baltimore
    459
    Buffalo
    460
    Baltimore
    Current Line
    +7½-103
    -7½-107
    Wagers Placed
    5220
    33.83%
    10210
    66.17%
    Amount Wagered
    $78,300.00
    16.79%
    $387,980.00
    83.21%
    Average Bet Size
    $15.00
    $38.00
    SBR Contest Best Bets
    5
    17
    01:00 PM
    /
    Jacksonville vs N.Y. Giants
    461
    Jacksonville
    462
    N.Y. Giants
    01:00 PM
    /
    Tampa Bay vs New Orleans
    463
    Tampa Bay
    464
    New Orleans
    01:00 PM
    /
    Houston vs New England
    465
    Houston
    466
    New England
    01:00 PM
    /
    Tennessee vs Miami
    467
    Tennessee
    468
    Miami
    04:05 PM
    /
    Kansas City vs L.A. Chargers
    469
    Kansas City
    470
    L.A. Chargers
    04:25 PM
    /
    Seattle vs Denver
    471
    Seattle
    472
    Denver
    04:25 PM
    /
    Dallas vs Carolina
    473
    Dallas
    474
    Carolina
    04:25 PM
    /
    Washington vs Arizona
    475
    Washington
    476
    Arizona
    08:20 PM
    /
    Chicago vs Green Bay
    477
    Chicago
    478
    Green Bay
    Media
    Site Map
    Terms of use
    Contact Us
    Privacy Policy
    DMCA
    18+. Gamble Responsibly.
    © Sportsbook Review. All Rights Reserved.
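Judging from the console output, visibility_of_all_elements_located can return as soon as the matched spans are visible, before the later games have filled in their data. A hedged alternative is a custom condition: WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value when done. The sketch below assumes that the "Current Line" label marks a game with data; all_games_loaded is a hypothetical name.

```python
# Assumption: "Current Line" appears once per game whose data has rendered
# (based on the console output in this thread).
DATA_LABEL = "Current Line"
INNER_TEXT_JS = "return document.documentElement.innerText;"


def all_games_loaded(expected_games):
    """Build a condition callable for WebDriverWait(...).until().

    The callable receives the driver, reads the page's innerText and returns
    it once at least `expected_games` games show data; otherwise it returns
    False so that WebDriverWait keeps polling until its timeout.
    """
    def condition(driver):
        text = driver.execute_script(INNER_TEXT_JS)
        if text and text.count(DATA_LABEL) >= expected_games:
            return text
        return False
    return condition
```

With Selenium this would be used as text = WebDriverWait(driver, 30).until(all_games_loaded(13)), taking 13 from the number of games mentioned in the question. until() returns the text once the condition is truthy or raises TimeoutException after 30 seconds, so a partial load fails loudly instead of silently printing half the data.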
    

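This solution is only worth considering if there are many WebDriverWait calls and you are interested in reducing runtime; otherwise, go with DebanjanB's approach.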

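You need to wait some time for the HTML to load completely. You can also set a timeout for script execution. To add an unconditional wait around driver.get(URL) in Selenium, use driver.set_page_load_timeout(n), with n = time in seconds, together with a loop: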

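import logging
import random
from time import sleep

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

URL = "https://www.sportsbookreview.com/betting-odds/nfl-football/consensus/?date=20180909"
n = 30                                   # timeout in seconds (example value)

driver = webdriver.Chrome()
driver.set_page_load_timeout(n)          # Set timeout of n seconds for page load
loading_finished = False                 # Set flag to False
while not loading_finished:              # Repeat until the page has loaded
    try:
        sleep(random.uniform(0.1, 0.5))  # wait some time
        driver.get(URL)                  # try to load for n seconds
        loading_finished = True          # Set flag; the loop exits on the next check
        logger.info("website loaded")    # Indicate load success
    except TimeoutException:
        logger.warning("timeout - retry")  # Indicate load fail
else:                                    # Runs once the loop ends (flag is True)
    driver.set_script_timeout(n)         # Set timeout of n seconds for scripts
    script_finished = False              # Set flag to False
    while not script_finished:           # Second loop
        try:
            print(driver.execute_script("return document.documentElement.innerText;"))
            script_finished = True       # Set flag to True
            logger.info("script done")   # Indicate script done
        except TimeoutException:
            logger.warning("script timeout")
    else:
        logger.info("if you're still missing html here, increase timeout")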
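The loop above keeps retrying for as long as driver.get() times out, so a page that never loads within n seconds will hang the script. Below is a minimal stdlib-only sketch of the same idea (short random pause, then retry) with an upper bound on attempts; retry and its parameter names are hypothetical, not part of Selenium.

```python
import logging
import random
import time

logger = logging.getLogger(__name__)


def retry(action, attempts=5, exceptions=(Exception,), min_delay=0.1, max_delay=0.5):
    """Call action() until it returns, retrying on the given exceptions.

    Sleeps a random min_delay..max_delay seconds before each try and re-raises
    the last exception once `attempts` tries have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        time.sleep(random.uniform(min_delay, max_delay))  # short random pause
        try:
            return action()
        except exceptions as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # give up: propagate the last failure
```

With Selenium it might be called as retry(lambda: driver.get(URL), exceptions=(TimeoutException,)) and then text = retry(lambda: driver.execute_script("return document.documentElement.innerText;"), exceptions=(TimeoutException,)), with TimeoutException imported from selenium.common.exceptions.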
