Scraping dynamic content with Selenium

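I'm learning how to scrape content from the web. In an earlier attempt I ran into what I thought was dynamic content, but it turned out to be in the page source after all, hidden under the tags that were displayed. Thanks to the community here, I was able to get the data easily with Beautiful Soup and pandas.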

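For my next challenge, I'm trying to get data from a site whose content really is generated dynamically, and the data does not appear to be in the page source. My code is below. I can locate the container that should hold the dynamic content, but it is empty. When I inspect the page with the developer tools, I can see a div with class="event2-2-1 row" that contains some data, but whenever my code tries to find these tags, it can't find them.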

Can someone point me in the right direction? I have searched this forum and haven't found an answer yet.

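from selenium import webdriver
from selenium.webdriver.common.by import By
import re
from bs4 import BeautifulSoup

start_url = "https://www.tissottiming.com/Live/Index?id=0003100005010105FFFFFFFFFFFFFFF2&style=Tissot"
# alternatively: start_url = input("Enter the results URL: ")
driver = webdriver.Chrome()
driver.implicitly_wait(10)  # retry element lookups for up to 10 seconds
driver.get(start_url)
# find_element_by_xpath() was removed in Selenium 4.3; find_element(By.XPATH, ...) replaces it
content = driver.find_element(By.XPATH, '//*[@id="container-fluid"]')
print(content)  # prints the WebElement object itself, not its HTML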

This is what the print statement outputs:

<selenium.webdriver.remote.webelement.WebElement (session="99ca6419fd181c0bdd39797e20c739df", element="0.7688034456332402-1")>
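The container most likely comes back empty because `implicitly_wait(10)` only controls how long `find_element` keeps retrying the *lookup*: if `container-fluid` is already part of the initial page, the lookup succeeds at once, before the page's JavaScript has filled it in. An explicit wait instead keeps re-checking a condition until it holds. A minimal sketch of that polling idea in plain Python, assuming a hypothetical `wait_until` helper (it is not Selenium's `WebDriverWait`, only an illustration of the same loop) and a hypothetical `FakeContainer` standing in for a real browser element:

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Hypothetical helper mimicking the polling loop behind Selenium's
    WebDriverWait(driver, timeout).until(...); it is not Selenium code.
    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


class FakeContainer:
    """Hypothetical stand-in for a WebElement whose innerHTML is filled in
    by JavaScript only after a few polls (no browser involved)."""

    def __init__(self, ready_after=3):
        self.calls = 0
        self.ready_after = ready_after

    def get_attribute(self, name):
        self.calls += 1
        if name == "innerHTML" and self.calls >= self.ready_after:
            return '<div class="rider row">...</div>'
        return ""  # still empty, like the container in the first attempt


container = FakeContainer()
# Keep polling until the container is no longer empty.
html = wait_until(lambda: container.get_attribute("innerHTML"), timeout=5, poll=0.01)
print(html)
```

With a real driver the condition would read the element's `innerHTML` instead. Unlike `WebDriverWait`, which by default ignores `NoSuchElementException` while polling, this sketch lets any exception raised by the condition propagate. Waiting for a child element that only exists once the data has loaded (such as the `heat ... row` div) avoids relying on an outer container that is present from the start.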

I managed to get the dynamic content with the following code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0

start_url = "https://www.tissottiming.com/Live/Index?id=0003100005010105FFFFFFFFFFFFFFF2&style=Tissot"
# alternatively: start_url = input("Enter the results URL: ")
driver = webdriver.Chrome()
driver.get(start_url)
# Poll for up to 15 seconds until the heat container has been added to the DOM;
# until() returns the located element, so no second lookup is needed.
x = WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.XPATH, "//div[@class='heat 2_2_1_1_1 row']"))
)
print(x.get_attribute('innerHTML'))
driver.quit()

Output:

<div class="name row"><span>HEAT 1</span></div><div class="heatsheaders row rowtitle"><div class="col-xs-05 rank">Rank</div><div class="col-xs-05 bib">Bib</div><div class="col-xs-3 longname">Name</div><div class="col-xs-1 nation">Nat</div><div class="col-xs-5 run_title"><div class="RunName col-xs-4">1ST RACE</div><div class="RunName col-xs-4">2ND RACE</div><div class="RunName col-xs-4">DECIDER</div></div><div class="col-xs-1 qualified"></div><div class="col-xs-1 points">Time</div></div><div class="rider 2_2_1_1_1_1_1 row" data-sortorder="1" data-inter-pos-x="2" data-inter-pos-y="342" data-final-pos-x="2" data-final-pos-y="342" style="transition: all 600ms ease 0ms, opacity 600ms linear; display: block; transform: translate(0px, 0px);" data-bound="true"><div class="rank col-xs-05"><span>1</span></div><div class="bib col-xs-05"><span>52</span></div><div class="longname col-xs-3"><span>GLAETZER Matthew</span><div class="teamname "><span>AUSTRALIA</span></div></div><div class="nation col-xs-1"><span><div class="img_flag">AUS<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div><div class="run_group col-xs-5"><div class="run 2_2_1_1_1_1_1_1_1 col-xs-4"><div class="time row"><span>10.218</span></div><div class="points row"><span>70,464</span></div></div><div class="run 2_2_1_1_1_1_1_1_2 col-xs-4"><div class="time row"><span>0.000</span></div><div class="points row"><span>0,000</span></div></div><div class="run 2_2_1_1_1_1_1_1_3 col-xs-4"><div class="time row"><span></span></div><div class="points row"><span></span></div></div></div><div class="qualified col-xs-1"><span>QG</span></div><div class="points col-xs-1"><span></span></div></div><div class="rider 2_2_1_1_1_1_2 row" data-sortorder="2" data-inter-pos-x="2" data-inter-pos-y="422" data-final-pos-x="2" data-final-pos-y="422" style="transition: all 600ms ease 0ms, opacity 600ms linear; display: block; transform: translate(0px, 0px);" data-bound="true"><div class="rank 
col-xs-05"><span>2</span></div><div class="bib col-xs-05"><span>53</span></div><div class="longname col-xs-3"><span>HART Nathan</span><div class="teamname "><span>AUSTRALIA</span></div></div><div class="nation col-xs-1"><span><div class="img_flag">AUS<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div><div class="run_group col-xs-5"><div class="run 2_2_1_1_1_1_2_1_1 col-xs-4"><div class="time row"><span>+0.028</span></div><div class="points row"><span></span></div></div><div class="run 2_2_1_1_1_1_2_1_2 col-xs-4"><div class="time row"><span>+0.000</span></div><div class="points row"><span></span></div></div><div class="run 2_2_1_1_1_1_2_1_3 col-xs-4"><div class="time row"><span></span></div><div class="points row"><span></span></div></div></div><div class="qualified col-xs-1"><span>QB</span></div><div class="points col-xs-1"><span></span></div></div>
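Once the heat container's inner HTML is available, the rider rows can be extracted from that string without further browser calls. A minimal sketch using only the standard-library `html.parser`, assuming the class names visible in the output above (`rider`, `rank`, `bib`, `longname`, `teamname`, `nation`, `run`, `time`, `qualified`) stay stable; `RiderParser` and `SAMPLE` are hypothetical names, and `SAMPLE` is a trimmed copy of the printed HTML with the `style` and most `data-*` attributes removed:

```python
from html.parser import HTMLParser


class RiderParser(HTMLParser):
    """Collect one dict per <div class="rider ..."> block (hypothetical helper)."""

    # Text goes to the field of the innermost enclosing <div> whose class is
    # listed here; "time" divs inside a run are handled separately.
    FIELDS = {"rank": "rank", "bib": "bib", "longname": "name",
              "teamname": "team", "nation": "nation", "qualified": "qualified"}

    def __init__(self):
        super().__init__()
        self.stack = []        # class lists of the currently open <div>s
        self.current = None    # rider dict being filled in, or None
        self.rider_depth = 0   # stack depth of the open rider <div>
        self.riders = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)
        if "rider" in classes:
            self.current = {"rank": "", "bib": "", "name": "", "team": "",
                            "nation": "", "times": [], "qualified": ""}
            self.rider_depth = len(self.stack)
        elif self.current is not None and "run" in classes:
            self.current["times"].append("")  # one slot per run, may stay empty

    def handle_endtag(self, tag):
        if tag != "div" or not self.stack:
            return
        self.stack.pop()
        if self.current is not None and len(self.stack) < self.rider_depth:
            self.riders.append(self.current)  # the rider <div> just closed
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current is None:
            return  # skip whitespace and header text outside rider rows
        for classes in reversed(self.stack):
            if "rider" in classes:
                return  # no known field between this text and the rider <div>
            if "time" in classes:
                self.current["times"][-1] = text
                return
            for cls in classes:
                if cls in self.FIELDS:
                    self.current[self.FIELDS[cls]] = text
                    return


SAMPLE = (
    '<div class="name row"><span>HEAT 1</span></div><div class="heatsheaders row rowtitle">'
    '<div class="col-xs-05 rank">Rank</div><div class="col-xs-05 bib">Bib</div></div>'
    '<div class="rider 2_2_1_1_1_1_1 row" data-sortorder="1">'
    '<div class="rank col-xs-05"><span>1</span></div>'
    '<div class="bib col-xs-05"><span>52</span></div>'
    '<div class="longname col-xs-3"><span>GLAETZER Matthew</span>'
    '<div class="teamname "><span>AUSTRALIA</span></div></div>'
    '<div class="nation col-xs-1"><span><div class="img_flag">AUS'
    '<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div>'
    '<div class="run_group col-xs-5">'
    '<div class="run 2_2_1_1_1_1_1_1_1 col-xs-4"><div class="time row"><span>10.218</span></div>'
    '<div class="points row"><span>70,464</span></div></div>'
    '<div class="run 2_2_1_1_1_1_1_1_2 col-xs-4"><div class="time row"><span>0.000</span></div>'
    '<div class="points row"><span>0,000</span></div></div>'
    '<div class="run 2_2_1_1_1_1_1_1_3 col-xs-4"><div class="time row"><span></span></div>'
    '<div class="points row"><span></span></div></div></div>'
    '<div class="qualified col-xs-1"><span>QG</span></div>'
    '<div class="points col-xs-1"><span></span></div></div>'
    '<div class="rider 2_2_1_1_1_1_2 row" data-sortorder="2">'
    '<div class="rank col-xs-05"><span>2</span></div>'
    '<div class="bib col-xs-05"><span>53</span></div>'
    '<div class="longname col-xs-3"><span>HART Nathan</span>'
    '<div class="teamname "><span>AUSTRALIA</span></div></div>'
    '<div class="nation col-xs-1"><span><div class="img_flag">AUS'
    '<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div>'
    '<div class="run_group col-xs-5">'
    '<div class="run 2_2_1_1_1_1_2_1_1 col-xs-4"><div class="time row"><span>+0.028</span></div>'
    '<div class="points row"><span></span></div></div>'
    '<div class="run 2_2_1_1_1_1_2_1_2 col-xs-4"><div class="time row"><span>+0.000</span></div>'
    '<div class="points row"><span></span></div></div>'
    '<div class="run 2_2_1_1_1_1_2_1_3 col-xs-4"><div class="time row"><span></span></div>'
    '<div class="points row"><span></span></div></div></div>'
    '<div class="qualified col-xs-1"><span>QB</span></div>'
    '<div class="points col-xs-1"><span></span></div></div>'
)

parser = RiderParser()
parser.feed(SAMPLE)
for rider in parser.riders:
    print(rider["rank"], rider["bib"], rider["name"], rider["nation"],
          rider["times"], rider["qualified"])
```

In the Selenium script, the string passed to `parser.feed()` would be `x.get_attribute('innerHTML')`. Since the question already imports BeautifulSoup, `soup.select("div.rider")` is a shorter alternative; the standard-library parser is used here only so the sketch runs without third-party packages. The resulting list of dicts can be passed straight to `pandas.DataFrame` if a table is wanted.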