How to scrape multiple dynamically generated divs with Selenium in Python



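How can I extract text from divs with Selenium in Python when a new div is added approximately every second?

Based on the answer to the question above, I have the following code: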

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait as wait

# Note: the find_elements_by_* helpers used below are the Selenium 3 API;
# Selenium 4 uses driver.find_elements(By.CLASS_NAME, ...) instead.
chrome_path = r"C:\scrape\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://website.com/")

# Get current divs
messages = driver.find_elements_by_class_name('div_i_am_targeting')
# Print all messages
for message in messages:
    print(message.text)

while True:
    try:
        # Wait up to a minute for a new message to appear
        wait(driver, 60).until(lambda driver: driver.find_elements_by_class_name('div_i_am_targeting') != messages)
        # Print only the messages that were not seen before
        for message in [m.text for m in driver.find_elements_by_class_name('div_i_am_targeting') if m not in messages]:
            print(message)
        # Update list of messages
        messages = driver.find_elements_by_class_name('div_i_am_targeting')
    except TimeoutException:
        # Break the loop if no new message appeared within a minute
        print('No new messages')
        break

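It works well and captures every div whose class matches div_i_am_targeting as it appears on the page.

The divs on this HTML page are generated dynamically, with a new div appearing roughly every second.

The actual structure on the page looks like this: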

<div class="div_i_am_targeting">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="some_other_div">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="yet_another_div">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>

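So, in the dynamically created content, other divs appear between the divs I am currently targeting.

How often each kind of div appears on the page varies.

I could not find a related question here, or an example in the documentation.

How can I modify the code above so that it grabs the values of multiple divs, for example, if I want to grab all instances of div_i_am_targeting and some_other_div in the example above?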

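You can try replacing

driver.find_elements_by_class_name('div_i_am_targeting')

with

driver.find_elements_by_css_selector('.div_i_am_targeting, .some_other_div')

in your script to match both divs. The comma in a CSS selector means "either of these", so the call returns elements of both classes in the order they appear on the page.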
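
To illustrate how the whole loop could look once several classes are involved, here is a minimal sketch, not a definitive implementation. The helper names css_for_classes, new_elements and poll_messages are hypothetical (they are not part of Selenium), and the polling uses a plain timer instead of WebDriverWait, so the logic can be tried without a browser. It assumes the elements support equality comparison, as Selenium's WebElement does.

```python
import time


def css_for_classes(*class_names):
    """Build a CSS selector group such as '.a, .b' from plain class names."""
    return ", ".join("." + name for name in class_names)


def new_elements(current, seen):
    """Return the elements of `current` not already in `seen`, keeping page order."""
    return [el for el in current if el not in seen]


def poll_messages(find_all, timeout=60, interval=0.5):
    """Yield the text of each matching element exactly once.

    `find_all` is any zero-argument callable returning the current list of
    matching elements. The generator stops once `timeout` seconds pass
    without a new element showing up.
    """
    seen = []
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        fresh = new_elements(find_all(), seen)
        if fresh:
            for el in fresh:
                yield el.text
            seen.extend(fresh)
            # Something new arrived, so restart the "no new messages" timer
            deadline = time.monotonic() + timeout
        else:
            time.sleep(interval)
```

With the driver from the question, it might be wired up like this (on Selenium 4 the lookup would be driver.find_elements(By.CSS_SELECTOR, selector) instead):

```python
selector = css_for_classes('div_i_am_targeting', 'some_other_div')
for text in poll_messages(lambda: driver.find_elements_by_css_selector(selector)):
    print(text)
print('No new messages')
```

Passing the lookup in as a callable keeps the class list in one place, so adding a third class only changes the arguments to css_for_classes.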
