How to scroll a specific section of a dynamic web page using Selenium WebDriver in Python



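>I have found many references on scrolling an entire web page, but I want to scroll one specific section. I am working with marketwatch.com, specifically the Latest News tab. How can I scroll this Latest News tab using Selenium WebDriver?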

Below is my code. It returns the news headlines, but it keeps repeating the same ones.

from bs4 import BeautifulSoup
import urllib
import csv
import time
from selenium import webdriver

count = 0   
browser = webdriver.Chrome()
browser.get("https://www.marketwatch.com/newsviewer")
pageSource = browser.page_source
soup = BeautifulSoup(pageSource, 'lxml')
arkodiv = soup.find("ol", class_="viewport")
while browser.find_element_by_tag_name('ol'):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(0.5)
    div = list(arkodiv.find_all('div', class_= "nv-details"))
    heading = []
    Data_11 = list(soup.find_all("div", class_ = "nv-text-cont"))          
    datetime = list(arkodiv.find_all("li", timestamp = True))
    for sa in datetime:
        sh = sa.find("div", class_ = "nv-text-cont")
        if sh.find("a", class_ = True):
            di = sh.text.strip()
            di = di.encode('ascii', 'ignore').decode('ascii')
        else:
            continue
        print(di)
        heading.append((di))       
        count = count+1         

    if 'End of Results' in arkodiv:
        print('end')
        break
    else:
        continue
    print(count)

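This happens because the script you execute scrolls the whole window to the bottom of the page, not the news list itself. Your code also reads and parses the page source only once, before the loop, so every pass sees the same items.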

To keep scrolling inside the element that holds the news items, replace this:

browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

with this:

browser.execute_script("document.documentElement.getElementsByClassName('viewport')[0].scrollTop = 999999")
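
As a minimal sketch of the same idea, the scroll can be wrapped in a small helper that sets `scrollTop` to the element's own `scrollHeight` instead of a large constant. The helper name `scroll_element_to_bottom` and the `SCROLL_TO_BOTTOM_JS` constant are hypothetical; the only Selenium call assumed is `execute_script(script, *args)`, which passes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on.

```python
# Hypothetical helper; `driver` is any object exposing Selenium's
# execute_script(script, *args) method, e.g. webdriver.Chrome().
SCROLL_TO_BOTTOM_JS = """
var el = document.getElementsByClassName(arguments[0])[0];
if (!el) { return null; }
el.scrollTop = el.scrollHeight;
return el.scrollTop;
"""


def scroll_element_to_bottom(driver, class_name="viewport"):
    """Scroll the first element with the given class to its own bottom.

    Returns the element's scrollTop after scrolling, or None when no
    element with that class exists on the page.
    """
    return driver.execute_script(SCROLL_TO_BOTTOM_JS, class_name)
```

With a live browser this would be called as `scroll_element_to_bottom(browser)`. Returning the resulting `scrollTop` lets the caller notice when the position stops changing, which may indicate that the list has no more items to load.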

Edit

Here is the complete working solution:

import time

from bs4 import BeautifulSoup
from selenium import webdriver

# Scrolls the news list element itself rather than the whole window.
SCROLL_JS = "document.documentElement.getElementsByClassName('viewport')[0].scrollTop = 999999"

browser = webdriver.Chrome()
browser.get("https://www.marketwatch.com/newsviewer")

# Created once, outside the loop, so headlines accumulate across passes.
headings = set()
while True:
    # Re-read and re-parse the page on every pass so newly loaded items show up.
    soup = BeautifulSoup(browser.page_source, 'lxml')
    arkodiv = soup.find("ol", class_="viewport")
    if arkodiv is None:
        break
    for item in arkodiv.find_all("li", timestamp=True):
        text_div = item.find("div", class_="nv-text-cont")
        # Skip entries without a linked headline.
        if text_div is None or not text_div.find("a", class_=True):
            continue
        title = text_div.text.strip().encode('ascii', 'ignore').decode('ascii')
        if title not in headings:
            print(title)
            headings.add(title)

    # Check the list's text, not the Tag object, for the end marker.
    if 'End of Results' in arkodiv.text:
        print('end')
        break
    browser.execute_script(SCROLL_JS)
    time.sleep(0.5)

print(len(headings))
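
The fixed `time.sleep(0.5)` assumes new items always load within half a second. One possible alternative, sketched here under that caveat, is to poll until the number of items grows or a timeout passes. The function `wait_for_more_items` and its parameters are hypothetical and not part of Selenium; `count_items` is any zero-argument callable that returns the current number of list items.

```python
import time


def wait_for_more_items(count_items, previous_count, timeout=5.0, poll=0.25):
    """Poll count_items() until it exceeds previous_count or timeout elapses.

    Returns the latest count. A value no greater than previous_count means
    nothing new loaded in time, which the caller can treat as the end.
    """
    deadline = time.monotonic() + timeout
    current = count_items()
    while current <= previous_count and time.monotonic() < deadline:
        time.sleep(poll)
        current = count_items()
    return current
```

In the scraping loop, `count_items` could be something like `lambda: len(arkodiv.find_all("li", timestamp=True))` evaluated against a freshly parsed page; the exact counting expression is an assumption and depends on the page's markup.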

Edit 2

You may also want to change how you store the headings, because a list keeps duplicates. Use a set instead so that each headline is stored only once.
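
A set does not remember insertion order. If the order in which headlines first appeared matters, one option is to keep a set for membership checks alongside a list for order. This is only a sketch; the name `add_new_headings` is hypothetical.

```python
def add_new_headings(seen, ordered, titles):
    """Append unseen titles to `ordered`, tracking membership in `seen`.

    Returns how many of the given titles were new.
    """
    added = 0
    for title in titles:
        if title not in seen:
            seen.add(title)
            ordered.append(title)
            added += 1
    return added


seen, ordered = set(), []
add_new_headings(seen, ordered, ["A", "B"])       # first pass over the page
add_new_headings(seen, ordered, ["B", "C", "A"])  # overlapping second pass
print(ordered)  # ['A', 'B', 'C']
```

The return value can also serve as a stop condition: a pass that adds zero new headings suggests the list has stopped growing.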
