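I am using Selenium to scrape follower names from Twitter. The followers page scrolls infinitely: every time I scroll down, more followers appear. I want to reach the bottom of the page so that I can scrape all of the followers.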
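import time

number = 0
while number != 5:
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    number = number + 1
    time.sleep(5)
    # Note: find_elements_by_class_name was removed in Selenium 4.3;
    # there the equivalent is find_elements(By.CSS_SELECTOR, "." + <the class string below>)
    usernames = driver.find_elements_by_class_name(
        "css-4rbku5.css-18t94o4.css-1dbjc4n.r-1loqt21.r-1wbh5a2.r-dnmrzs.r-1ny4l3l")
    for username in usernames:
        print(username.get_attribute("href"))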
Right now the code scrolls 5 times. I hard-coded that value, but I don't know how many scrolls it takes to reach the bottom of the page.
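Use the following code for infinite loading. It keeps scrolling as long as new elements are being loaded, that is, as long as the page height keeps changing.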
# Get scroll height after first time page load
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page / use a better technique like `waitforpageload` etc., if possible
    time.sleep(2)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
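To tie this back to the question, the same loop can gather the links after every scroll rather than only at the end. This is only a sketch: the function name `collect_hrefs_while_scrolling` and the parameters `locator`, `pause` and `max_rounds` are made up for illustration, and it assumes the page may drop off-screen items from the DOM as you scroll, which is why it deduplicates on `href`. It relies only on `execute_script`, `find_elements` and `get_attribute`.

```python
import time

GET_HEIGHT = "return document.body.scrollHeight"
SCROLL_TO_BOTTOM = "window.scrollTo(0, document.body.scrollHeight);"


def collect_hrefs_while_scrolling(driver, locator, pause=2.0, max_rounds=500):
    """Scroll until the page height stops changing, gathering unique hrefs on the way.

    `locator` is a (strategy, value) pair that is passed to driver.find_elements.
    `max_rounds` is a hypothetical safety cap so the loop cannot run forever.
    """
    seen = {}  # a dict keeps insertion order and drops duplicates

    def harvest():
        # Record the href of every element that currently matches the locator.
        for element in driver.find_elements(*locator):
            href = element.get_attribute("href")
            if href:
                seen[href] = None

    last_height = driver.execute_script(GET_HEIGHT)
    harvest()
    for _ in range(max_rounds):
        driver.execute_script(SCROLL_TO_BOTTOM)
        time.sleep(pause)  # give the page time to load the next batch
        harvest()
        new_height = driver.execute_script(GET_HEIGHT)
        if new_height == last_height:
            break  # nothing new was loaded, so this is the bottom
        last_height = new_height
    return list(seen)
```

It could be called as `collect_hrefs_while_scrolling(driver, ("css selector", "a[href]"))`; the string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, and `a[href]` is just a placeholder selector that you would replace with one matching the follower links.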
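The following script does not call time.sleep; instead it keeps issuing scroll commands for SCROLL_PAUSE_TIME seconds before it checks the height, so the scrolling itself happens faster: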
import datetime

SCROLL_PAUSE_TIME = 4  # seconds to keep scrolling before checking the height

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll repeatedly for SCROLL_PAUSE_TIME seconds instead of sleeping
    time_past = datetime.datetime.now()
    while (datetime.datetime.now() - time_past).total_seconds() <= SCROLL_PAUSE_TIME:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Stop once the page height no longer changes
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
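A variation on the same idea is to stop waiting as soon as the height changes, so that only the last round costs the full timeout. This is a sketch rather than a drop-in replacement: the names `scroll_and_wait_for_growth` and `scroll_until_stable` and the parameters `timeout`, `poll` and `max_rounds` are hypothetical, and the only thing it assumes about `driver` is `execute_script`.

```python
import time

GET_HEIGHT = "return document.body.scrollHeight"
SCROLL_TO_BOTTOM = "window.scrollTo(0, document.body.scrollHeight);"


def scroll_and_wait_for_growth(driver, timeout=4.0, poll=0.1):
    """Scroll once, then poll the page height.

    Returns the new height as soon as it differs from the old one,
    or None if it is still unchanged after `timeout` seconds.
    """
    old_height = driver.execute_script(GET_HEIGHT)
    driver.execute_script(SCROLL_TO_BOTTOM)
    deadline = time.monotonic() + timeout
    while True:
        new_height = driver.execute_script(GET_HEIGHT)
        if new_height != old_height:
            return new_height
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def scroll_until_stable(driver, timeout=4.0, poll=0.1, max_rounds=1000):
    """Keep scrolling until a scroll loads nothing new; return the number of productive scrolls."""
    rounds = 0
    while rounds < max_rounds:
        if scroll_and_wait_for_growth(driver, timeout, poll) is None:
            break
        rounds += 1
    return rounds
```

Here `timeout` plays the role of SCROLL_PAUSE_TIME, and `max_rounds` is only a safety cap for pages that never stop growing.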
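I had the same problem with scrolling, but when I scrolled to the very end of the page the list did not load. The cause was a large footer, so I adjusted the code above slightly to scroll only as far as the footer.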
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to just above the footer (the 1300 px offset skips it)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight - 1300);")
    # Wait to load page
    time.sleep(2)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
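The 1300 above is specific to my page. If you would rather not hard-code it, one option is to measure the footer in the browser on each round. A rough sketch, assuming the footer can be found with a CSS selector (the default `"footer"` is a guess, not something every site has); the function name `scroll_to_footer` and its parameters are made up for illustration.

```python
import time

GET_HEIGHT = "return document.body.scrollHeight"
# Rendered height of the first element matching the selector, or 0 if there is none.
GET_FOOTER_HEIGHT = (
    "var el = document.querySelector(arguments[0]);"
    "return el ? el.offsetHeight : 0;"
)
SCROLL_ABOVE_FOOTER = "window.scrollTo(0, document.body.scrollHeight - arguments[0]);"


def scroll_to_footer(driver, footer_selector="footer", pause=2.0, max_rounds=1000):
    """Scroll to just above the footer until the page height stops changing.

    Returns the final page height. `footer_selector` and `max_rounds`
    are assumptions that you should adapt to the page being scraped.
    """
    last_height = driver.execute_script(GET_HEIGHT)
    for _ in range(max_rounds):
        # Extra arguments to execute_script are visible in JavaScript as `arguments`.
        offset = driver.execute_script(GET_FOOTER_HEIGHT, footer_selector)
        driver.execute_script(SCROLL_ABOVE_FOOTER, offset)
        time.sleep(pause)  # wait for the list to load
        new_height = driver.execute_script(GET_HEIGHT)
        if new_height == last_height:
            break
        last_height = new_height
    return last_height
```

If no element matches the selector, the offset is 0 and this behaves like the plain scroll-to-bottom loop.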
Maybe it will be useful to someone.