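I'm new to web scraping, and I'm trying to read the full table at
https://www.fantasypros.com/daily-fantasy/mlb/draftkings-salary-changes.php.
However, it only returns a table with 1486 values, cutting off before it reaches any games that start after 4:10 PM. Do I need to learn RSelenium to solve this?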
library(rvest)  # provides read_html(), html_table(), and the %>% pipe
fantasyHtml <- read_html("https://www.fantasypros.com/daily-fantasy/mlb/draftkings-salary-changes.php")
pitchersTable <- fantasyHtml %>%
  html_table()
# this produces a table of only 1486 values, cutting off before any of the night games
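Because `html_table()` called on a whole document returns a list of data frames (one per table found), it may help to look at the dimensions of each one before concluding where the data is cut off. The following is only a small sketch for that check; `summarise_tables()` is a hypothetical helper name, not part of rvest.

```r
# Hypothetical helper: one row per parsed table, with its row, column, and cell counts.
summarise_tables <- function(tables) {
  rows <- vapply(tables, nrow, integer(1))
  cols <- vapply(tables, ncol, integer(1))
  data.frame(
    table = seq_along(tables),
    rows  = rows,
    cols  = cols,
    cells = rows * cols
  )
}

# Intended use with the list produced above:
# summarise_tables(pitchersTable)
```

Comparing these counts before and after any change to the scraping approach shows whether more rows are actually being captured.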
I have been able to extract more of the table's rows using the following code:
library(RSelenium)
library(rvest)
url <- "https://www.fantasypros.com/daily-fantasy/mlb/draftkings-salary-changes.php"
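# start a Selenium Firefox container, mapping host port 4446 to the container's 4444
# (shell() is Windows-only; system() is the usual equivalent on other platforms)
shell('docker run -d -p 4446:4444 selenium/standalone-firefox')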
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
for (i in 1:500) {
  print(i)
  # scroll down by a growing offset (10 px, 20 px, ...) before reading the page source
  command <- paste0("window.scrollBy(0,", i * 10, ")")
  remDr$executeScript(command)
}
fantasyHtml <- read_html(remDr$getPageSource()[[1]])
pitchersTable <- fantasyHtml %>% html_table()
dim(pitchersTable[[1]])
[1] 2006 7
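The fixed count of 500 scroll steps above is a guess at how far the page has to be scrolled. One possible alternative, offered as a sketch under assumptions rather than a verified fix for this site, is to keep scrolling until the number of table rows stops growing. `scroll_until_stable()` and `scrape_after_scrolling()` are hypothetical helper names, and the `table tr` selector and the wait time are assumptions that may need adjusting.

```r
# Hypothetical helper: call scroll_once() repeatedly until count_rows() has not
# increased for `patience` consecutive rounds, or until max_rounds is reached.
scroll_until_stable <- function(scroll_once, count_rows,
                                max_rounds = 200L, patience = 3L) {
  last_count <- count_rows()
  unchanged <- 0L
  for (round in seq_len(max_rounds)) {
    scroll_once()
    current <- count_rows()
    if (current > last_count) {
      last_count <- current
      unchanged <- 0L
    } else {
      unchanged <- unchanged + 1L
      if (unchanged >= patience) break
    }
  }
  last_count
}

# Assumed wiring for an already-open RSelenium session (remDr as in the code above).
scrape_after_scrolling <- function(remDr, wait_seconds = 0.5) {
  scroll_until_stable(
    scroll_once = function() {
      # jump to the current bottom of the page
      remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
      Sys.sleep(wait_seconds)  # assumed delay so the page can add rows
    },
    count_rows = function() {
      length(remDr$findElements(using = "css selector", value = "table tr"))
    }
  )
  # parse whatever the browser has rendered once the row count is stable
  rvest::html_table(rvest::read_html(remDr$getPageSource()[[1]]))
}
```

Calling `scrape_after_scrolling(remDr)` after `remDr$navigate(url)` would return the same kind of list as `html_table()` above. Whether the row count then matches the full table still has to be checked against the page itself.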