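I have the following problem and would be very grateful if someone could help me.
I want to scrape data from a JavaScript-driven website (link), but I am having trouble with RSelenium. More specifically, I cannot connect to the server.
My code so far is as follows: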
## R Selenium ##
install.packages("netstat")
library(RSelenium)
library(tidyverse)
library(netstat)
port1 <- 9515
port1 <- as.integer(port1)
## start the server ##
rs_driver_object <- rsDriver(browser = "chrome",
                             port = free_port())
I always get this error:
"Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to localhost port 14415 after 2252 ms: Connection refused
Check server log for further details.
Warning message:
In rsDriver(browser = "chrome", port = free_port()) :
Could not determine server status."
When I check for the cause with the following code, the server log says:
selServ <- wdman::selenium(verbose = FALSE)
selServ$log()
$stderr
[1] "Fehler: Hauptklasse c(-Dwebdriver.chrome.driver="C:\\Users\\MarcF\\AppData\\Local\\binman\\binman_chromedriver\\win32\\113.0.5672.24.chromedriver.exe", konnte nicht gefunden oder geladen werden"
[2] "Ursache: java.lang.ClassNotFoundException: c(-Dwebdriver.chrome.driver="C:\\Users\\MarcF\\AppData\\Local\\binman\\binman_chromedriver\\win32\\113.0.5672.24.chromedriver.exe","
$stdout
character(0)
Does anyone know how I can connect to the RSelenium server? I really need it for a project.
My session info:
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C LC_TIME=German_Germany.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] netstat_0.1.2 forcats_0.5.2 stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.3 tidyr_1.3.0
[8] tibble_3.1.8 ggplot2_3.4.1 tidyverse_1.3.2 RSelenium_1.7.9
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 haven_2.5.1 gargle_1.2.1 colorspace_2.1-0 vctrs_0.5.2
[6] generics_0.1.3 yaml_2.3.7 utf8_1.2.3 rlang_1.0.6 pillar_1.8.1
[11] withr_2.5.0 glue_1.6.2 DBI_1.1.3 rappdirs_0.3.3 dbplyr_2.2.1
[16] readxl_1.4.1 semver_0.2.0 modelr_0.1.10 lifecycle_1.0.3 munsell_0.5.0
[21] binman_0.1.3 gtable_0.3.1 cellranger_1.1.0 rvest_1.0.3 caTools_1.18.2
[26] wdman_0.2.6 tzdb_0.3.0 ps_1.7.2 curl_5.0.0 fansi_1.0.4
[31] broom_1.0.3 Rcpp_1.0.9 backports_1.4.1 scales_1.2.1 googlesheets4_1.0.1
[36] jsonlite_1.8.4 fs_1.5.2 hms_1.1.2 stringi_1.7.8 processx_3.8.0
[41] grid_4.2.2 cli_3.4.1 tools_4.2.2 bitops_1.0-7 magrittr_2.0.3
[46] crayon_1.5.2 pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.3 reprex_2.0.2
[51] googledrive_2.0.0 lubridate_1.9.1 timechange_0.1.1 assertthat_0.2.1 httr_1.4.4
[56] rstudioapi_0.14 R6_2.5.1 compiler_4.2.2
Kind regards
You could consider the following approach. You need Docker installed for it to work.
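library(RSelenium)
url <- "a_url.com"  # placeholder; replace with the page you want to scrape
# Start a standalone Firefox Selenium container, mapping host port 4446 to container port 4444.
# shell() is Windows-only; on Linux or macOS use system() instead.
shell('docker run -d -p 4446:4444 selenium/standalone-firefox')
# Connect to the Selenium server running in the container.
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
remDr$open()
remDr$navigate(url)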
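Because `docker run -d` returns immediately, `remDr$open()` can fail if it runs before the container is reachable. Below is a minimal sketch of a wait loop in base R. The helper name `wait_for_port` and its defaults are my own, not part of RSelenium. Note that an open port is only a rough signal: Docker may accept connections on the mapped port slightly before Selenium itself is ready, so a short extra pause or a retry around `remDr$open()` may still be needed.

```r
# Hypothetical helper (not part of RSelenium): poll a TCP port until it
# accepts a connection or the timeout expires.
wait_for_port <- function(host = "localhost", port = 4446L, timeout_s = 30) {
  deadline <- Sys.time() + timeout_s
  repeat {
    is_open <- tryCatch({
      # Try to open a client socket; a refused connection raises an error.
      con <- suppressWarnings(
        socketConnection(host = host, port = port,
                         blocking = TRUE, open = "r+", timeout = 1)
      )
      close(con)
      TRUE
    }, error = function(e) FALSE)
    if (is_open) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(0.5)  # brief pause before the next attempt
  }
}
```

You could call `wait_for_port(port = 4446L)` right after the `shell()` line and only call `remDr$open()` if it returns `TRUE`.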
You could also consider the following approach:
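library(RSelenium)
library(wdman)
url <- "a_url.com"  # placeholder; replace with the page you want to scrape
# Pick a pseudo-random port above 4444 to reduce the chance of a clash.
port <- as.integer(4444L + rpois(n = 1, lambda = 1000))
# Start a PhantomJS driver on that port, then connect to it.
pJS <- wdman::phantomjs(port = port)
remDrPJS <- remoteDriver(browserName = "phantomjs", port = port)
remDrPJS$open()
remDrPJS$navigate(url)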
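One more thing that may be worth checking, going by the server log in the question: the Java argument shows up as `c(-Dwebdriver.chrome.driver=...`, which looks like an R character vector with more than one element ended up on the command line. A commonly reported cause is an extra `LICENSE.chromedriver` file sitting next to `chromedriver.exe` in the binman folder. This is an assumption about your setup, not something the log proves. The sketch below uses only base R. The function name `remove_chromedriver_license` is hypothetical, and the directory you pass in is whatever binman folder appears in your own log.

```r
# Hypothetical helper: delete stray LICENSE.chromedriver files below a
# chromedriver download directory, leaving the executables untouched.
remove_chromedriver_license <- function(chromedriver_dir) {
  license_files <- list.files(
    chromedriver_dir,
    pattern = "^LICENSE\\.chromedriver$",  # matched against file names only
    recursive = TRUE,
    full.names = TRUE
  )
  if (length(license_files) > 0) {
    file.remove(license_files)
  }
  # Return the removed paths invisibly so the caller can inspect them.
  invisible(license_files)
}

# Example call, assuming the binman folder shown in the question's log:
# remove_chromedriver_license("C:/Users/MarcF/AppData/Local/binman/binman_chromedriver")
```

After removing the files, restart R and try `rsDriver()` again. If the log still shows the `c(` prefix, the cause is probably something else.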