I am using the code below to extract a table from a web page:
library(rvest)
library(dplyr)
#Link to site and then getting html code.
link <- "https://www.stats.gov.sa/en/915"
page <- read_html(link)
#extract table from html
files <- page %>%
  html_nodes("table") %>%
  .[[1]] %>%
  html_table()
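However, the result I get differs from what the web page shows. It looks like this:
# A tibble: 1 × 4
  Name         `Report Period` Periodicity  Download
1 please wait… please wait…    please wait… please wait…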
Is there a way to get the table as it appears in a web browser without using RSelenium? I ask because RSelenium does not seem to work with RStudio online.
One solution is RSelenium.
Here is a simple example:
library(RSelenium)
library(rvest)
library(dplyr)
#Your URL
URL <- "https://www.stats.gov.sa/en/915"
#Open the browser by RSelenium
rD <- RSelenium::rsDriver(browser = "firefox", port = 4544L, verbose = FALSE)
remDr <- rD[["client"]]
#Open the page into browser
remDr$navigate(URL)
#Get the table that you see
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_table()
[[1]]
# A tibble: 13 x 4
   Name                           `Report Period` Periodicity Download
   <chr>                                    <int> <chr>       <lgl>
 1 Ar-Riyad Region                           2017 Annual      NA
 2 Makkah Al-Mokarramah Region               2017 Annual      NA
 3 Al-Madinah Al-Monawarah Region            2017 Annual      NA
 4 Al-Qaseem Region                          2017 Annual      NA
 5 Eastern Region                            2017 Annual      NA
 6 Aseer Region                              2017 Annual      NA
 7 Tabouk Region                             2017 Annual      NA
 8 Hail Region                               2017 Annual      NA
 9 Northern Borders Region                   2017 Annual      NA
10 Jazan Region                              2017 Annual      NA
11 Najran Region                             2017 Annual      NA
12 Al-Baha Region                            2017 Annual      NA
13 Al-Jouf Region                            2017 Annual      NA
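
The "please wait…" rows in the question suggest the table is filled in by JavaScript after the page loads, so reading the page source right after navigate() may still return the placeholder. Below is a minimal sketch of a polling helper for that case. wait_until() is a hypothetical name, not part of RSelenium or rvest, and the "please wait" check in the usage comments is only an assumption based on the placeholder text shown in the question; the real page may need a different readiness test.

```r
# Hypothetical helper (not part of RSelenium): call `check` repeatedly until it
# returns TRUE or `timeout` seconds have passed.
# Returns TRUE if the check succeeded, FALSE if the timeout was reached.
wait_until <- function(check, timeout = 15, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat errors inside `check` as "not ready yet"
    ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Possible usage with the session from the example above (assumes remDr and rD
# exist; the placeholder test is a guess based on the question's output):
# remDr$navigate(URL)
# loaded <- wait_until(function() {
#   !grepl("please wait", remDr$getPageSource()[[1]], ignore.case = TRUE)
# })
# if (loaded) {
#   tables <- remDr$getPageSource()[[1]] %>% read_html() %>% html_table()
# }
# remDr$close()       # close the browser window
# rD$server$stop()    # stop the Selenium server started by rsDriver()
```

Closing the client and stopping the server at the end frees the port (4544 here), so the script can be run again without picking a new one.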