Problem scraping a web page with R and rvest



I am using the code below to extract a table from a web page:

library(rvest)
library(dplyr)
# Link to the site, then fetch its HTML.
link <- "https://www.stats.gov.sa/en/915"
page <- read_html(link)
# Extract the first table from the HTML.
files <- page %>%
  html_nodes("table") %>%
  .[[1]] %>%
  html_table()

However, the result I get differs from what the web page shows. The result looks like this:

# A tibble: 1 × 4
  Name         `Report Period` Periodicity  Download
1 please wait… please wait…    please wait… please wait…

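I would like to know whether there is a way to get the table as it appears in a web browser without using RSelenium, because RSelenium does not seem to work with RStudio online.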

One possible solution is RSelenium.

Here is a simple example:

library(RSelenium)
library(rvest)
library(dplyr)
# Your URL
URL <- "https://www.stats.gov.sa/en/915"
# Open the browser with RSelenium
rD <- RSelenium::rsDriver(browser = "firefox", port = 4544L, verbose = FALSE)
remDr <- rD[["client"]]
# Open the page in the browser
remDr$navigate(URL)
# Get the table as rendered in the browser
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_table()

[[1]]
# A tibble: 13 x 4
   Name                           `Report Period` Periodicity Download
   <chr>                                    <int> <chr>       <lgl>
 1 Ar-Riyad Region                           2017 Annual      NA
 2 Makkah Al-Mokarramah Region               2017 Annual      NA
 3 Al-Madinah Al-Monawarah Region            2017 Annual      NA
 4 Al-Qaseem Region                          2017 Annual      NA
 5 Eastern Region                            2017 Annual      NA
 6 Aseer Region                              2017 Annual      NA
 7 Tabouk Region                             2017 Annual      NA
 8 Hail Region                               2017 Annual      NA
 9 Northern Borders Region                   2017 Annual      NA
10 Jazan Region                              2017 Annual      NA
11 Najran Region                             2017 Annual      NA
12 Al-Baha Region                            2017 Annual      NA
13 Al-Jouf Region                            2017 Annual      NA
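
The difference between the two results suggests that the table is filled in by JavaScript after the page loads, so a plain read_html() call only sees the placeholder row. As a rough way to check for this, the sketch below parses a small hand-written HTML string and tests whether every cell still holds the placeholder text. The HTML string is an assumption standing in for the static page source, not the site's actual markup, and the helper name is_placeholder_table is hypothetical.

```r
library(rvest)

# Hypothetical static HTML, standing in for what read_html() receives
# before any JavaScript has run (not the real page source).
static_html <- paste0(
  "<table>",
  "<tr><th>Name</th><th>Report Period</th><th>Periodicity</th><th>Download</th></tr>",
  "<tr><td>Please wait...</td><td>Please wait...</td>",
  "<td>Please wait...</td><td>Please wait...</td></tr>",
  "</table>"
)

# Hypothetical helper: TRUE when every cell of a data frame matches
# the placeholder text (case-insensitive), FALSE otherwise.
is_placeholder_table <- function(tbl, pattern = "please wait") {
  cells <- unlist(lapply(tbl, as.character), use.names = FALSE)
  length(cells) > 0 && all(grepl(pattern, cells, ignore.case = TRUE))
}

# Parse the static HTML the same way as in the question.
static_tbl <- read_html(static_html) %>%
  html_element("table") %>%
  html_table()

is_placeholder_table(static_tbl)
```

If the same helper is applied to the table obtained through RSelenium above, it should report that the cells are no longer placeholders, which is a hint that the content only exists after the browser has rendered the page.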
