r - How to fix a "subscript out of bounds" error when scraping with polite



I'm trying to use library(polite) to scrape data from a website, but I get "Error in ind_html[[1]] : subscript out of bounds". What I'm doing:

library(tidyverse)
library(lubridate)
library(janitor)
library(rvest)
library(httr)
library(polite)
url <- "https://cew.georgetown.edu/cew-reports/roi2022/"
url_bow <- polite::bow(url)
url_bow
ind_html <-
  polite::scrape(url_bow) %>%
  rvest::html_nodes("table_div") %>%
  rvest::html_table(fill = TRUE)
ind_tab <-
  ind_html[[1]] %>%
  make_clean_names()
ROI_TABLE <- ind_tab %>%
  bind_rows() %>%
  as_tibble()

I think the error is related to ind_html[[1]], but I don't know how to fix it. Thanks for your help!
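
The error message itself can be reproduced offline. The sketch below is only an illustration: the HTML string is a made-up stand-in for the page, and the assumption that the real page marks its container with id="table_div" is mine, not something confirmed here. A bare selector such as "table_div" matches elements whose tag name is table_div, so html_nodes() returns nothing, html_table() returns an empty list, and [[1]] on an empty list fails with "subscript out of bounds".

```r
library(rvest)

# Hypothetical stand-in for the downloaded page (an assumption, not the real
# markup): a div with id "table_div" wrapping one small table.
page <- read_html(paste0(
  '<html><body><div id="table_div">',
  '<table><tr><th>a</th></tr><tr><td>1</td></tr></table>',
  '</div></body></html>'
))

# "table_div" is a tag selector, so it looks for <table_div> elements.
by_tag <- html_table(html_nodes(page, "table_div"))
length(by_tag)  # an empty list here, so by_tag[[1]] would raise the error

# "#table_div table" selects tables inside the element with that id.
by_id <- html_table(html_nodes(page, "#table_div table"))

# Check that something matched before indexing into the result.
if (length(by_id) == 0) {
  stop("No table matched the selector; check the selector or the page source.")
}
first_table <- by_id[[1]]
first_table
```

Even with a corrected selector, the real page may still return no table if it fills the table in with JavaScript after loading. I have not verified that, but in that case the HTML returned by polite::scrape() would not contain the rows at all.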

If you are trying to scrape the table below, you can read the CSV file directly with read_csv():

df = read_csv('https://cewgeorgetown.github.io/collegeROI-2022/ROIforWeb0222.csv')
# A tibble: 4,419 x 45
Institution        State Level     `Predominant degr~ Control    `10-year NPV ra~ `10-year NPV` `15-year NPV ra~ `15-year NPV` `20-year NPV ra~ `20-year NPV`
<chr>              <chr> <chr>     <chr>              <chr>                 <dbl>         <dbl>            <dbl>         <dbl>            <dbl>         <dbl>
1 Alaska Career Col~ AK    2-year    Certificate        Private f~             2318        135000             2707        261000             2856        375000
2 Alaska Pacific Un~ AK    4-year    Bachelor's         Private n~             3537         87000             2433        274000             1760        443000
3 Alaska Vocational~ AK    Less tha~ Certificate        Public                   63        316000              240        458000              476        587000
4 University of Ala~ AK    4-year    Bachelor's         Public                 2590        124000             1547        312000             1232        484000
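
To get the cleaned column names the question was aiming for, a minimal sketch follows. It assumes readr 2.0 or later (for I() literal data and show_col_types) and uses a tiny made-up CSV in place of the real file, so the institutions and numbers are placeholders; only the column names are taken from the output above. Note that janitor::clean_names() is the function that takes a data frame, while make_clean_names() expects a character vector of names.

```r
library(readr)
library(janitor)

# Made-up sample rows (placeholder values); the header reuses a few column
# names that appear in full in the printed tibble above.
csv_text <- "Institution,State,Level,Control,10-year NPV
Example College,AK,2-year,Public,100000
Another College,AK,4-year,Public,90000"

# I() tells read_csv() to treat the string as literal data, not a file path.
roi_raw <- read_csv(I(csv_text), show_col_types = FALSE)

# clean_names() rewrites the column names of a data frame into a
# consistent, syntactic form.
ROI_TABLE <- clean_names(roi_raw)
names(ROI_TABLE)
```

With the real data, the same clean_names() call can be applied to df from the read_csv() line above.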
