R: scraping a multi-page table



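I am trying to retrieve a table from the CDC website (https://www.cdc.gov/obesity/data/prevalence-maps.html#states). The table in question spans several pages that have to be scrolled through, and I am having trouble retrieving it and getting it into RStudio. I tried using the possibly() function from purrr, without success. Any help is appreciated.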

library(rvest)
library(dplyr)
library(purrr)

link <- "https://www.cdc.gov/obesity/data/prevalence-maps.html"

# One XPath per candidate table id (DataTables_Table_01 ... DataTables_Table_09)
xpaths <- paste0('//*[@id="DataTables_Table_0', 1:9, '"]/table[2]')

# Read the page, select the nodes matching the XPath, and return them as one data frame
scrape_table <- function(link, xpath) {
  link %>%
    read_html() %>%
    html_nodes(xpath = xpath) %>%
    html_table() %>%
    flatten_df() %>%
    setNames(c("State", "Prevalence", "95 CI"))
}

# Return NULL instead of stopping when an XPath fails
scrape_table_possibly <- possibly(scrape_table, otherwise = NULL)
scraped_tables <- map(xpaths, ~ scrape_table_possibly(link = link, xpath = .x))

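The page source does not contain the data. It is fetched from an external file by JavaScript, so rvest cannot scrape it from the HTML no matter which XPath you use. The table you want comes from this file: https://www.cdc.gov/obesity/data/maps/2019-overall.csv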

Edit: I scrolled down and saw the other tables:

https://www.cdc.gov/obesity/data/maps/2019-white.csv
https://www.cdc.gov/obesity/data/maps/2019-hispanic.csv
https://www.cdc.gov/obesity/data/maps/2019-black.csv
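Since the data lives in plain CSV files, one option is to read those files directly instead of scraping the page. The following is only a sketch: it assumes the four URLs mentioned in this answer are still reachable and share the `2019-<group>.csv` naming pattern, and the names `groups`, `csv_urls`, `read_csv_possibly` and `fetch_tables` are made up for illustration. The column layout of the files is not assumed.

```r
library(purrr)

# Build the four CSV URLs mentioned in the answer (overall, white, hispanic, black)
base_url <- "https://www.cdc.gov/obesity/data/maps/"
groups   <- c("overall", "white", "hispanic", "black")
csv_urls <- set_names(paste0(base_url, "2019-", groups, ".csv"), groups)

# Wrap read.csv so that a failed download yields NULL instead of an error
read_csv_possibly <- possibly(read.csv, otherwise = NULL)

# Download every file; the result is a named list with one element per group
fetch_tables <- function(urls) {
  map(urls, read_csv_possibly)
}

# Example call (needs network access):
# obesity_tables <- fetch_tables(csv_urls)
```

After running the example call, `obesity_tables$overall` would hold the table the question asks about, or NULL if that download failed.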
