I'm working on a web-scraping exercise, and I want to get the table below from this url:
https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory
"COVID-19 cases, deaths, and rates by location, as of 7 April 2022[5]"
I right-clicked in my browser and inspected the page, trying to find the table's id/node to put in place of the `"??"` in the code below, but I can't find that node.
library(tidyverse)
library(rvest)
# get the data
url <- "https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory"
html_data <- read_html(url)
html_data %>%
  html_node("??") %>% # how do I get the node containing the table
  html_table() %>%
  as_tibble()
Thanks!
I'd suggest using a more stable, faster, and more descriptive css selector list rather than a long, brittle xpath. There is a combination of a specific parent id (generally the fastest way to match) and a child table class (second fastest) you can use:
library(magrittr)
library(rvest)
df <- read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory') %>%
  html_element('#covid-19-cases-deaths-and-rates-by-location .wikitable') %>%
  html_table()
Recommended reading:
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
Practice:
https://flukeout.github.io/
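The "parent id + descendant class" idea above can also be tried offline. Here is a minimal sketch using rvest's `minimal_html()` on a hypothetical inline snippet that mimics the page structure (the div wrapper and the single-row table are assumptions for illustration; only the id and the `wikitable` class are taken from the selector above):

```r
library(rvest) # also re-exports the magrittr pipe

# Hypothetical snippet: a div carrying the parent id, wrapping a table
# with the wikitable class, as the CSS selector expects
page <- minimal_html('
  <div id="covid-19-cases-deaths-and-rates-by-location">
    <table class="wikitable">
      <tr><th>Country</th><th>Deaths</th></tr>
      <tr><td>Peru</td><td>212,396</td></tr>
    </table>
  </div>')

# The same two-part selector: fast id match first, then the table class
page %>%
  html_element('#covid-19-cases-deaths-and-rates-by-location .wikitable') %>%
  html_table()
```

Because the id narrows the search to one subtree before the class is matched, this tends to be both faster and more robust to page edits than an absolute xpath.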
Use your browser to get the xpath of the table, and use that in place of `"??"`.
suppressPackageStartupMessages({
library(httr)
library(rvest)
library(dplyr)
})
url <- "https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory"
xp <- "/html/body/div[3]/div[3]/div[5]/div[1]/div[15]/div[5]/table"
html_data <- read_html(url)
html_data %>%
  html_elements(xpath = xp) %>% # select the table node via its xpath
  html_table() %>%
  .[[1]] %>%
  select(-1)
#> # A tibble: 218 x 4
#> Country `Deaths / million` Deaths Cases
#> <chr> <chr> <chr> <chr>
#> 1 World[a] 783 6,166,510 495,130,920
#> 2 Peru 6,366 212,396 3,549,511
#> 3 Bulgaria 5,314 36,655 1,143,424
#> 4 Bosnia and Herzegovina 4,819 15,728 375,948
#> 5 Hungary 4,738 45,647 1,863,039
#> 6 North Macedonia 4,433 9,234 307,142
#> 7 Montenegro 4,308 2,706 233,523
#> 8 Georgia 4,212 16,765 1,650,384
#> 9 Croatia 3,833 15,646 1,105,315
#> 10 Czech Republic 3,712 39,816 3,850,902
#> # ... with 208 more rows
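Note that the counts come back as character columns because of the thousands separators. A small follow-up sketch (assuming `readr` is installed, and using a toy tibble in the same format as the scraped output) converts them with `readr::parse_number()`:

```r
library(dplyr)

# Toy tibble mimicking the character columns of the scraped table
tbl <- tibble::tibble(
  Country = c("Peru", "Bulgaria"),
  Deaths  = c("212,396", "36,655"),
  Cases   = c("3,549,511", "1,143,424"))

# parse_number() drops grouping marks and returns doubles
tbl %>%
  mutate(across(-Country, readr::parse_number))
```

After this, the `Deaths` and `Cases` columns are numeric and can be sorted or plotted directly.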
Created on 2022-04-08 by the reprex package (v2.0.1)