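This HTML page has at least three child nodes under the HTML root node. How can I use a for loop on the result of the second line of code below to print each table?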
root_node <- read_html("https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems")
table_nodes <- html_nodes(root_node, "table")
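I am interested in the bike-sharing table, which is the first element, table_nodes[[1]].
Here is a simple approach: select the first "table.wikitable" node, then extract the table from that node.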
library(rvest)
link <- "https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems"
root_node <- read_html(link)
root_node |>
html_element("table.wikitable") |>
html_table(header = TRUE)
#> # A tibble: 549 × 10
#> Country City Name System Opera…¹ Launc…² Disco…³ Stati…⁴ Bicyc…⁵ Daily…⁶
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Albania Tiran… Ecov… "" "" March … "" 8 200 ""
#> 2 Argentina Bueno… Ecob… "Sert… "Bike … 2010 "" 400 4000 "21917"
#> 3 Argentina Mendo… Metr… "" "" 2014 "" 2 40 ""
#> 4 Argentina Rosar… Mi B… "" "" 2 Dece… "" 47 480 ""
#> 5 Argentina San L… Bici… "Bici… "" 27 Nov… "" 8 80 ""
#> 6 Australia Melbo… Melb… "PBSC… "Motiv… June 2… "30 No… 53 676 ""
#> 7 Australia Melbo… oBike "4 Ge… "" July 2… "July … dockle… 1250 ""
#> 8 Australia Brisb… City… "3 Ge… "JCDec… Septem… "" 150 2000 ""
#> 9 Australia Sydney oBike "4 Ge… "" July 2… "July … dockle… 1250 ""
#> 10 Australia Sydney Ofo "4 Ge… "" Octobe… "" dockle… 600 ""
#> # … with 539 more rows, and abbreviated variable names ¹Operator, ²Launched,
#> # ³Discontinued, ⁴Stations, ⁵Bicycles, ⁶`Daily ridership`
Created on 2023-04-10 with reprex v2.0.2
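To make the selector behavior concrete without hitting the network, here is a minimal sketch on a small inline page. The HTML below is made up for illustration (it is not the Wikipedia markup); it assumes rvest 1.0.0 or later for `minimal_html()` and R 4.1 or later for `|>`. It shows that `html_element()` returns only the first node matching "table.wikitable", while `html_elements()` returns all matches.

```r
library(rvest)

# Hypothetical inline page: one plain table, then two "wikitable" tables.
page <- minimal_html('
  <table><tr><th>x</th></tr><tr><td>0</td></tr></table>
  <table class="wikitable"><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <table class="wikitable"><tr><th>a</th></tr><tr><td>2</td></tr></table>
')

# html_element() keeps only the first match, skipping the plain table.
first_wiki <- page |>
  html_element("table.wikitable") |>
  html_table(header = TRUE)

# html_elements() keeps every match.
all_wiki <- page |> html_elements("table.wikitable")

print(first_wiki)
print(length(all_wiki))
```

If the table you want is not the first one with that class, a more specific CSS selector or indexing into the `html_elements()` result would be needed.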
We can do it like this:
library(rvest)
root_node <- read_html("https://en.wikipedia.org/wiki/List_of_bicycle-sharing_systems")
table_nodes <- html_nodes(root_node, "table")
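# Only the first table is of interest here, so the loop runs over index 1.
# Use seq_along(table_nodes) instead to print every table on the page.
for (i in 1) {
  table_html <- table_nodes[[i]]       # the i-th <table> node
  table_df <- html_table(table_html)   # parse the node into a tibble
  print(table_df)
}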
# A tibble: 549 × 10
Country City Name System Operator Launched Discontinued Stati…¹ Bicyc…² Daily…³
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Albania Tirana[5] Ecovolis "" "" March 2011 "" 8 200 ""
2 Argentina Buenos Aires[6][7] Ecobici "Serttel Brasil[8]" "Bike In Baires Consortium[9]" 2010 "" 400 4000 "21917"
3 Argentina Mendoza[10] Metrobici "" "" 2014 "" 2 40 ""
4 Argentina Rosario Mi Bici Tu Bici[11] "" "" 2 December 2015 "" 47 480 ""
5 Argentina San Lorenzo, Santa Fe Biciudad "Biciudad" "" 27 November 2016 "" 8 80 ""
6 Australia Melbourne[12] Melbourne Bike Share "PBSC & 8D" "Motivate" June 2010 "30 November 2… 53 676 ""
7 Australia Melbourne[12] oBike "4 Gen. oBike" "" July 2017 "July 2018" dockle… 1250 ""
8 Australia Brisbane[14][15] CityCycle "3 Gen. Cyclocity" "JCDecaux" September 2010 "" 150 2000 ""
9 Australia Sydney oBike "4 Gen. oBike" "" July 2017 "July 2018" dockle… 1250 ""
10 Australia Sydney Ofo "4 Gen. Ofo" "" October 2017 "" dockle… 600 ""
# … with 539 more rows, and abbreviated variable names ¹Stations, ²Bicycles, ³`Daily ridership`
# ℹ Use `print(n = ...)` to see more rows
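For the original question of printing each table rather than just the first, a sketch along these lines may help. It runs on a small made-up inline page (the table contents are illustrative, not taken from Wikipedia) and assumes rvest 1.0.0 or later, where `html_elements()` is the current name for `html_nodes()`.

```r
library(rvest)

# Hypothetical inline page with two tables, standing in for the real page.
page <- minimal_html('
  <table>
    <tr><th>Country</th><th>City</th></tr>
    <tr><td>Albania</td><td>Tirana</td></tr>
  </table>
  <table>
    <tr><th>Name</th></tr>
    <tr><td>oBike</td></tr>
    <tr><td>Ofo</td></tr>
  </table>
')

table_nodes <- html_elements(page, "table")

# seq_along() yields 1, 2, ..., length(table_nodes) and is safe when
# no tables are found (the loop body is then simply skipped).
for (i in seq_along(table_nodes)) {
  table_df <- html_table(table_nodes[[i]])
  cat("Table", i, "\n")
  print(table_df)
}

# html_table() also accepts the whole node set and returns a list of tibbles.
all_tables <- html_table(table_nodes)
```

On the real page, the same loop applies to the `table_nodes` object from the question; some of the tables found there may be layout or navigation tables rather than data.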