Website: https://www.moe.gov.sg/schoolfinder/schooldetail?schoolname=ZHONGHUA-SECONDARY-SCHOOL
I only want to extract the information under "DSA talent areas offered in 2021".
However, when I use the path that SelectorGadget gives me, .is--open:nth-child(4) .moe-collapsible__content, like this:
dsa <- html_node(listpage,".is--open:nth-child(4) .moe-collapsible__content") %>% html_text() %>% unlist()
dsa
the output is NA.
Is there any way to get information out of collapsible content?
One way is:
library(rvest)
library(dplyr)
library(stringr)
'https://www.moe.gov.sg/schoolfinder/schooldetail?schoolname=ZHONGHUA-SECONDARY-SCHOOL' %>%
  read_html() %>%
  html_nodes('.moe-collapsible__content') %>%
  html_nodes('.moe-list') %>%
  html_text() %>%
  nth(3) %>%          # the third list is the DSA talent areas section
  str_split('\n')     # one talent area per line
[[1]]
[1] "Leadership and Character (Girls and Boys)\r"
[2] " Chinese Orchestra (Girls and Boys)\r"
[3] " Choir (Girls and Boys)\r"
[4] " Concert Band (Girls and Boys)\r"
[5] " Guzheng Ensemble (Girls and Boys)\r"
[6] " Badminton (Girls)\r"
[7] " Basketball (Girls)\r"
[8] " Table Tennis (Boys)\r"
[9] " Volleyball (Boys)\r"
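The pieces above still carry carriage-return residue and leading spaces. As a minimal sketch (not part of the original answer), splitting on an optional carriage return plus newline and then trimming gives clean strings. The `raw` string here is a hypothetical stand-in for the text returned by html_text(), so the example runs without network access.

```r
library(stringr)

# Hypothetical stand-in for the scraped text of the third .moe-list node
raw <- "Leadership and Character (Girls and Boys)\r\n Chinese Orchestra (Girls and Boys)\r\n Choir (Girls and Boys)"

# Split on "\r\n" or "\n", then strip leading/trailing whitespace
areas <- str_trim(str_split(raw, "\\r?\\n")[[1]])
print(areas)
# [1] "Leadership and Character (Girls and Boys)"
# [2] "Chinese Orchestra (Girls and Boys)"
# [3] "Choir (Girls and Boys)"
```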
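You can be more precise by using :contains together with a class to target the right parent div, then a descendant combinator to move down to its li elements. Matching on a partial string may also give you some future-proofing for 2022.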
library(magrittr)
library(rvest)
read_html("https://www.moe.gov.sg/schoolfinder/schooldetail?schoolname=ZHONGHUA-SECONDARY-SCHOOL") %>%
html_elements('.moe-collapsible:contains("DSA talent areas") li') %>% html_text()
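To see how the :contains selector picks out one section, here is a small offline sketch. The markup is a hypothetical simplification of the page's collapsible sections (the real page's structure may differ), built with rvest's minimal_html() so it runs without fetching the site.

```r
library(rvest)

# Hypothetical markup that mimics two collapsible sections
page <- minimal_html('
  <div class="moe-collapsible">
    <h3>Subjects offered</h3>
    <ul class="moe-list"><li>Mathematics</li></ul>
  </div>
  <div class="moe-collapsible">
    <h3>DSA talent areas offered in 2021</h3>
    <ul class="moe-list"><li>Choir (Girls and Boys)</li><li>Badminton (Girls)</li></ul>
  </div>
')

# Only the div whose text contains the partial heading is matched,
# so the li elements of the other section are skipped
areas <- page %>%
  html_elements('.moe-collapsible:contains("DSA talent areas") li') %>%
  html_text2()
print(areas)
```

Because the match is on "DSA talent areas" rather than the full heading, the same selector should keep working if the year in the heading changes.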