Hello everyone.
I am trying to use R to get the titles of the articles listed on this website (https://yomou.syosetu.com/search.php?&type=er&order_former=search&order=new&notnizi=1&p=1).
I ran the following code:
### load the pipe operator (%>%) used in the loop ###
library(magrittr)
### read HTML ###
html_narou <- rvest::read_html("https://yomou.syosetu.com/search.php?&type=er&order_former=search&order=new&notnizi=1&p=1",
                               encoding = "UTF-8")
### create the common part object of CSS ###
base_css_former <- "#main_search > div:nth-child("
base_css_latter <- ") > div > a"
### create NULL objects ###
art_css <- NULL
narou_titles <- NULL
### extract the title data and store them into the NULL object ###
#### The article titles are not found under "#main_search > div:nth-child(1~4) > div > a", so i in the loop starts from five ####
for (i in 5:24) {
  art_css <- paste0(base_css_former, as.character(i), base_css_latter)
  narou_title <- rvest::html_element(x = html_narou,
                                     css = art_css) %>%
    rvest::html_text()
  narou_titles <- base::append(narou_titles, narou_title)
}
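However, doing this with a for loop in R takes a long time, so I would like to use the "map" function from the "purrr" package instead. I am not familiar with purrr::map, though, and the process is fairly complicated. How can I replace the for loop with map?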
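The real problem is that you grow the narou_titles vector on every iteration, which is notoriously slow in R. Instead, you should pre-allocate the vector at its final length and then assign elements by index. purrr does this behind the scenes, which can make it look faster, but you can do the same thing without purrr.
With your for loop: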
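library(rvest)
# Pre-allocate one slot per title (20 titles for i = 5 to 24)
narou_titles <- vector("character", 20)
for (i in 5:24) {
  art_css <- paste0(base_css_former, as.character(i), base_css_latter)
  # Shift the index by 4 so that the titles fill positions 1 to 20
  narou_titles[[i - 4]] <- html_element(
    x = html_narou,
    css = art_css
  ) %>%
    html_text()
}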
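To see the two patterns side by side without any scraping, here is a minimal base R sketch. make_title() is a hypothetical stand-in for the html_element() and html_text() step and is not part of rvest. Looping over seq_along() is one way to keep the position in the result vector separate from the nth-child number.

```r
# Hypothetical stand-in for the per-element scraping step (not an rvest function)
make_title <- function(i) paste0("title-", i)

# The nth-child numbers that hold titles, as in the question
idx <- 5:24

# Growing: the vector is extended by one element on every iteration
grown <- NULL
for (i in idx) {
  grown <- append(grown, make_title(i))
}

# Pre-allocating: the length is fixed up front and elements are filled by position
prealloc <- vector("character", length(idx))
for (k in seq_along(idx)) {
  prealloc[[k]] <- make_title(idx[[k]])
}

# Both approaches produce the same 20 values
identical(grown, prealloc)
```

With only 20 elements the timing difference between the two loops will be small; the pattern matters more as the vector gets longer.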
With purrr::map_chr():
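library(rvest)
library(purrr)
# Return the title text for the i-th child of #main_search
get_title <- function(i) {
  art_css <- paste0(base_css_former, as.character(i), base_css_latter)
  html_element(
    x = html_narou,
    css = art_css
  ) %>%
    html_text()
}
# map_chr() returns a character vector with one element per input value
narou_titles <- map_chr(5:24, get_title)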
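If all the titles can be matched by a single selector, it may be possible to skip the iteration altogether, because html_elements() returns every match in one call. The sketch below uses a toy page built with minimal_html(); its markup and the selector are assumptions made for illustration and have not been checked against the real search page.

```r
library(rvest)

# Toy page standing in for the search results (hypothetical markup)
page <- minimal_html('
  <div id="main_search">
    <div>header</div>
    <div><div><a href="#">First title</a></div></div>
    <div><div><a href="#">Second title</a></div></div>
  </div>
')

# One selector, one call: html_elements() returns all matching nodes,
# and html_text() then gives one string per node
titles <- page %>%
  html_elements("#main_search > div > div > a") %>%
  html_text()

titles
```

On the real page, the selector would need to be narrow enough to exclude links that are not titles.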