R: Problem mapping a function over a list of scraped links with rvest



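I'm trying to apply a function that extracts a table from each link in a list of scraped links. I'm at the final stage, applying the get_injury_data function to the links, and I can't get it to run successfully. I get the following error: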

Error in matrix(unlist(values), ncol = width, byrow = TRUE) : 
'data' must be of a vector type, was 'NULL'

Can anyone help me figure out where I've gone wrong? The code is below:

library(tidyverse)
library(rvest)
# create a function to grab the team links
get_team_links <- function(url){
  url %>%
    read_html() %>%
    html_nodes('td.hauptlink a') %>%
    html_attr('href') %>%
    .[. != '#'] %>% # drop hrefs that are just '#'
    paste0('https://www.transfermarkt.com', .) %>% # prepend the site root to the relative links
    unique() %>% # keep only unique links
    as_tibble() %>% # turn the strings into a tibble
    rename("links" = "value") %>% # rename the value column
    filter(!grepl('profil', links)) %>% # drop the player links that are included
    filter(!grepl('spielplan', links)) %>% # drop the additional team pages that are included
    mutate(links = gsub("startseite", "kader", links)) # point each link at the detailed page
}
# create a function to grab the player links
get_player_links <- function(url){
  url %>%
    read_html() %>%
    html_nodes('td.hauptlink a') %>%
    html_attr('href') %>%
    .[. != '#'] %>% # drop hrefs that are just '#'
    paste0('https://www.transfermarkt.com', .) %>% # prepend the site root to the relative links
    unique() %>% # keep only unique links
    as_tibble() %>% # turn the strings into a tibble
    rename("links" = "value") %>% # rename the value column
    filter(grepl('profil', links)) %>% # keep only the player profile links
    mutate(links = gsub("profil", "verletzungen", links)) # point each link at the injury page
}
# create a function to get the injury dataset
get_injury_data <- function(url){
  url %>%
    read_html() %>%
    html_nodes('#yw1') %>%
    html_table()
}
# get team links and save them as team_links
team_links <- get_team_links('https://www.transfermarkt.com/premier-league/startseite/wettbewerb/GB1')
# get player links by mapping get_player_links over team_links,
# then unnest the list column into one long tibble
player_injury_links <- team_links %>%
  mutate(links = map(links, get_player_links)) %>%
  unnest(links)
# using player_injury_links, build a dataset by scraping each player's injury page
player_injury_data <- map(player_injury_links$links, get_injury_data)

Solution

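So the problem I was running into was that some of the links I was scraping don't have any data, and get_injury_data throws the error above on those pages.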

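To get around this, I used the possibly function from the purrr package. It wraps get_injury_data in a new function that returns a default value (here NULL) instead of stopping with an error.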
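To show what possibly does in isolation, here is a minimal sketch that runs without any scraping. parse_page is a hypothetical stand-in for get_injury_data (it is not part of the original code) and simply fails on one input, the way the scraper fails on a page without data.

```r
library(purrr)

# Hypothetical stand-in for get_injury_data: errors on the input "empty",
# mimicking a page that has no table to extract.
parse_page <- function(x) {
  if (x == "empty") stop("'data' must be of a vector type, was 'NULL'")
  toupper(x)
}

# possibly() returns a new function that yields `otherwise` instead of erroring.
safe_parse_page <- possibly(parse_page, otherwise = NULL, quiet = TRUE)

inputs <- c("a", "empty", "b")

# map() now completes for every input; the failing one becomes NULL.
results <- map(inputs, safe_parse_page)
```

With the unwrapped parse_page, the same map call would stop at the second element and return nothing.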

The line that was giving me trouble, rewritten with possibly, looks like this:

player_injury_data <- player_injury_links$links %>%
  purrr::map(purrr::possibly(get_injury_data, otherwise = NULL, quiet = TRUE))
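Once the map finishes, the NULL entries still need handling. The sketch below is one possible follow-up, not part of the original answer: the results list is made-up data shaped like the output of the safe map (simplified to one data frame per link, with hypothetical names and columns), and it assumes dplyr is available, as it is when tidyverse is loaded.

```r
library(purrr)
library(dplyr)

# Made-up results for illustration only; a NULL marks a link that failed.
results <- list(
  "player/1" = data.frame(injury = "Knee", days = 30),
  "player/2" = NULL,
  "player/3" = data.frame(injury = "Ankle", days = 12)
)

# Record which links produced no data, so they can be inspected later.
failed <- names(keep(results, is.null))

# compact() drops the NULL entries; bind_rows() stacks the rest,
# storing each element's name in a "link" column.
injuries <- bind_rows(compact(results), .id = "link")
```

Keeping the failed names around makes it easy to check whether those pages really have no data or failed for some other reason, such as a timeout.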
