Trying to create multiple data frames while web scraping in R



I'm still pretty new to coding, and especially to web scraping, but here is what I'm trying to do:

I want to scrape fbref.com to create a data frame with some match stats for each Premier League team.

I know this works for getting the team links:

library(rvest)
page <- "https://fbref.com/en/comps/9/Premier-League-Stats"
scraped_page <- read_html(page)
teamLinks <- scraped_page %>%
  html_nodes("#stats_squads_standard_for a") %>%
  html_attr("href")
# The hrefs already start with "/", so the base URL should not end with one
teamLinks <- paste0("https://fbref.com", teamLinks)
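A note on joining the links: assuming the scraped hrefs are root-relative (they begin with "/"), pasting them onto a base URL that already ends in "/" produces a double slash in every URL. The sketch below uses two made-up hrefs purely for illustration; the real ones come from the `html_attr("href")` call above.

```r
# Hypothetical root-relative hrefs, shaped like the ones scraped above
hrefs <- c("/en/squads/aaaa1111/Arsenal-Stats",
           "/en/squads/bbbb2222/Chelsea-Stats")

# Base URL ending in "/" plus an href starting with "/" gives "//"
with_slash <- paste0("https://fbref.com/", hrefs)
print(with_slash)

# Dropping the trailing slash from the base gives clean URLs
teamLinks <- paste0("https://fbref.com", hrefs)
print(teamLinks)
```

If you would rather not think about slashes at all, `xml2::url_absolute(hrefs, "https://fbref.com/")` resolves relative links against a base URL.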

I was also able to create a vector of each team's name from the same page:

Team <- scraped_page %>%
  html_nodes('#stats_squads_standard_for .left') %>%
  html_text() %>%
  as.character()

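But now I want to create a separate data frame for each team by scraping each team's page for specific stats. I have a for loop that gets the stats I need, but I don't know how to keep each team's results separate or how to name each data frame after its team.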

# Note: the selector strings (comp, venue, result, GF and the rest) listed
# further down must be defined before this loop runs.
for (i in seq_along(teamLinks)) {
  url <- teamLinks[i]
  scraped_url <- read_html(url)
  # Team was already scraped from scraped_page above, so there is no need
  # to re-scrape it on every iteration; just pick this team's name
  df_name <- Team[i]
  # The braces evaluate to their last expression, the data.frame() call
  df <- {
    Comp <- scraped_url %>%
      html_nodes(comp) %>%
      html_text()
    Venue <- scraped_url %>%
      html_nodes(venue) %>%
      html_text()
    Result <- scraped_url %>%
      html_nodes(result) %>%
      html_text()
    Goals_For <- scraped_url %>%
      html_nodes(GF) %>%
      html_text()
    Goals_Against <- scraped_url %>%
      html_nodes(GA) %>%
      html_text()
    Opponent <- scraped_url %>%
      html_nodes(Opp) %>%
      html_text()
    xG <- scraped_url %>%
      html_nodes(xg) %>%
      html_text()
    xGA <- scraped_url %>%
      html_nodes(xga) %>%
      html_text()
    Possession <- scraped_url %>%
      html_nodes(poss) %>%
      html_text()
    Formation <- scraped_url %>%
      html_nodes(formation) %>%
      html_text()
    # Include every scraped column, Result included
    data.frame(Comp, Venue, Result, Goals_For, Goals_Against,
               Opponent, xG, xGA, Possession, Formation)
  }
}

I'd really appreciate it if you could help me clean up this for loop.

Here are the values of the CSS selector variables used in the loop (they need to be defined before the loop runs):

comp <- ".left:nth-child(3) a"
venue <- ".left:nth-child(6)"
result <- "#matchlogs_for .left+ .center"
GF <- "#matchlogs_for .right:nth-child(8)"
GA <- "#matchlogs_for .right:nth-child(9)"
Opp <- ".left:nth-child(10)"
xg <- "#matchlogs_for td.left+ .right"
xga <- "#matchlogs_for .right:nth-child(12)"
poss <- "#matchlogs_for td:nth-child(13)"
formation <- ".left:nth-child(16)"
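Since every column in the loop above follows the same `html_nodes()` then `html_text()` pattern, one way to tidy it is to keep the selectors in a named vector and apply one small function over them. This is a sketch: `scrape_columns()` is a hypothetical helper, not part of rvest, and the demo runs against a tiny inline HTML table with made-up class names rather than against fbref.

```r
library(rvest)

# Hypothetical helper: builds one data frame column per named CSS selector
scrape_columns <- function(page, selectors) {
  # lapply() keeps the names of `selectors`, which become column names
  cols <- lapply(selectors, function(sel) {
    html_text(html_nodes(page, sel), trim = TRUE)
  })
  # Errors if the selectors return different numbers of nodes
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Offline demo: read_html() also accepts a literal HTML string
demo_page <- read_html(
  '<table id="matchlogs_for">
     <tr><td class="opp">Chelsea</td><td class="gf">2</td></tr>
     <tr><td class="opp">Leeds</td><td class="gf">1</td></tr>
   </table>'
)
demo_df <- scrape_columns(demo_page, c(
  Opponent  = "#matchlogs_for td.opp",
  Goals_For = "#matchlogs_for td.gf"
))
print(demo_df)
```

With the selectors from the question, the whole `df <- { ... }` block could then become `df <- scrape_columns(scraped_url, c(Comp = comp, Venue = venue, Result = result, Goals_For = GF, Goals_Against = GA, Opponent = Opp, xG = xg, xGA = xga, Possession = poss, Formation = formation))`, assuming each selector matches one node per match row.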

Thanks in advance!

You can create a list before the loop and save each data frame into it, like this:

TeamList <- list()
for (i in seq_along(teamLinks)) {
  # [...] your scraping code that leads to a "df"
  TeamList[[i]] <- df
}
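An equivalent pattern that avoids pre-allocating and indexing is to wrap the per-team scraping in a function and let `lapply()` build the list. In this sketch, `scrape_team()` is a hypothetical stand-in that returns dummy data instead of reading fbref, and the links and names are made up so the example runs offline; in practice its body would be the scraping code from the question applied to `read_html(url)`.

```r
# Hypothetical stand-ins for the scraped vectors
teamLinks <- c("https://fbref.com/en/squads/aaaa1111/Arsenal-Stats",
               "https://fbref.com/en/squads/bbbb2222/Chelsea-Stats")
Team <- c("Arsenal", "Chelsea")

# Hypothetical per-team scraper: replace the body with the real
# read_html(url) + html_nodes() code; here it just returns dummy rows
scrape_team <- function(url) {
  data.frame(url = url, Goals_For = c(2L, 1L), stringsAsFactors = FALSE)
}

# lapply() returns one data frame per link, in the same order as teamLinks
TeamList <- lapply(teamLinks, scrape_team)
names(TeamList) <- Team

str(TeamList, max.level = 1)
```

Because the list comes back in the same order as `teamLinks`, the names line up as long as `Team` and `teamLinks` were scraped in the same order.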

Then name the data frames in TeamList after the teams, and use list2env() to turn the list into separate data frames in the global environment:

names(TeamList) <- Team
list2env(TeamList, envir=.GlobalEnv)
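Two things are worth knowing about the `list2env()` step. Team names containing a space (for example "Aston Villa") become non-syntactic object names, so they must be written with backticks or fetched with `get()`. And it is often simpler to keep the named list, or to stack everything into one data frame with a `Team` column. The base R sketch below uses dummy data, and writes into a fresh environment instead of `.GlobalEnv` only to keep the demo self-contained.

```r
# Dummy named list standing in for the scraped TeamList
TeamList <- list(
  "Arsenal"     = data.frame(Opponent = c("Chelsea", "Leeds"), Goals_For = c(2L, 1L)),
  "Aston Villa" = data.frame(Opponent = "Everton", Goals_For = 0L)
)

# list2env() creates one object per list element in the given environment
env <- new.env()
invisible(list2env(TeamList, envir = env))
print(ls(env))

# A name containing a space cannot be typed bare; use get() (or backticks,
# e.g. `Aston Villa`, when the object lives in the global environment)
villa <- get("Aston Villa", envir = env)

# Alternative: one long data frame with a Team column
all_teams <- do.call(rbind, Map(function(df, team) cbind(Team = team, df),
                                TeamList, names(TeamList)))
rownames(all_teams) <- NULL
print(all_teams)
```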
