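I'm still fairly new to coding, and to web scraping in particular, but here is what I'm trying to do:
I want to scrape fbref.com to build a data frame of match statistics for each Premier League team.
I know this works for getting the team links: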
library(rvest)
page <- "https://fbref.com/en/comps/9/Premier-League-Stats"
scraped_page <- read_html(page)
teamLinks <- scraped_page %>%
  html_nodes("#stats_squads_standard_for a") %>%
  html_attr("href")
# the hrefs already start with "/", so no trailing slash on the host
teamLinks <- paste0("https://fbref.com", teamLinks)
I was also able to build a list of each team's name from the same page:
Team <- scraped_page %>%
  html_nodes('#stats_squads_standard_for .left') %>%
  html_text() %>%
  as.character()
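Now I want to create a separate data frame for each team by scraping that team's page for specific statistics. I have a for loop that gets the statistics I need, but I don't know how to keep the results separate or how to name each data frame after its team.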
for (i in 1:length(teamLinks)) {
  url <- teamLinks[i]
  scraped_url <- read_html(url)
  Team <- scraped_page %>%
    html_nodes('#stats_squads_standard_for .left') %>%
    html_text() %>%
    as.character()
  df_name <- paste0(Team[i])
  df <- {
    Comp <- scraped_url %>%
      html_nodes(comp) %>%
      html_text()
    Venue <- scraped_url %>%
      html_nodes(venue) %>%
      html_text()
    Result <- scraped_url %>%
      html_nodes(result) %>%
      html_text()
    Goals_For <- scraped_url %>%
      html_nodes(GF) %>%
      html_text()
    Goals_Against <- scraped_url %>%
      html_nodes(GA) %>%
      html_text()
    Opponent <- scraped_url %>%
      html_nodes(Opp) %>%
      html_text()
    xG <- scraped_url %>%
      html_nodes(xg) %>%
      html_text()
    xGA <- scraped_url %>%
      html_nodes(xga) %>%
      html_text()
    Possession <- scraped_url %>%
      html_nodes(poss) %>%
      html_text()
    Formation <- scraped_url %>%
      html_nodes(formation) %>%
      html_text()
    data.frame(Comp, Venue, Goals_For, Goals_Against,
               Opponent, xG, xGA, Possession, Formation)
  }
}
I'd really appreciate any help cleaning up the for loop.
These are the values of each of the HTML selector variables:
comp <- ".left:nth-child(3) a"
venue <- ".left:nth-child(6)"
result <- "#matchlogs_for .left+ .center"
GF <- "#matchlogs_for .right:nth-child(8)"
GA <- "#matchlogs_for .right:nth-child(9)"
Opp <- ".left:nth-child(10)"
xg <- "#matchlogs_for td.left+ .right"
xga <- "#matchlogs_for .right:nth-child(12)"
poss <- "#matchlogs_for td:nth-child(13)"
formation <- ".left:nth-child(16)"
Thanks in advance!
You can create a list before the loop and save each data frame into that list, like this:
TeamList <- list()
for (i in seq_along(teamLinks)) {
# [...] your scraping code that leads to a "df"
TeamList[[i]] <- df
}
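As a rough illustration of what the placeholder in that loop might contain, here is a minimal sketch. It assumes a hypothetical helper `scrape_text()` (not part of rvest) and uses a tiny inline HTML snippet with made-up selectors in place of a real fbref team page, so it runs without network access. In real use the pages would come from `read_html(teamLinks[i])` and the selectors from the question.

```r
library(rvest)

# Hypothetical helper (not an rvest function): text of all nodes matching a CSS selector
scrape_text <- function(page, css) {
  page %>% html_nodes(css) %>% html_text()
}

# Inline stand-in for one team page (assumption: real pages come from teamLinks)
demo_page <- read_html('
  <table id="matchlogs_for">
    <tr><td class="gf">2</td><td class="ga">1</td></tr>
    <tr><td class="gf">0</td><td class="ga">3</td></tr>
  </table>')
pages <- list(demo_page)

# Column name -> CSS selector (made up to match the demo markup above)
selectors <- c(Goals_For     = "#matchlogs_for td.gf",
               Goals_Against = "#matchlogs_for td.ga")

TeamList <- list()
for (i in seq_along(pages)) {
  # One character vector per selector; lapply() keeps the column names
  cols <- lapply(selectors, function(css) scrape_text(pages[[i]], css))
  TeamList[[i]] <- as.data.frame(cols, stringsAsFactors = FALSE)
}
```

Keeping the selectors in one named vector means a new statistic is one extra entry rather than another copy of the pipe. Note that `as.data.frame()` will fail if the selectors return different numbers of matches, which is worth checking on the real pages.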
Then name the data frames in TeamList after each team, and use list2env() to turn the list of data frames into separate data frames:
names(TeamList) <- Team
list2env(TeamList, envir=.GlobalEnv)
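One thing to watch for: team names such as "Manchester City" contain spaces, so the variables created by list2env() would need backticks to access. A small sketch of one way around that, using base R's make.names(); the team names and data frames below are made-up stand-ins for the scraped ones:

```r
# Made-up stand-ins for the scraped team names and per-team data frames
Team <- c("Arsenal", "Manchester City")
TeamList <- list(
  data.frame(Goals_For = c(2, 0)),
  data.frame(Goals_For = c(1, 3))
)

# make.names() turns "Manchester City" into the valid identifier "Manchester.City"
names(TeamList) <- make.names(Team, unique = TRUE)

# Either keep the list and index it by name...
city <- TeamList[["Manchester.City"]]

# ...or copy every element into its own variable, as above
list2env(TeamList, envir = .GlobalEnv)
```

Keeping everything in the named list is often easier to work with afterwards, since you can still loop over all teams with lapply().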