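I'm trying to collect some data from Basketball Reference using the ballr package. I'm using the NBASeasonTeamByYear function to collect season results for each team across multiple seasons. That is, I want each team's data for 2017 through 2020, and then I want to combine the data frames into two larger data frames, separated by conference.
I first made a data frame with each team's code and conference: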
league_teams <- data.frame("team" = c("ATL", "BOS", "NJN", "CHA", "CHI", "CLE", "DAL", "DEN",
                                      "DET", "GSW", "HOU", "IND", "LAC", "LAL", "MEM", "MIA",
                                      "MIL", "MIN", "NOH", "NYK", "OKC", "ORL", "PHI", "PHO",
                                      "POR", "SAC", "SAS", "TOR", "UTA", "WAS"),
                           "conference" = c("East", "East", "East", "East", "East", "East", "West",
                                            "West", "East", "West", "West", "East", "West", "West",
                                            "West", "East", "East", "West", "West", "East", "West",
                                            "East", "East", "West", "West", "West", "West", "East",
                                            "West", "East"))
league_teams$team <- as.character(league_teams$team)
league_teams$conference <- as.factor(league_teams$conference)
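Now I'm having trouble writing a loop that first calls the function for each unique team, using its code and the years I want, and then combines the results by conference, regardless of year.
I started with this: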
for (team in league_teams) {
  team_2017 <- NBASeasonTeamByYear(team = team, 2017)
  team_2017$season <- as.factor(2017)
  team_2017$team <- as.factor(team)
}
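The last few lines show that I want to add two columns, one for the corresponding year and one for the respective team code, not just for 2017 but all the way through 2020. I'm having trouble writing the loop. I think I would use rbind afterwards to combine the results, but I'm not sure how to do that while distinguishing by conference as recorded in the original data frame I made.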
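Consider generalizing your process in a user-defined function, then pass it parameters built with expand.grid (all combinations) and iterate with Map (an elementwise loop):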
nba_df_build <- function(yr, team, conf) {
  # base::TRANSFORM OR dplyr::MUTATE
  transform(NBASeasonTeamByYear(team = team, season = yr),
            season = as.factor(yr),
            team = as.factor(team),
            conference = as.factor(conf))
}
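# Cross seasons with teams only, then attach each team's conference;
# crossing all three columns would pair every team with every conference entry
params_df <- merge(expand.grid(year = 2017:2020,
                               team = league_teams$team,
                               stringsAsFactors = FALSE),
                   league_teams, by = "team")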
df_list <- Map(nba_df_build, params_df$year, params_df$team, params_df$conference)
final_df <- do.call(rbind, df_list)
#final_df <- dplyr::bind_rows(df_list)
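Because the real function fetches pages over the network, it can help to check the shape of the pipeline offline first. The following is only a minimal sketch under stated assumptions: fake_season_fetch is a hypothetical stand-in for NBASeasonTeamByYear that returns made-up rows, and teams is a three-team subset of the lookup table, used purely for illustration.

```r
# Hypothetical stand-in for NBASeasonTeamByYear(): returns a tiny fake
# data frame so the pipeline can be checked without network access.
fake_season_fetch <- function(team, season) {
  data.frame(game = 1:2, result = c("W", "L"))
}

# Small illustrative subset of the team/conference lookup table.
teams <- data.frame(team = c("ATL", "BOS", "DAL"),
                    conference = c("East", "East", "West"),
                    stringsAsFactors = FALSE)

# Same shape as nba_df_build(), but calling the stand-in fetcher.
build_one <- function(yr, team, conf) {
  transform(fake_season_fetch(team = team, season = yr),
            season = as.factor(yr),
            team = as.factor(team),
            conference = as.factor(conf))
}

# One row per season/team pair, each carrying its own conference.
params <- merge(expand.grid(year = 2017:2020, team = teams$team,
                            stringsAsFactors = FALSE),
                teams, by = "team")

demo_list <- Map(build_one, params$year, params$team, params$conference)
demo_df <- do.call(rbind, demo_list)

nrow(params)    # 3 teams x 4 seasons = 12 parameter rows
nrow(demo_df)   # 12 calls x 2 fake rows each = 24 rows
```

With the full 30-team table the same construction should give 120 parameter rows, one call per team per season.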
To split the data frame in any of these ways:
# LIST OF TWO CONFERENCE DATA FRAMES
conference_dfs <- split(final_df, final_df$conference)
# LIST OF FOUR SEASON DATA FRAMES
season_dfs <- split(final_df, final_df$season)
# LIST OF THIRTY TEAM DATA FRAMES
team_dfs <- split(final_df, final_df$team)
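Since split returns a named list, the two conference data frames asked for in the question can be pulled out by name. A minimal sketch, assuming a toy_df with made-up rows in place of final_df (the values are hypothetical, not real results):

```r
# Toy stand-in for final_df (hypothetical values, not real results).
toy_df <- data.frame(team = c("ATL", "DAL", "BOS", "DEN"),
                     season = factor(c(2017, 2017, 2018, 2018)),
                     conference = factor(c("East", "West", "East", "West")))

toy_split <- split(toy_df, toy_df$conference)

names(toy_split)            # list elements are named after the factor levels
east_df <- toy_split$East   # equivalently toy_split[["East"]]
west_df <- toy_split$West

# Splitting on two factors at once gives one data frame per combination.
by_conf_season <- split(toy_df, list(toy_df$conference, toy_df$season))
names(by_conf_season)
```

Keeping the list and indexing into it is usually easier to manage than creating many separate objects in the workspace.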