R newbie here. This is a small sample of my data:
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(cbind(TeamHome, TeamAway))
df
TeamHome TeamAway
LAL IND
HOU SAS
SAS LAL
LAL HOU
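Imagine these are the first four games of a season that has thousands of games. For both the home team and the away team, I want to count the games played at home, on the road, and in total up to that point. That means 3 new columns each for the home team and the away team. I would like to get something like this (here I have only computed the new variables for the home team):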
TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
To compute the first column (HomeTeamGamesPlayedatHome), I managed to use:
df$HomeTeamGamesPlayedatHome <- ave(df$TeamHome==df$TeamHome, df$TeamHome, FUN=cumsum)
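But this feels overly complicated, and I can't compute the other columns with this approach.
I also thought of using table() to count the occurrences: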
table(df$TeamHome)
But it only gives the overall totals, and I want the result as of any given point in time. Thanks.
library(dplyr)
# Running count of home games per team
df <- df %>% group_by(TeamHome) %>%
  mutate(HomeGames = seq_along(TeamHome)) %>%
  ungroup()
# For each game, count how often the home team has been the away team so far
lst <- list()
for (i in 1:nrow(df)) lst[[i]] <- sum(df$TeamAway[1:i] == df$TeamHome[i])
df$HomeTeamGamesPlayedRoad <- unlist(lst)
df %>% mutate(HomeTeamTotalgames = HomeGames + HomeTeamGamesPlayedRoad)
TeamHome TeamAway HomeGames HomeTeamGamesPlayedRoad HomeTeamTotalgames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
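HomeGames is created with seq_along, which numbers the rows within each TeamHome group. HomeTeamGamesPlayedRoad is created by a loop that checks the values in TeamAway up to and including the current game. The last column is the sum of the other two.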
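The same running counts can also be sketched in base R without an explicit loop, by stacking the games into a long format with one row per team appearance and taking cumulative sums per team. This is only a sketch under assumptions: the rows of df are taken to be in chronological order, and the names long, is_home, home_so_far, road_so_far and the three AwayTeam... columns are made up here for illustration; they do not appear in the question.

```r
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
df <- data.frame(TeamHome, TeamAway, stringsAsFactors = FALSE)
n <- nrow(df)

# Long format: one row per (game, team) appearance, sorted by game number.
long <- data.frame(
  game    = rep(seq_len(n), times = 2),
  team    = c(df$TeamHome, df$TeamAway),
  is_home = rep(c(TRUE, FALSE), each = n),
  stringsAsFactors = FALSE
)
long <- long[order(long$game), ]

# Running counts per team, up to and including the current game.
long$home_so_far <- ave(as.integer(long$is_home),  long$team, FUN = cumsum)
long$road_so_far <- ave(as.integer(!long$is_home), long$team, FUN = cumsum)

# Split back into the home-team and away-team rows (both are in game order).
home_rows <- long[long$is_home, ]
away_rows <- long[!long$is_home, ]

df$HomeTeamGamesPlayedatHome <- home_rows$home_so_far
df$HomeTeamGamesPlayedRoad   <- home_rows$road_so_far
df$HomeTeamTotalgames        <- home_rows$home_so_far + home_rows$road_so_far

# Hypothetical column names for the away team's counts.
df$AwayTeamGamesPlayedatHome <- away_rows$home_so_far
df$AwayTeamGamesPlayedRoad   <- away_rows$road_so_far
df$AwayTeamTotalgames        <- away_rows$home_so_far + away_rows$road_so_far

df
```

Because each team gets a single running tally across both roles, the away-team columns come out of the same pass, which may matter once the data has thousands of rows.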
A loop solution:
TeamHome <- c("LAL", "HOU", "SAS", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "HOU")
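df <- data.frame(TeamHome, TeamAway,
                 HomeTeamGamesPlayedatHome = ave(TeamHome == TeamHome, TeamHome, FUN = cumsum))
# Preallocate the road-games column
df$HomeTeamGamesPlayedRoad <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  # Count how often the current home team appears as the away team in games 1..i
  df$HomeTeamGamesPlayedRoad[i] <- sum(as.character(df$TeamAway[1:i]) == as.character(df$TeamHome[i]))
}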
df$HomeTeamTotalgames <- df$HomeTeamGamesPlayedatHome + df$HomeTeamGamesPlayedRoad
TeamHome TeamAway HomeTeamGamesPlayedatHome HomeTeamGamesPlayedRoad HomeTeamTotalgames
1 LAL IND 1 0 1
2 HOU SAS 1 0 1
3 SAS LAL 1 1 2
4 LAL HOU 2 1 3
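The four-game sample never has a home team that has already been on the road more than once, so it may be worth cross-checking any of the approaches on a slightly longer sample. The brute-force sketch below recounts from scratch for every game. The five-game schedule is invented for illustration, and at_home, on_road and total are hypothetical names.

```r
# Hypothetical extended sample: LAL plays on the road twice before its second home game.
TeamHome <- c("LAL", "HOU", "SAS", "IND", "LAL")
TeamAway <- c("IND", "SAS", "LAL", "LAL", "HOU")

# For game i, count the home team's appearances in games 1..i, in each role.
at_home <- sapply(seq_along(TeamHome), function(i) sum(TeamHome[1:i] == TeamHome[i]))
on_road <- sapply(seq_along(TeamHome), function(i) sum(TeamAway[1:i] == TeamHome[i]))
total   <- at_home + on_road

data.frame(TeamHome, TeamAway, at_home, on_road, total)
```

This recount is quadratic in the number of games, so it is meant as a check on a small slice of the data rather than as the main computation.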