How to calculate correlation by group



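I am trying to run an iterative for loop to calculate correlations across the levels of a factor variable. My dataset has 16 rows of data for each of 32 teams. I want to correlate Year with Points_Game for each team separately. I can do this one team at a time, but I would like to get better at writing loops.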

correlate <- data %>%
  select(Team, Year, Points_Game) %>%
  filter(Team == "ARI") %>%
  select(Year, Points_Game)

cor(correlate)

I created an object "teams" with:

teams <- levels(data$Team)

Iterating over all 32 teams with [i] to get the correlation between Year and Points_Game for each team would be very helpful!

library(dplyr)
# dummy data: 32 teams, each observed once per year from 2000 to 2009
# (each = 10 keeps a team's rows together so every team gets all ten years)
data <- data.frame(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)
# find correlation of Year and Points_Game for each team
# r - correlation coefficient
correlate <- data %>%
  group_by(Team) %>%
  summarise(r = cor(Year, Points_Game))

The data.table way:

library(data.table)
# dummy data (same as @Aleksandr's): 32 teams, one row per team per year
dat <- data.table(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)
# find correlation of Year and Points_Game for each Team
result <- dat[ , .(r = cor(Year, Points_Game)), by = Team]
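If you specifically want the explicit loop over teams[i] that the question asks about, here is a minimal base R sketch. It assumes Team is a factor (so that levels() returns the team names) and uses made-up dummy data with hypothetical team names T1 to T32, not the real dataset. The names r, one_team and r_split are just illustrative.

```r
# Hypothetical dummy data: 32 teams, 10 seasons each (not the asker's real data)
set.seed(1)
data <- data.frame(
  Team = factor(rep(paste0("T", 1:32), each = 10)),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, mean = 100, sd = 10)
)

teams <- levels(data$Team)

# Pre-allocate a named numeric vector with one slot per team
r <- setNames(numeric(length(teams)), teams)

# Explicit loop: subset one team at a time and store its correlation
for (i in seq_along(teams)) {
  one_team <- data[data$Team == teams[i], ]
  r[i] <- cor(one_team$Year, one_team$Points_Game)
}

# Same computation without an explicit loop: split by team, then apply cor()
r_split <- sapply(split(data, data$Team),
                  function(d) cor(d$Year, d$Points_Game))
```

Both r and r_split are named vectors with one correlation per team, so r["T1"] looks up a single team. The grouped dplyr and data.table versions above are usually shorter, but the loop makes each step visible if the goal is to practice iteration.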
