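I have a data frame of player statistics and would like to compute the covariance of the players with each other.
Note that not all players play in every game.
I would like to end up with the following, where each "x" is the relevant covariance value.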
Player.Name Damian Lillard C.J. McCollum Allen Crabbe Noah Vonleh etc, etc
1 Damian Lillard x x x x
2 C.J. McCollum x x x x
3 Allen Crabbe x x x x
4 Noah Vonleh x x x x
5 Ed Davis x x x x
6 Al-Farouq Aminu x x x x
7 Evan Turner x x x x
8 Maurice Harkless x x x x
9 Meyers Leonard x x x x
10 Mason Plumlee x x x x
11 Shabazz Napier x x x x
> df
Player.Name Tm MB DS Game
1 Damian Lillard POR 54.8 59.50 20161025
11 C.J. McCollum POR 30.9 32.50 20161025
16 Allen Crabbe POR 24.1 28.25 20161025
19 Noah Vonleh POR 14.2 15.25 20161025
22 Ed Davis POR 17.9 18.00 20161025
26 Al-Farouq Aminu POR 16.3 18.25 20161025
34 Evan Turner POR 20.5 19.25 20161025
64 Maurice Harkless POR 4.7 5.25 20161025
65 Meyers Leonard POR 2.7 2.25 20161025
68 Mason Plumlee POR 4.7 4.00 20161025
290 Maurice Harkless POR 35.6 35.75 20161027
295 Mason Plumlee POR 36.6 36.75 20161027
299 Damian Lillard POR 41.5 44.25 20161027
309 C.J. McCollum POR 26.8 27.50 20161027
318 Allen Crabbe POR 17.2 16.25 20161027
349 Noah Vonleh POR 5.0 4.75 20161027
358 Evan Turner POR 10.7 10.50 20161027
359 Ed Davis POR 5.6 5.50 20161027
364 Shabazz Napier POR 0.0 0.00 20161027
369 Al-Farouq Aminu POR 13.6 13.25 20161027
545 Damian Lillard POR 56.5 58.25 20161029
557 C.J. McCollum POR 49.5 51.25 20161029
610 Mason Plumlee POR 22.9 22.50 20161029
611 Allen Crabbe POR 22.6 22.75 20161029
637 Evan Turner POR 15.6 16.75 20161029
649 Al-Farouq Aminu POR 27.9 28.25 20161029
673 Ed Davis POR 8.9 9.50 20161029
704 Noah Vonleh POR 4.8 5.00 20161029
719 Maurice Harkless POR 9.6 11.00 20161029
723 Meyers Leonard POR 6.2 6.25 20161029
728 Shabazz Napier POR 0.0 0.00 20161029
Data
structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu",
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee",
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier",
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee",
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis",
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1,
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8,
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9,
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18,
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25,
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75,
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L,
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L,
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L,
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName",
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")
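I think the first thing you need to do is reshape the data so that each row is a game and each column is one player's MB for that game.
Assuming the data is in dat: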
dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB" "Game"
#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
timevar = 'PlayerName')
dat.wide[1:4, 1:4]
Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1 20161025 54.8 30.9 24.1
11 20161027 41.5 26.8 17.2
21 20161029 56.5 49.5 22.6
#compute pairwise covariances; each pair uses only the games both players appear in
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')
cov_m[1:4,1:4]
MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard 67.46333 71.10833 28.370 17.23
MB.C.J. McCollum 71.10833 146.34333 20.495 -23.61
MB.Allen Crabbe 28.37000 20.49500 13.170 12.75
MB.Noah Vonleh 17.23000 -23.61000 12.750 28.84
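Because not every player appears in every game, it can help to check how many games each pair actually shares before trusting a pairwise covariance. The following is a minimal sketch of that check, not part of the answer above: it uses a three-player subset of the question's data, does the long-to-wide step with tapply() instead of reshape(), and the names wide, present and n_shared are chosen here purely for illustration.

```r
# Three-player subset of the question's data; Shabazz Napier has no row
# for the 20161025 game.
dat <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum",
                 "Damian Lillard", "C.J. McCollum", "Shabazz Napier",
                 "Damian Lillard", "C.J. McCollum", "Shabazz Napier"),
  MB   = c(54.8, 30.9, 41.5, 26.8, 0, 56.5, 49.5, 0),
  Game = c(20161025L, 20161025L, 20161027L, 20161027L, 20161027L,
           20161029L, 20161029L, 20161029L),
  stringsAsFactors = FALSE
)

# Long to wide: one row per game, one column per player.
# Player/game combinations with no row in dat become NA.
wide <- tapply(dat$MB, list(dat$Game, dat$PlayerName), mean)

# Count the games each pair of players has in common.
present  <- !is.na(wide)
n_shared <- crossprod(present * 1)

# Covariance computed over the shared games of each pair.
cov_m <- cov(wide, use = "pairwise.complete.obs")

print(n_shared)
print(round(cov_m, 5))
```

With only two or three shared games per pair, the resulting values are very noisy, so n_shared is worth inspecting alongside cov_m.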
You can use the cov() function to achieve this. For example, assuming the data frame is called x:
cov_mat <- cov(t(x[,3:4]))
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName
> cov_mat[1:3,1:3]
Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard 11.0450 3.76 9.75250
C.J. McCollum 3.7600 1.28 3.32000
Allen Crabbe 9.7525 3.32 8.61125
If you want the correlation instead, just swap cov() for cor().
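As a small illustration of swapping cov() for cor(), the sketch below uses a game-by-player matrix built by hand from the MB values of three players who appear in all three games of the question's data. The matrix name m is an assumption made for this example only.

```r
# MB values from the question for three players with no missing games.
# Rows are games, columns are players.
m <- cbind(
  "Damian Lillard" = c(54.8, 41.5, 56.5),
  "C.J. McCollum"  = c(30.9, 26.8, 49.5),
  "Allen Crabbe"   = c(24.1, 17.2, 22.6)
)
rownames(m) <- c("20161025", "20161027", "20161029")

cov_m <- cov(m)  # covariance between players across games
cor_m <- cor(m)  # same call with cor(): correlation instead

# The correlation matrix is the covariance matrix rescaled by the
# standard deviations, which is what cov2cor() does.
print(all.equal(cor_m, cov2cor(cov_m)))
```

Note that cor() returns NA (with a warning) for any player whose values do not vary, such as a column of zeros.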