Covariance matrix in R



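I have a data frame of player statistics, and I want to compare the players with each other.

Note that not all players play in every game.

I would like to end up with the following, where each "x" is the corresponding covariance value.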

               Player.Name Damian Lillard C.J. McCollum Allen Crabbe Noah Vonleh  etc, etc
1           Damian Lillard              x             x            x           x
2            C.J. McCollum              x             x            x           x
3             Allen Crabbe              x             x            x           x
4              Noah Vonleh              x             x            x           x
5                 Ed Davis              x             x            x           x
6          Al-Farouq Aminu              x             x            x           x
7              Evan Turner              x             x            x           x
8         Maurice Harkless              x             x            x           x
9           Meyers Leonard              x             x            x           x
10           Mason Plumlee              x             x            x           x
11          Shabazz Napier              x             x            x           x

The data frame looks like this:

> df
          Player.Name  Tm   MB    DS     Game
1      Damian Lillard POR 54.8 59.50 20161025
11      C.J. McCollum POR 30.9 32.50 20161025
16       Allen Crabbe POR 24.1 28.25 20161025
19        Noah Vonleh POR 14.2 15.25 20161025
22           Ed Davis POR 17.9 18.00 20161025
26    Al-Farouq Aminu POR 16.3 18.25 20161025
34        Evan Turner POR 20.5 19.25 20161025
64   Maurice Harkless POR  4.7  5.25 20161025
65     Meyers Leonard POR  2.7  2.25 20161025
68      Mason Plumlee POR  4.7  4.00 20161025
290  Maurice Harkless POR 35.6 35.75 20161027
295     Mason Plumlee POR 36.6 36.75 20161027
299    Damian Lillard POR 41.5 44.25 20161027
309     C.J. McCollum POR 26.8 27.50 20161027
318      Allen Crabbe POR 17.2 16.25 20161027
349       Noah Vonleh POR  5.0  4.75 20161027
358       Evan Turner POR 10.7 10.50 20161027
359          Ed Davis POR  5.6  5.50 20161027
364    Shabazz Napier POR  0.0  0.00 20161027
369   Al-Farouq Aminu POR 13.6 13.25 20161027
545    Damian Lillard POR 56.5 58.25 20161029
557     C.J. McCollum POR 49.5 51.25 20161029
610     Mason Plumlee POR 22.9 22.50 20161029
611      Allen Crabbe POR 22.6 22.75 20161029
637       Evan Turner POR 15.6 16.75 20161029
649   Al-Farouq Aminu POR 27.9 28.25 20161029
673          Ed Davis POR  8.9  9.50 20161029
704       Noah Vonleh POR  4.8  5.00 20161029
719  Maurice Harkless POR  9.6 11.00 20161029
723    Meyers Leonard POR  6.2  6.25 20161029
728    Shabazz Napier POR  0.0  0.00 20161029

Data

structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum", 
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu", 
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee", 
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum", 
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier", 
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee", 
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis", 
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1, 
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8, 
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9, 
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18, 
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25, 
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75, 
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L, 
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L, 
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName", 
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")

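I think the first thing you need to do is reshape the data so that each row is a game and each column holds one player's MB for that game. Assuming our data is in dat: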

dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB"         "Game"      
#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
        timevar = 'PlayerName')
dat.wide[1:4, 1:4]
       Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1  20161025              54.8             30.9            24.1
11 20161027              41.5             26.8            17.2
21 20161029              56.5             49.5            22.6
#compute using cov function
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')
cov_m[1:4,1:4]
                  MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard          67.46333         71.10833          28.370          17.23
MB.C.J. McCollum           71.10833        146.34333          20.495         -23.61
MB.Allen Crabbe            28.37000         20.49500          13.170          12.75
MB.Noah Vonleh             17.23000        -23.61000          12.750          28.84
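
As a variation on the reshape step above, here is a minimal sketch that pivots with tapply() instead, which keeps the plain player names as column names (no "MB." prefix). It uses only a three-player subset of the question's data; the name mb_wide is just an illustrative choice, and sum() is used only because each player/game pair has a single row.

```r
# Small subset of the data from the question (three players, three games);
# Shabazz Napier has no row for the first game.
dat <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum",
                 "Damian Lillard", "C.J. McCollum", "Shabazz Napier",
                 "Damian Lillard", "C.J. McCollum", "Shabazz Napier"),
  MB   = c(54.8, 30.9, 41.5, 26.8, 0, 56.5, 49.5, 0),
  Game = c(20161025L, 20161025L, 20161027L, 20161027L, 20161027L,
           20161029L, 20161029L, 20161029L)
)

# Pivot to a Game x Player matrix; player/game pairs with no row become NA.
# Each pair has at most one row here, so sum() just passes the value through.
mb_wide <- with(dat, tapply(MB, list(Game, PlayerName), sum))

# Covariance between players, using for each pair only the games
# in which both players have a value
cov_m <- cov(mb_wide, use = "pairwise.complete.obs")
round(cov_m, 2)
```

With use = "pairwise.complete.obs", each entry may be based on a different set of games, so entries for players who missed games rest on fewer observations than the others.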

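You can use the cov() function for this. The example below transposes the MB and DS columns, so each row of x becomes a variable with two observations (its MB and DS values):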

cov_mat <- cov(t(x[,3:4]))
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName

> cov_mat[1:3,1:3]
               Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard        11.0450          3.76      9.75250
C.J. McCollum          3.7600          1.28      3.32000
Allen Crabbe           9.7525          3.32      8.61125
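
To make it clearer what this second approach computes, here is a small sketch on the first two rows of the question's data. It assumes x has the PlayerName, MB and DS columns from the Data section. Each entry is a covariance across a row's two values (MB and DS), not across games.

```r
# MB and DS for the first two rows of the data in the question
x <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum"),
  MB = c(54.8, 30.9),
  DS = c(59.5, 32.5)
)

# After t(), each original row is a column with two observations (MB, DS)
cov_mat <- cov(t(x[, c("MB", "DS")]))
dimnames(cov_mat) <- list(x$PlayerName, x$PlayerName)

# A diagonal entry is simply the variance of that row's MB and DS values
all.equal(cov_mat["Damian Lillard", "Damian Lillard"], var(c(54.8, 59.5)))
```

On the full data this yields one row and column per row of x (31 here), so a player who appears in several games appears several times.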

If you want correlations instead, simply swap cov() for cor().
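
As a rough illustration of that swap, the sketch below assumes a game-by-player matrix like the one built in the first answer, filled with three players' MB values from the question. The name mb_wide is hypothetical.

```r
# Three players' MB values from the question, one row per game;
# Meyers Leonard has no entry for the second game.
mb_wide <- cbind(
  "Damian Lillard" = c(54.8, 41.5, 56.5),
  "C.J. McCollum"  = c(30.9, 26.8, 49.5),
  "Meyers Leonard" = c(2.7, NA, 6.2)
)
rownames(mb_wide) <- c("20161025", "20161027", "20161029")

# Same call shape for both; only the function name changes
cov_m <- cov(mb_wide, use = "pairwise.complete.obs")
cor_m <- cor(mb_wide, use = "pairwise.complete.obs")
round(cor_m, 3)
```

Any pair of players with only two games in common will always show a correlation of exactly 1 or -1, so with this few games the values are better read as a check on the mechanics than as meaningful estimates.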
