Finding correlations in a dataset from a CSV file in R



After reading a CSV file in R, I cannot compute the correlation. I get an error saying "'x' must be numeric":

s = read.csv(file.choose(), header = TRUE)
cor(s)
Error in cor(s) : 'x' must be numeric

Dataset: https://github.com/vincentarelbundock/Rdatasets/blob/master/csv/MASS/UScereal.csv

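Your data is not entirely numeric (it contains factor columns), so you should run the correlation function only on the numeric columns.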

library(MASS)
data("UScereal")
# type of variables
str(UScereal)
# 'data.frame': 65 obs. of  11 variables:
# $ mfr      : Factor w/ 6 levels "G","K","N","P",..: 3 2 2 1 2 1 6 4 5 1 ...
# $ calories : num  212 212 100 147 110 ...
# $ protein  : num  12.12 12.12 8 2.67 2 ...
# $ fat      : num  3.03 3.03 0 2.67 0 ...
# $ sodium   : num  394 788 280 240 125 ...
# $ fibre    : num  30.3 27.3 28 2 1 ...
# $ carbo    : num  15.2 21.2 16 14 11 ...
# $ sugars   : num  18.2 15.2 0 13.3 14 ...
# $ shelf    : int  3 3 3 1 2 3 1 3 2 1 ...
# $ potassium: num  848.5 969.7 660 93.3 30 ...
# $ vitamins : Factor w/ 3 levels "100%","enriched",..: 2 2 2 2 2 2 2 2 2 2 ...

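The correlation should then be computed only on the columns stored as num, i.e. columns 2 through 8 and column 10. Column 9 (shelf) is an integer shelf-position code and is left out here.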

# correlation matrix
cor(UScereal[c(2:8,10)])
#            calories   protein       fat    sodium     fibre       carbo      sugars potassium
# calories  1.0000000 0.7060105 0.5901757 0.5286552 0.3882179  0.78872268  0.49529421 0.4765955
# protein   0.7060105 1.0000000 0.4112661 0.5727222 0.8096397  0.54709029  0.18484845 0.8417540
# fat       0.5901757 0.4112661 1.0000000 0.2595606 0.2260715  0.18285220  0.41567397 0.3232754
# sodium    0.5286552 0.5727222 0.2595606 1.0000000 0.4954831  0.42356172  0.21124365 0.5566426
# fibre     0.3882179 0.8096397 0.2260715 0.4954831 1.0000000  0.20307489  0.14891577 0.9638662
# carbo     0.7887227 0.5470903 0.1828522 0.4235617 0.2030749  1.00000000 -0.04082599 0.2420485
# sugars    0.4952942 0.1848484 0.4156740 0.2112437 0.1489158 -0.04082599  1.00000000 0.2718335
# potassium 0.4765955 0.8417540 0.3232754 0.5566426 0.9638662  0.24204848  0.27183347 1.0000000
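
When the data comes from read.csv() as in the question, the column positions may not match the ones used above (a CSV export can carry an extra leading column of row names, for example). A minimal sketch, assuming it is acceptable to keep every column R stores as numeric, selects the columns by type instead of by position. Note that this also keeps the integer column shelf, which the answer above deliberately left out. The names num_cols and cor_mat are only illustrative.

```r
library(MASS)      # provides the UScereal data frame
data("UScereal")

# Logical vector: TRUE for columns stored as numeric (double or integer)
num_cols <- sapply(UScereal, is.numeric)

# Correlation matrix over the numeric columns only; factor columns are dropped
cor_mat <- cor(UScereal[, num_cols])

# The same idea for a data frame read from a CSV file (not run here):
# s <- read.csv(file.choose(), header = TRUE)
# cor(s[, sapply(s, is.numeric)])
```

Selecting by type avoids hard-coding indices such as c(2:8,10), at the cost of including any numeric-coded categorical columns, so it is worth checking str() first.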
