r - Building a k-means clustering model: scaling the data picks up the first column and returns an error



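I am working on k-means clustering in R. When I try to scale the data, I get the error message "Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric". I think it is picking up the first column, which is not numeric.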

dftest <- scale(test)

This should be easy, but for some reason I get this error. Can someone help me get past this simple first step? Thanks.

A sample of the data, test, is shown below:

test <- structure(list(PIN = structure(1:5, .Label = c("a", "b", "c", 
"d", "e"), class = "factor"), v1 = c(0.8, 0.36, 0.21, 0.84, 0.43
), v2 = c(0.87, 0.01, 0.56, 0.75, 0.98), v3 = c(0.48, 0.13, 0.26, 
0.34, 0.83)), row.names = c(NA, 5L), class = "data.frame")

This happens because not every column is numeric.

scale(test)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Check the column types with str(test):

str(test)
'data.frame':   5 obs. of  4 variables:
$ PIN: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
$ v1 : num  0.8 0.36 0.21 0.84 0.43
$ v2 : num  0.87 0.01 0.56 0.75 0.98
$ v3 : num  0.48 0.13 0.26 0.34 0.83
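To see which columns trip up scale(), one option is to test each column's type directly. The following is a minimal sketch that rebuilds the sample data from the question; the variable name numeric_cols is only for illustration.

```r
# Rebuild the sample data from the question
test <- data.frame(
  PIN = factor(c("a", "b", "c", "d", "e")),
  v1 = c(0.8, 0.36, 0.21, 0.84, 0.43),
  v2 = c(0.87, 0.01, 0.56, 0.75, 0.98),
  v3 = c(0.48, 0.13, 0.26, 0.34, 0.83)
)

# TRUE for columns that scale() can handle, FALSE otherwise
numeric_cols <- sapply(test, is.numeric)
print(numeric_cols)

# Names of the columns that are not numeric (here, the factor PIN)
print(names(test)[!numeric_cols])

# scale() first converts a data frame with as.matrix(); a single
# non-numeric column turns the whole matrix into character,
# which is why colMeans() complains
print(is.numeric(as.matrix(test)))
```

Any column flagged FALSE has to be dropped or converted before calling scale().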

PIN is a factor column. Simply drop it with a negative index on its column position:

scale(test[-1])
          v1         v2         v3
1  0.9766119  0.6177883  0.2687578
2 -0.6032015 -1.6334743 -1.0377036
3 -1.1417742 -0.1937133 -0.5524465
4  1.1202313  0.3036587 -0.2538268
5 -0.3518675  0.9057405  1.5752191
attr(,"scaled:center")
   v1    v2    v3 
0.528 0.634 0.408 
attr(,"scaled:scale")
       v1        v2        v3 
0.2785139 0.3820079 0.2678992 

Or, if there are more columns, select the numeric ones dynamically:

scale(Filter(is.numeric, test))
          v1         v2         v3
1  0.9766119  0.6177883  0.2687578
2 -0.6032015 -1.6334743 -1.0377036
3 -1.1417742 -0.1937133 -0.5524465
4  1.1202313  0.3036587 -0.2538268
5 -0.3518675  0.9057405  1.5752191
attr(,"scaled:center")
   v1    v2    v3 
0.528 0.634 0.408 
attr(,"scaled:scale")
       v1        v2        v3 
0.2785139 0.3820079 0.2678992 
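Dropping PIN loses the identifier for each row. If the labels are still needed after scaling, one possible approach is to move them into the row names first. This is only a sketch, and it assumes the PIN values are unique; num and scaled are illustrative names.

```r
# Rebuild the sample data from the question
test <- data.frame(
  PIN = factor(c("a", "b", "c", "d", "e")),
  v1 = c(0.8, 0.36, 0.21, 0.84, 0.43),
  v2 = c(0.87, 0.01, 0.56, 0.75, 0.98),
  v3 = c(0.48, 0.13, 0.26, 0.34, 0.83)
)

# Keep only the numeric columns
num <- Filter(is.numeric, test)

# Carry the identifier along as row names (assumes PIN is unique)
rownames(num) <- as.character(test$PIN)

# The scaled matrix keeps the row names, so each row is still labelled
scaled <- scale(num)
print(rownames(scaled))
```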

Or use the collapse package: get the numeric columns with num_vars and apply its fast scaling function, fscale:

library(collapse)
fscale(num_vars(test))
          v1         v2         v3
1  0.9766119  0.6177883  0.2687578
2 -0.6032015 -1.6334743 -1.0377036
3 -1.1417742 -0.1937133 -0.5524465
4  1.1202313  0.3036587 -0.2538268
5 -0.3518675  0.9057405  1.5752191
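Since the goal in the question is k-means clustering, the scaled matrix can then be passed to kmeans() from base R's stats package. The sketch below is illustrative only: the choice of centers = 2, the nstart value, and the seed are assumptions and do not come from the question.

```r
# Rebuild the sample data from the question
test <- data.frame(
  PIN = factor(c("a", "b", "c", "d", "e")),
  v1 = c(0.8, 0.36, 0.21, 0.84, 0.43),
  v2 = c(0.87, 0.01, 0.56, 0.75, 0.98),
  v3 = c(0.48, 0.13, 0.26, 0.34, 0.83)
)

# Scale only the numeric columns
scaled <- scale(Filter(is.numeric, test))

# k-means starts from random centers, so fix the seed for repeatability
set.seed(42)  # arbitrary seed, chosen for illustration

# centers = 2 is an arbitrary choice for this tiny sample
fit <- kmeans(scaled, centers = 2, nstart = 10)

# Attach the cluster assignment back to the identifier column
result <- data.frame(PIN = test$PIN, cluster = fit$cluster)
print(result)
```

With real data, the number of clusters would need to be chosen for the problem at hand.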
