How to retrieve each customer's cluster number together with the centroid in R



I have a dataset with more than 20,000 rows, where each row is a unique customer. I ran k-means clustering and the output looks like this.

str(km.out.best)
List of 9
$ cluster     : Named int [1:24] 2 1 1 3 4 2 6 4 5 2 ...
..- attr(*, "names")= chr [1:24] "nr_pxx_sxx" "sxxxxxxxx
$ centers     : num [1:10, 1:20000] -0.1806 -0.3596 -0.7953 0.0781 -0.5887 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:10] "1" "2" "3" "4" ...
.. ..$ : NULL
$ totss       : num 618756
$ withinss    : num [1:10] 1294 68340 0 4363 2530 ...
$ tot.withinss: num 184130
$ betweenss   : num 434625
$ size        : int [1:10] 2 4 1 3 2 2 2 2 2 4
$ iter        : int 3
$ ifault      : int 0
- attr(*, "class")= chr "kmeans"
I would like to know how to get the cluster number next to the centroid values. Something like:

    #Example output

    cust_id    centers  cluster_number 
    1         -0.1806      1
    2         -0.3596      1
    3        -0.7953       2
    4         0.0781       ..
    5        -0.5887       3
    

Thanks in advance.

Suppose your data looks like this:

dat = matrix(runif(20000*24),nrow=20000)
dim(dat)
[1] 20000    24
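
For comparison, the str() output in the question shows 24 cluster labels and a centers matrix with 20000 columns, which is the shape kmeans() produces when the customers end up in the columns. The sketch below contrasts the two orientations; the 200 x 6 matrix `small` and the names `km_rows` and `km_cols` are made up for illustration and are not part of the original data.

```r
# Hypothetical small data: 200 customers (rows) x 6 variables (columns).
set.seed(1)
small <- matrix(runif(200 * 6), nrow = 200)

# Customers as rows: kmeans() returns one cluster label per customer.
km_rows <- kmeans(small, centers = 3)
length(km_rows$cluster)   # 200
dim(km_rows$centers)      # 3 6  (one centroid per cluster, one column per variable)

# Transposed: the 6 variables are clustered instead of the customers.
km_cols <- kmeans(t(small), centers = 3)
length(km_cols$cluster)   # 6
dim(km_cols$centers)      # 3 200
```

A quick check such as `length(km$cluster) == nrow(dat)` before building the result table would catch the wrong orientation early.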

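Do not transpose it: kmeans() clusters the rows, so each customer must be a row. Then run kmeans. With data of this size you will most likely need to change the algorithm to "MacQueen" or "Lloyd" and increase the maximum number of iterations: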

km.out.best = kmeans(dat,10,algorithm="MacQueen",iter.max=200)
result = data.frame(id=1:nrow(dat),cluster=km.out.best$cluster)
head(result)
id cluster
1  1       5
2  2      10
3  3       7
4  4       3
5  5       7
6  6       6
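
The id column above is only the row index. If the customers have real identifiers, one option is to store them as row names before clustering, since kmeans() carries the row names of its input over to the names of the cluster vector (the "Named int" in the question's str() output). A minimal sketch under that assumption; `cust_id`, `small`, and the ID format are invented for illustration.

```r
set.seed(42)
# Hypothetical data: 100 customers x 5 variables, with invented customer IDs
# stored as row names.
cust_id <- sprintf("C%04d", 1:100)
small <- matrix(runif(100 * 5), nrow = 100, dimnames = list(cust_id, NULL))

km <- kmeans(small, centers = 4, algorithm = "MacQueen", iter.max = 200)

# names(km$cluster) are the row names of the input matrix, so the
# customer ID and its cluster number can be paired directly.
result <- data.frame(cust_id = names(km$cluster),
                     cluster = unname(km$cluster),
                     stringsAsFactors = FALSE)
head(result)
```

This keeps the customer ID attached to the label even if the rows are later reordered or filtered.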

Your centers look like this:

head(km.out.best$centers)
[,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
1 0.3775496 0.2755110 0.5222402 0.5884940 0.4679775 0.6600569 0.4986263
2 0.7126183 0.2803162 0.3942072 0.6419705 0.5341550 0.5711218 0.5053729
3 0.6413244 0.6578503 0.5333248 0.4661831 0.5552559 0.5561365 0.4451808
4 0.3234074 0.6514881 0.4079006 0.6715400 0.4791075 0.4223853 0.6221334
5 0.6473756 0.6532055 0.6182789 0.5097219 0.5376246 0.5365016 0.4391964
6 0.6970183 0.4965848 0.5065735 0.3036086 0.4303340 0.3970691 0.5170568
[,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
1 0.4594594 0.4345581 0.5701588 0.5906317 0.4385964 0.5218407 0.5516426
2 0.4628033 0.4235150 0.3608926 0.5285110 0.5168564 0.4346563 0.4062454
3 0.5265977 0.5334992 0.5376332 0.4512221 0.4647484 0.4902010 0.4676214
4 0.5939197 0.4694504 0.3937454 0.3384044 0.5686476 0.6172650 0.5186179
5 0.4654073 0.6234457 0.4909938 0.5596412 0.4936359 0.4770979 0.6025122
6 0.5156159 0.4322397 0.5056121 0.5290063 0.5568705 0.4741198 0.5276150
[,15]     [,16]     [,17]     [,18]     [,19]     [,20]     [,21]
1 0.5504851 0.2829263 0.5801165 0.4646302 0.6408827 0.4199201 0.5407101
2 0.5626282 0.6359599 0.5034993 0.4243469 0.3807163 0.5950345 0.4706131
3 0.3517145 0.2888798 0.6448517 0.3631902 0.5299283 0.4487787 0.4675805
4 0.4331985 0.4305047 0.4862307 0.4381856 0.3399696 0.4781299 0.5236181
5 0.6830292 0.6005151 0.5231041 0.5242238 0.4303912 0.3199860 0.3725459
6 0.2797726 0.4564681 0.5102230 0.6247973 0.4563937 0.6386731 0.5464769
[,22]     [,23]     [,24]
1 0.5655326 0.5366878 0.6097194
2 0.4910263 0.3989447 0.4676507
3 0.4119647 0.3304486 0.3322215
4 0.5843183 0.4549804 0.6379758
5 0.6010346 0.6001782 0.6310740
6 0.5110444 0.6080165 0.6967485

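The centers matrix has as many columns as your data. If you want to attach it to the result and create a huge data.frame that repeats redundant information, here it is: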

head(cbind(result,km.out.best$centers[result$cluster,]))
id cluster         1         2         3         4         5         6
X5    1       5 0.6473756 0.6532055 0.6182789 0.5097219 0.5376246 0.5365016
X10   2      10 0.4280159 0.5213989 0.6012614 0.6827887 0.4621622 0.4026403
X7    3       7 0.3671682 0.5811399 0.4086544 0.3584764 0.4406988 0.5859552
X3    4       3 0.6413244 0.6578503 0.5333248 0.4661831 0.5552559 0.5561365
X7.1  5       7 0.3671682 0.5811399 0.4086544 0.3584764 0.4406988 0.5859552
X6    6       6 0.6970183 0.4965848 0.5065735 0.3036086 0.4303340 0.3970691
7         8         9        10        11        12        13
X5   0.4391964 0.4654073 0.6234457 0.4909938 0.5596412 0.4936359 0.4770979
X10  0.4308780 0.5798660 0.6022418 0.5895790 0.6293778 0.4796867 0.5552222
X7   0.3682988 0.6069791 0.3902141 0.6102076 0.3622590 0.5181898 0.5504739
X3   0.4451808 0.5265977 0.5334992 0.5376332 0.4512221 0.4647484 0.4902010
X7.1 0.3682988 0.6069791 0.3902141 0.6102076 0.3622590 0.5181898 0.5504739
X6   0.5170568 0.5156159 0.4322397 0.5056121 0.5290063 0.5568705 0.4741198
14        15        16        17        18        19        20
X5   0.6025122 0.6830292 0.6005151 0.5231041 0.5242238 0.4303912 0.3199860
X10  0.5755699 0.3837531 0.6864855 0.3524426 0.5525500 0.6080231 0.6136993
X7   0.3925091 0.6750364 0.6796406 0.5637069 0.4988824 0.5664360 0.5727071
X3   0.4676214 0.3517145 0.2888798 0.6448517 0.3631902 0.5299283 0.4487787
X7.1 0.3925091 0.6750364 0.6796406 0.5637069 0.4988824 0.5664360 0.5727071
X6   0.5276150 0.2797726 0.4564681 0.5102230 0.6247973 0.4563937 0.6386731
21        22        23        24
X5   0.3725459 0.6010346 0.6001782 0.6310740
X10  0.5897833 0.5092839 0.4041542 0.4247683
X7   0.4674218 0.5450985 0.5607961 0.4179112
X3   0.4675805 0.4119647 0.3304486 0.3322215
X7.1 0.4674218 0.5450985 0.5607961 0.4179112
X6   0.5464769 0.5110444 0.6080165 0.6967485
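
As a possible alternative to indexing the centers matrix by hand, the stats package provides a fitted() method for kmeans objects that returns the assigned centroid for every row. The sketch below uses a made-up 100 x 5 matrix (`small`) and hypothetical names (`per_row_centers`, `labels`); it also shows how to look up a centroid on demand instead of storing it once per customer.

```r
set.seed(7)
small <- matrix(runif(100 * 5), nrow = 100)   # hypothetical data
km <- kmeans(small, centers = 4, algorithm = "MacQueen", iter.max = 200)

# fitted() returns, for every row, the centroid of the cluster it belongs to.
per_row_centers <- fitted(km, method = "centers")
dim(per_row_centers)   # 100 5

# Same values as indexing the centers matrix by the cluster vector.
all.equal(unname(per_row_centers), unname(km$centers[km$cluster, ]))   # TRUE

# Compact alternative: keep one small table of labels and look the
# centroid up only when it is needed.
labels <- data.frame(id = seq_len(nrow(small)), cluster = km$cluster)
km$centers[labels$cluster[labels$id == 10], ]   # centroid for row 10
```

Keeping the labels and the centers as two separate objects avoids repeating the same centroid row for every customer in a cluster.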
