Explaining R code that uses the hclust function



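Can you help me better understand the code below? I can see that it holds some information about properties and that it uses the hclust function, but I don't understand the output p = 12. What does it represent? What is the maximum number of clusters generated for these data? Can you help me understand?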

library(geosphere)
Points_properties <- structure(
  list(
    Propertie = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
                  19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29),
    Latitude = c(-24.781624, -24.775017, -24.769196, -24.761741, -24.752019,
                 -24.748008, -24.737312, -24.744718, -24.751996, -24.724589,
                 -24.8004, -24.796899, -24.795041, -24.780501, -24.763376,
                 -24.801715, -24.728005, -24.737845, -24.743485, -24.742601,
                 -24.766422, -24.767525, -24.775631, -24.792703, -24.790994,
                 -24.787275, -24.795902, -24.785587, -24.787558),
    Longitude = c(-49.937369, -49.950576, -49.927608, -49.92762, -49.920608,
                  -49.927707, -49.922095, -49.915438, -49.910843, -49.899478,
                  -49.901775, -49.89364, -49.925657, -49.893193, -49.94081,
                  -49.911967, -49.893358, -49.903904, -49.906435, -49.927951,
                  -49.939603, -49.941541, -49.94455, -49.929797, -49.92141,
                  -49.915141, -49.91042, -49.904772, -49.894034)
  ),
  row.names = c(NA, -29L),
  class = c("tbl_df", "tbl", "data.frame")
)
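# Keep only the coordinate columns.
coordinates <- subset(Points_properties, select = c("Latitude", "Longitude"))
# Pairwise distances between points; geosphere expects longitude first, hence [, 2:1].
d <- distm(coordinates[, 2:1])
d <- as.dist(d)
# Average-linkage hierarchical clustering.
fit.average <- hclust(d, method = "average")
# Start with a single cluster and record the size of each cluster.
p <- 1
clusters <- cutree(fit.average, p)
nclusters <- matrix(table(clusters))

# Keep adding clusters while every cluster has more than one member.
while (min(nclusters) > 1) {
  p <- p + 1
  clusters <- cutree(fit.average, p)
  nclusters <- matrix(table(clusters))
}
# Step back to the last cut that had no single-member cluster.
p <- p - 1
> p
[1] 12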

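It looks like p is the largest number of clusters for which every cluster still has more than one member. The loop stops at the first cut that contains a single-member cluster (13 clusters here), and p<-p-1 then steps back by one, which gives 12. So p is not the maximum number of clusters that can be generated: cutree can cut the tree into anything from 1 cluster up to 29 clusters (one per point).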
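To check what p means on something small enough to verify by hand, here is a minimal sketch on made-up one-dimensional data. The vector x and the names fit and sizes are illustrative and not from the question, and base R's dist stands in for geosphere's distm.

```r
# Toy data: three obvious groups; 37 is the first point to split off alone.
x <- c(0, 1, 10, 12, 30, 33, 37)
fit <- hclust(dist(x), method = "average")

# Same logic as the loop in the question.
p <- 1
sizes <- table(cutree(fit, p))
while (min(sizes) > 1) {
  p <- p + 1
  sizes <- table(cutree(fit, p))
}
p <- p - 1

p                          # 3
table(cutree(fit, p))      # cluster sizes 2, 2, 3: no single-member cluster
table(cutree(fit, p + 1))  # cluster sizes 2, 2, 2, 1: the point 37 is alone
```

With 3 clusters every group has at least two members; asking for a 4th cluster isolates one point, which is exactly where the loop stops.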

nclusters <- matrix(table(clusters)) 

nclusters stores the number of members in each cluster as a one-column matrix, with one row per cluster.
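As a small illustration of that step, the sketch below uses hypothetical cluster labels (not the ones from the question) to show what table and matrix return.

```r
# Hypothetical cluster labels, as cutree() might return them for 7 points.
clusters <- c(1, 1, 2, 2, 3, 3, 3)

counts <- table(clusters)     # number of members per cluster label
nclusters <- matrix(counts)   # the same counts as a 3 x 1 matrix

dim(nclusters)                # 3 1
min(nclusters)                # 2
min(counts)                   # 2
```

Since min gives the same answer on the table itself, the matrix conversion looks optional for the purpose of this loop.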

while (min(nclusters)>1) { 

The while loop stops as soon as the smallest cluster has only one member, that is, when min(nclusters) is 1.
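The same stopping point can also be found without a loop. The sketch below is one possible alternative, again on made-up data with illustrative names (x, fit, cuts, min_size); it relies on cutree accepting a vector of cluster counts.

```r
x <- c(0, 1, 10, 12, 30, 33, 37)   # made-up data, not from the question
fit <- hclust(dist(x), method = "average")

# cutree() accepts a vector of k and returns one column of labels per k.
cuts <- cutree(fit, k = seq_along(x))
min_size <- apply(cuts, 2, function(labels) min(table(labels)))

min_size                        # 7 3 2 1 1 1 1
p <- max(which(min_size > 1))   # last k before a single-member cluster appears
p                               # 3
```

The last column of cuts puts every point in its own cluster, which shows that the largest number of clusters cutree can produce equals the number of points.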
