r - Error in if (sum(abs(dc)) < 1e-15) break: missing value where TRUE/FALSE needed: Kernel K



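I am trying to use the kkmeans() function from the kernlab R package to run Kernel K-Means clustering. My code returns the expected output when I request some numbers of clusters through the function's centers argument, but throws an error for other numbers of clusters: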

Error in if (sum(abs(dc)) < 1e-15) break: missing value where TRUE/FALSE needed

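My guess is that this is a convergence problem, because the error seems to appear as I increase the number of clusters. That would be surprising, though, since I have far more rows than the number of clusters I request. I can successfully request 10 clusters with an 8000x3 matrix, but I get the error with 100 clusters. Similarly, I can request 5 clusters, but not 10, on a 50-row subset of that data.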

Below is a minimal reproducible example in which my code reproduces both the success and the error.

Error when centers = 10

kernlab::kkmeans(mymat, centers=10)
#> Using automatic sigma estimation (sigest) for RBF or laplace kernel
#> Error in if (sum(abs(dc)) < 1e-15) break: missing value where TRUE/FALSE needed

No error when centers = 5

kernlab::kkmeans(mymat, centers=5)
#> Using automatic sigma estimation (sigest) for RBF or laplace kernel
#> Spectral Clustering object of class "specc" 
#> 
#>  Cluster memberships: 
#>  
#> 1 1 1 1 2 1 1 3 3 5 5 5 3 2 2 2 4 4 3 3 5 2 2 5 5 5 5 5 5 2 4 3 3 3 2 2 5 3 3 5 5 4 4 4 3 1 4 2 5 3 
#>  
#> Gaussian Radial Basis kernel function. 
#>  Hyperparameter : sigma =  0.756590498067127 
#> 
#> Centers:  
#>          [,1]      [,2]     [,3]
#> [1,] 15.75871 -16.69486 191.5841
#> [2,] 16.74850 -21.94730 186.8914
#> [3,] 15.99483 -18.95892 190.2622
#> [4,] 15.45729 -18.13571 191.9611
#> [5,] 16.69136 -22.19600 187.0055
#> 
#> Cluster size:  
#> [1]  7 10 12  7 14
#> 
#> Within-cluster sum of squares:  
#> [1] 301006.7 443237.8 607889.4 305777.1 685823.5

Example data (50x3 matrix)

mymat <- structure(c(15.9390001296997, 15.9079999923706, 16.087999343872, 
15.7930002212524, 15.9619998931884, 15.6129999160766, 15.7550001144409, 
16.7740001678466, 16.9080009460449, 17.0769996643066, 16.3640003204345, 
16.5960006713867, 16.579999923706, 16.4570007324218, 16.2320003509521, 
16.1639995574951, 15.6180000305175, 15.5109996795654, 15.5120000839233, 
15.628999710083, 16.9950008392333, 17.3530006408691, 17.2229995727539, 
16.8910007476806, 17.1800003051757, 17.1709995269775, 16.9860000610351, 
16.704999923706, 16.273000717163, 15.8830003738403, 15.6230001449584, 
15.333999633789, 15.3839998245239, 15.3870000839233, 17.1119995117187, 
17.6200008392333, 16.8349990844726, 16.4969997406005, 16.2479991912841, 
16.1259994506835, 15.8059997558593, 15.378999710083, 15.4320001602172, 
15.2100000381469, 15.2519998550415, 15.2150001525878, 15.4280004501342, 
17.4790000915527, 16.6739997863769, 16.4330005645751, -16.6299991607666, 
-16.9529991149902, -17.5610008239746, -17.8290004730224, -18.6200008392333, 
-17.1079998016357, -16.25, -21.716999053955, -21.1219997406005, 
-21.8209991455078, -20.1840000152587, -20.0450000762939, -20.9599990844726, 
-19.5240001678466, -18.6590003967285, -19.4379997253417, -18.6280002593994, 
-18.0669994354248, -16.204999923706, -15.5830001831054, -23.9489994049072, 
-23.57200050354, -24.3969993591308, -23.2880001068115, -22.6019992828369, 
-23.2329998016357, -22.5979995727539, -22.6140003204345, -20.8059997558593, 
-19.4300003051757, -19.4729995727539, -17.5690002441406, -16.8110008239746, 
-15.2930002212524, -25.2509994506835, -24.7649993896484, -24.8080005645751, 
-21.9939994812011, -21.5189990997314, -20.329999923706, -20.25, 
-19.1380004882812, -18.6180000305175, -18.5900001525878, -16.1620006561279, 
-14.5329999923706, -14.4359998703002, -25.8169994354248, -24.2159996032714, 
-22.57200050354, 190.996994018554, 190.996002197265, 190.18699645996, 
191.039993286132, 190.205993652343, 191.919006347656, 191.766006469726, 
187.14599609375, 186.889007568359, 186.225997924804, 188.60400390625, 
187.932006835937, 187.837005615234, 188.453002929687, 189.382995605468, 
189.360000610351, 191.25, 191.845001220703, 192.580001831054, 
192.414993286132, 185.358001708984, 184.570999145507, 184.595993041992, 
186.091995239257, 185.613998413085, 185.25, 186.235000610351, 
187.003005981445, 188.744995117187, 190.169998168945, 190.921005249023, 
192.628997802734, 192.768005371093, 193.281997680664, 184.602996826171, 
183.796005249023, 185.414001464843, 187.811004638671, 188.615005493164, 
189.263000488281, 190.167007446289, 191.781997680664, 191.837997436523, 
192.582000732421, 193.399002075195, 194.184005737304, 193.509994506835, 
183.776000976562, 186.173995971679, 187.774993896484), dim = c(50L, 
3L), dimnames = list(NULL, c("x", "y", "z")))

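This looks like a problem with something the function generates randomly, internally, during the kkmeans() call. I don't know why it happens; you may want to check with the package authors to find out whether this is a bug or intended behavior.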

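I reproduced your error with your data and code (running a fresh R instance each time), but the exact same function call sometimes produced other errors and sometimes no error at all. With set.seed(), however, whether it fails is fully reproducible, which suggests the failure is tied to the random starting values that determine the model's other parameters.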
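One way to see how seed-dependent the outcome is would be to run the same call under several seeds and tabulate what happens. The following is only a sketch: try_seed is a hypothetical helper name, the seed range 1:20 is arbitrary, and the random stand-in matrix is there only so the block runs on its own (with the question's mymat already defined, that is used instead).

```r
library(kernlab)

# Stand-in data so the block runs by itself (an assumption, not the question's data).
# If `mymat` from the question is already defined, it is used as-is.
if (!exists("mymat")) {
  set.seed(1)
  mymat <- matrix(rnorm(150), ncol = 3)
}

# Hypothetical helper: run kkmeans() under a given seed and report the outcome,
# either "ok" or the text of the error that was raised.
try_seed <- function(seed, x, k) {
  set.seed(seed)
  tryCatch({
    kkmeans(x, centers = k)
    "ok"
  }, error = function(e) conditionMessage(e))
}

seeds <- 1:20
outcome <- vapply(seeds, try_seed, character(1), x = mymat, k = 10)

# How often each outcome (success or a particular error message) occurred
print(table(outcome))
```

Rerunning the block should give the same table, since each call is preceded by set.seed(); the counts themselves will depend on the data and the seeds chosen.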

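Below I show that (a) the same call can produce a different error (I actually saw a third error as well, but did not save the seed needed to reproduce it), (b) even when it "converges", it produces very different clusterings depending on the random seed alone, and (c) the hyperparameter tuning is strongly affected by the random seed. I also forgot to save the seed from a run that did return clusters with 10 clusters.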

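I don't know why this happens. My hunch is that in some cases the automatically generated settings are nonsensical or out of bounds, and that this triggers the error. It could be that your data are unusual in some way, or that the algorithm that sets the hyperparameters does not make much sense. It could also be a bug, so it may be worth filing as an issue.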
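The printed message says sigma comes from sigest(), which works on a random subsample of rows, so the estimate can be inspected under different seeds, and sigma can also be supplied by hand through kpar to take that source of randomness out. This is a rough sketch: the value sigma = 0.3 is an arbitrary illustration, scaled = FALSE is my assumption about how kkmeans() calls sigest(), and fixing sigma is not guaranteed to make the error go away, because the initial centers are still random.

```r
library(kernlab)

# Stand-in data so the block runs by itself (an assumption, not the question's data).
if (!exists("mymat")) {
  set.seed(1)
  mymat <- matrix(rnorm(150), ncol = 3)
}

# sigest() returns three quantile-based sigma estimates; one column per seed.
sig <- sapply(c(123, 3, 999), function(seed) {
  set.seed(seed)
  sigest(mymat, scaled = FALSE)
})
print(sig)

# Supply sigma explicitly instead of relying on kpar = "automatic".
# 0.3 is an arbitrary illustrative value, not a recommendation.
set.seed(123)
fit <- tryCatch(
  kkmeans(mymat, centers = 5, kernel = "rbfdot", kpar = list(sigma = 0.3)),
  error = function(e) e
)

if (inherits(fit, "error")) {
  message("kkmeans() still failed: ", conditionMessage(fit))
} else {
  print(size(fit))  # cluster sizes of the fitted object
}
```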

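In any case, one question to ask yourself is whether you want to use something that behaves this inconsistently: it only sometimes produces a result, it gives very different results across random seeds, and you cannot tell whether the algorithm is really doing what it claims to do.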

Example 1: clusters = 5, no error, set.seed(123)

set.seed(123)
kernlab::kkmeans(mymat, centers = 5)
#>  Hyperparameter : sigma =  0.463522505156128 
#> 
#> Centers:  
#>          [,1]      [,2]     [,3]
#> [1,] 16.53045 -21.18700 187.8918
#> [2,] 17.16138 -24.59687 184.7860
#> [3,] 15.73436 -17.87491 191.2586
#> [4,] 15.63425 -16.63862 192.0088
#> [5,] 16.19467 -20.16442 189.1617
#> 
#> Cluster size:  
#> [1] 11  8 11  8 12
#> 
#> Within-cluster sum of squares:  
#> [1] 537972.8 386310.2 544994.1 391965.9 604386.9

Example 2: clusters = 5, no error, set.seed(3)

It works, but the number of observations per cluster is very different! Note the different hyperparameter.

#>  Hyperparameter : sigma =  0.290281708176631 
#> 
#> Centers:  
#>          [,1]      [,2]     [,3]
#> [1,] 15.97636 -18.38464 190.5449
#> [2,] 16.24809 -20.10409 188.9572
#> [3,] 15.63660 -17.85633 191.5151
#> [4,] 17.06100 -22.70840 185.8834
#> [5,] 17.16138 -24.59687 184.7860
#> 
#> Cluster size:  
#> [1] 11 11 15  5  8
#> 
#> Within-cluster sum of squares:  
#> [1] 545547.7 538434.5 757947.0 236986.8 386310.2

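Example 3: clusters = 5, no error, set.seed(999)

It works, but the number of observations per cluster is very different! Again, note the different hyperparameter!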


#> Gaussian Radial Basis kernel function. 
#>  Hyperparameter : sigma =  0.128189488632645 
#> 
#> Centers:  
#>          [,1]      [,2]     [,3]
#> [1,] 16.93157 -22.25171 186.4579
#> [2,] 15.45090 -15.99500 192.8452
#> [3,] 15.73677 -18.32277 191.0152
#> [4,] 17.16244 -24.44533 184.8376
#> [5,] 16.32218 -20.69291 188.5965
#> 
#> Cluster size:  
#> [1]  7 10 13  9 11
#> 
#> Within-cluster sum of squares:  
#> [1] 294630.1 457490.3 604486.8 441669.5 539478.6
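To compare the partitions themselves rather than only the cluster sizes, the memberships from two seeds can be cross-tabulated. This is a sketch under assumptions: memberships is a hypothetical helper, the stand-in matrix is used only when mymat is not defined, and the cluster labels are read from the .Data slot of the returned object, which is where I understand kernlab stores them.

```r
library(kernlab)

# Stand-in data so the block runs by itself (an assumption, not the question's data).
if (!exists("mymat")) {
  set.seed(1)
  mymat <- matrix(rnorm(150), ncol = 3)
}

# Hypothetical helper: cluster labels for one seed, or NULL if kkmeans() errors.
memberships <- function(seed, x, k) {
  set.seed(seed)
  fit <- tryCatch(kkmeans(x, centers = k), error = function(e) NULL)
  if (is.null(fit)) return(NULL)
  fit@.Data  # integer cluster label for each row
}

a <- memberships(123, mymat, 5)
b <- memberships(3, mymat, 5)

# Rows: clusters under seed 123; columns: clusters under seed 3.
# Identical partitions (up to relabeling) would leave one nonzero cell per row.
if (!is.null(a) && !is.null(b)) {
  print(table(seed_123 = a, seed_3 = b))
}
```

Cluster numbers are arbitrary labels, so the table should be read for how rows spread across columns, not for matching numbers.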

Example 4: clusters = 10, new error, set.seed(99)

A new error.

#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'affinMult' for signature '"rbfkernel", "numeric"'

Example 5: clusters = 10, original error, set.seed(3)

The original error.

#> Error in if (sum(abs(dc)) < 1e-15) break: missing value where TRUE/FALSE needed

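Not shown: an additional error with clusters = 10 (not all columns found in the matrix), and a run that did successfully return clusters with clusters = 10.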
