kmeans.labels_ or kmeans.predict: which one assigns a label to each sample in the dataset?

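After applying k-means clustering to a dataframe, I want to add the cluster number to each sample in the dataframe. First, I'd like to know whether kmeans.labels_ and kmeans.predict() return the same result. I tried both and found that they do not match. Which one should I use?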

from sklearn.cluster import KMeans 
kmeans5 = KMeans(n_clusters=5, max_iter=20, verbose=1) 
kmeans5.fit(wdf)
clusters5 = kmeans5.predict(wdf)

In the same way, I initialized kmeans3 and kmeans2 and obtained clusters3 and clusters2. Then I tried this:

wdf2=wdf.copy()
wdf2['c5']=clusters5
wdf2['l5']=kmeans5.labels_
wdf2['c3']=clusters3
wdf2['l3']=kmeans3.labels_
wdf2['c2']=clusters2
wdf2['l2']=kmeans2.labels_

Running wdf2[['c5','l5','c3','l3','c2','l2']].head(7) gave me the following. As you can see, c5 and l5 do not match!

id      c5  l5  c3  l3  c2  l2
40419   0   2   2   2   1   1
41060   3   0   2   2   1   1
43284   3   3   2   2   1   1
45664   3   1   0   0   1   1
52014   3   0   2   2   1   1
53488   3   1   0   0   1   1
53895   0   2   2   2   1   1

Thanks

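They should be consistent on the training dataset.

From the documentation:

"Also, the estimator will reassign labels_ after the last iteration to make labels_ consistent with predict on the training set."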
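A quick way to check this on your own installation is to count the rows where the two label arrays differ. The sketch below is only illustrative: the random array X stands in for wdf, and random_state and n_init are added so the run is repeatable (none of these are in the question). On a scikit-learn release whose documentation contains the sentence quoted above, I would expect the count to be zero. A nonzero count could point to an older release combined with a fit that stopped at max_iter before converging, though that is a guess about the asker's setup.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the asker's dataframe `wdf`
# (assumption: 200 samples with 4 numeric features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Same n_clusters and max_iter as in the question; random_state and
# n_init are added here only to make the run repeatable.
kmeans = KMeans(n_clusters=5, max_iter=20, n_init=10, random_state=0)
kmeans.fit(X)

labels_from_fit = kmeans.labels_         # labels stored on the fitted estimator
labels_from_predict = kmeans.predict(X)  # nearest-centroid labels for the same rows

# Count the rows where the two label arrays disagree.
n_mismatches = int(np.sum(labels_from_fit != labels_from_predict))
print("mismatching rows:", n_mismatches)
```

For the training data itself, kmeans.labels_ is already there after fit, so wdf['cluster'] = kmeans.labels_ is enough; predict is the one to use for rows that were not part of the fit.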
