Recreating k-means with precomputed cluster centers

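I am clustering with k-means using 60 clusters. Because some of the clusters are not very meaningful, I removed their centers from the cluster-center array (count = 8) and saved the result in clean_cluster_array.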
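A minimal sketch of the removal step, assuming the centers are a NumPy array of shape (60, n_features). The random centers and the indices in drop_idx are hypothetical stand-ins for the real fitted centers and the 8 clusters judged not meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a fitted model's cluster_centers_: 60 centers, 5 features
cluster_centers = rng.normal(size=(60, 5))

# Hypothetical indices of the 8 clusters judged not meaningful
drop_idx = [3, 7, 12, 19, 25, 33, 41, 58]

# np.delete returns a new array without those rows; cluster_centers is unchanged
clean_cluster_centers = np.delete(cluster_centers, drop_idx, axis=0)
print(clean_cluster_centers.shape)  # (52, 5)
```

np.delete keeps the remaining rows in their original order, so the surviving clusters keep their relative positions but are renumbered 0..51.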

Next, I refit the k-means model with init = clean_cluster_centers, n_clusters = 52 and max_iter = 1, because I want to avoid refitting as much as possible.
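The refit described above can be sketched as follows, assuming scikit-learn's KMeans and synthetic data (X and drop_idx are hypothetical). It also measures how far each center moves, which illustrates why max_iter = 1 does not preserve the cleaned centers: one Lloyd iteration reassigns every point and then recomputes each center as the mean of its points.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))  # hypothetical data

km = KMeans(n_clusters=60, n_init=10, random_state=0).fit(X)
drop_idx = [3, 7, 12, 19, 25, 33, 41, 58]  # hypothetical clusters to drop
clean_cluster_centers = np.delete(km.cluster_centers_, drop_idx, axis=0)

# Seed a new model with the cleaned centers; n_init=1 because init is an explicit array
km52 = KMeans(n_clusters=52, init=clean_cluster_centers, n_init=1, max_iter=1).fit(X)

# Distance each center moved during that single iteration
shift = np.linalg.norm(km52.cluster_centers_ - clean_cluster_centers, axis=1)
print(shift.max())
```

Points that belonged to the removed clusters are absorbed by their neighbours, so those neighbouring centers shift even after one iteration.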

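The basic idea is to build a new model from clean_cluster_centers. The problem is that we are removing a large number of clusters, so even with max_iter = 1 the model quickly moves to different, more stable centers. Is there a way to recreate the k-means model from these centers without refitting it?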

If you have already fitted a KMeans object, it has a cluster_centers_ attribute. You can update it directly:

cls.cluster_centers_ = new_cluster_centers
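A runnable sketch of that direct assignment, assuming scikit-learn's KMeans; the data and drop_idx are hypothetical. No refitting happens: the fitted object simply gets a smaller center array, and predict then only returns labels 0..51.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))  # hypothetical data
cls = KMeans(n_clusters=60, n_init=10, random_state=0).fit(X)

drop_idx = [3, 7, 12, 19, 25, 33, 41, 58]  # hypothetical clusters to drop
new_cluster_centers = np.delete(cls.cluster_centers_, drop_idx, axis=0)

# Overwrite the fitted attribute in place; the centers are used exactly as given
cls.cluster_centers_ = new_cluster_centers

labels = cls.predict(X)  # each sample goes to the nearest of the 52 remaining centers
```

Other fitted attributes such as labels_ and inertia_, and the n_clusters parameter, still describe the original 60-cluster fit, so treat them as stale after this assignment.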

So if you want a separate object with the clean cluster centers, leaving the original model untouched, do:

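import copy
from sklearn.cluster import KMeans
cls = KMeans().fit(X)
cls2 = copy.deepcopy(cls)  # KMeans has no .copy() method; deepcopy keeps the fitted state
cls2.cluster_centers_ = new_cluster_centers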
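A self-contained version of the copy approach, assuming scikit-learn's KMeans and hypothetical data. It checks that the original model keeps its 60 centers while the copy uses the 52 cleaned ones:

```python
import copy

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))  # hypothetical data
cls = KMeans(n_clusters=60, n_init=10, random_state=0).fit(X)

drop_idx = [3, 7, 12, 19, 25, 33, 41, 58]  # hypothetical clusters to drop
new_cluster_centers = np.delete(cls.cluster_centers_, drop_idx, axis=0)

# deepcopy duplicates the fitted estimator, including its learned attributes
cls2 = copy.deepcopy(cls)
cls2.cluster_centers_ = new_cluster_centers

labels_60 = cls.predict(X)   # original model, 60 clusters
labels_52 = cls2.predict(X)  # copy with the cleaned centers
```

sklearn.base.clone is not a substitute here: it copies only the constructor parameters and returns an unfitted estimator, so cluster_centers_ would be missing.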

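Because predict only checks that the object has a cluster_centers_ attribute and then assigns each sample to the nearest of those centers, you can call predict on the modified object. For reference, this is KMeans.predict as it appeared in an older scikit-learn release: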

def predict(self, X):
    """Predict the closest cluster each sample in X belongs to.
    In the vector quantization literature, `cluster_centers_` is called
    the code book and each value returned by `predict` is the index of
    the closest code in the code book.
    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        New data to predict.
    Returns
    -------
    labels : array, shape [n_samples,]
        Index of the cluster each sample belongs to.
    """
    check_is_fitted(self, 'cluster_centers_')
    X = self._check_test_data(X)
    x_squared_norms = row_norms(X, squared=True)
    return _labels_inertia(X, x_squared_norms, self.cluster_centers_)[0]
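To see what that predict computes, here is a NumPy-only sketch of nearest-center assignment. It is not scikit-learn's implementation (which uses private compiled helpers such as _labels_inertia); it only reproduces the idea of expanding squared Euclidean distances so that squared row norms, like the x_squared_norms above, can be precomputed. The function name nearest_center_labels is hypothetical.

```python
import numpy as np

def nearest_center_labels(X, centers):
    """Return the index of the closest center (squared Euclidean) for each row of X."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2
    x_sq = np.einsum("ij,ij->i", X, X)[:, np.newaxis]
    c_sq = np.einsum("ij,ij->i", centers, centers)[np.newaxis, :]
    d2 = x_sq - 2.0 * (X @ centers.T) + c_sq
    return np.argmin(d2, axis=1)

X = np.array([[0.0, 0.0], [10.0, 10.0], [4.0, 4.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0]])
print(nearest_center_labels(X, centers))  # [0 1 0]
```

For a model whose centers were overwritten, nearest_center_labels(X, cls2.cluster_centers_) should agree with cls2.predict(X) apart from exact distance ties.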
