Getting intra-cluster distances for k-means clustering



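Is there a way to get the distances within a cluster, either from each point to the cluster center or from each point to every other point in the same cluster?

We have 505 data entries (patients), and each entry has 63 features (the X, Y, and Z coordinates of 21 points). These are split into 4 clusters, and my goal is to find the upper and lower bounds of each cluster.

The code below should give the distance from each point to its cluster center, but I would like to know whether there is a way to get the distance from each point to every other point. I know it may be computationally expensive.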

for label in range(num_clusters):
    dist = []
    # Rows belonging to the current cluster (the last column holds the label)
    cluster_rows = XYZ_df[XYZ_df['kmeans_labels'] == label]

    for patient in range(cluster_rows.shape[0]):
        # Euclidean distance from this patient to the cluster center
        dist.append(np.linalg.norm(cluster_rows.iloc[patient, :-1] - kmeans.cluster_centers_[label]))

    if label == 0:
        dist_0 = dist.copy()
    elif label == 1:
        dist_1 = dist.copy()
    elif label == 2:
        dist_2 = dist.copy()
    else:
        dist_3 = dist.copy()
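
The loop above can also be collapsed into a single vectorized step. This is only a minimal sketch: the names `X`, `labels`, `centers`, and `distances_to_centers`, as well as the toy data, are assumptions for illustration and are not part of the original code.

```python
import numpy as np

def distances_to_centers(X, labels, centers):
    """Return the Euclidean distance from each row of X to its own cluster center."""
    # centers[labels] picks, for every row, the center of the cluster it belongs to
    return np.linalg.norm(X - centers[labels], axis=1)

# Toy data standing in for the 505 x 63 patient matrix (hypothetical values)
X = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0], [10.0, 13.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])

dist = distances_to_centers(X, labels, centers)
# Group the distances by cluster, e.g. to read off each cluster's min and max
dist_by_cluster = {int(k): dist[labels == k] for k in np.unique(labels)}
```

With the objects from the question, the inputs would presumably be `XYZ_df.iloc[:, :-1].to_numpy()`, `XYZ_df['kmeans_labels'].to_numpy()`, and `kmeans.cluster_centers_`, assuming the label is the last column as the original slicing suggests.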

I think I found the answer to my question using scipy's pdist function together with itertools.

import itertools
import pandas as pd
from scipy.spatial.distance import pdist

# Compute the Euclidean distance between every pair of rows in each cluster.
# itertools.combinations yields the (i, j) pairs in the same order as pdist's
# condensed output, so the two can be placed side by side.
# d_0['dist'].max() then gives the farthest pair in cluster 0, and so on.
data_0 = cluster_0.iloc[:, :-1]  # drop the label column
d_0 = pd.DataFrame(itertools.combinations(data_0.index, 2), columns=['i', 'j'])
d_0['dist'] = pdist(data_0, 'euclidean')
data_1 = cluster_1.iloc[:, :-1]
d_1 = pd.DataFrame(itertools.combinations(data_1.index, 2), columns=['i', 'j'])
d_1['dist'] = pdist(data_1, 'euclidean')
data_2 = cluster_2.iloc[:, :-1]
d_2 = pd.DataFrame(itertools.combinations(data_2.index, 2), columns=['i', 'j'])
d_2['dist'] = pdist(data_2, 'euclidean')
data_3 = cluster_3.iloc[:, :-1]
d_3 = pd.DataFrame(itertools.combinations(data_3.index, 2), columns=['i', 'j'])
d_3['dist'] = pdist(data_3, 'euclidean')
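
The four near-identical blocks above could be written as one loop over the cluster labels. The following is a rough sketch under assumed names (`X`, `labels`, `pairwise_summary`, and the toy data are hypothetical); it also shows `squareform`, which expands pdist's condensed vector into a full symmetric matrix when a point-by-point lookup is more convenient.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def pairwise_summary(X, labels):
    """For each cluster, return (min, max) of the pairwise Euclidean distances."""
    summary = {}
    for k in np.unique(labels):
        pts = X[labels == k]
        if len(pts) < 2:
            continue  # a single point has no pairs to measure
        d = pdist(pts, metric="euclidean")  # condensed vector, length n*(n-1)/2
        summary[int(k)] = (float(d.min()), float(d.max()))
    return summary

# Toy data standing in for the patient matrix (hypothetical values)
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [10.0, 10.0], [10.0, 13.0]])
labels = np.array([0, 0, 0, 1, 1])

summary = pairwise_summary(X, labels)
# Full symmetric distance matrix for cluster 0: D0[i, j] is the distance
# between the i-th and j-th points of that cluster
D0 = squareform(pdist(X[labels == 0]))
```

For n points in a cluster, the condensed form stores n*(n-1)/2 distances, so with 505 patients in total the memory cost stays small; the square form is mainly a convenience.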
