How to track where a specific item ends up in hierarchical clustering

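I have a question about hierarchical clustering. I have a relatively complex dataset with 2000 items/samples. I cluster the items with scipy and apply different cluster cutoffs, e.g. 0.1 to 0.9: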

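from scipy.cluster import hierarchy as hac
# single-linkage clustering; method, metric and criterion are passed as strings
Z = hac.linkage(distance, method='single', metric='euclidean')
# one flat cluster label per sample, in the same order as the input rows
results = hac.fcluster(Z, cutoff, criterion='distance')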
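One detail worth checking in the linkage call: `linkage` treats a 2-D array as raw observation vectors and computes distances itself with `metric`. If `distance` is already a square pairwise-distance matrix, it should be condensed first with `scipy.spatial.distance.squareform`, in which case `metric` is ignored. The sketch below uses hypothetical toy points `X` and a stand-in matrix `D_square` to show that both routes give the same linkage:

```python
import numpy as np
from scipy.cluster import hierarchy as hac
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy observations (five 1-D points), standing in for real data.
X = np.array([[0.0], [0.05], [0.3], [0.32], [1.0]])
# Stand-in for a precomputed square, symmetric distance matrix.
D_square = squareform(pdist(X, metric='euclidean'))

# Condense the square matrix; for condensed input linkage ignores `metric`.
Z_from_matrix = hac.linkage(squareform(D_square), method='single')
# Pass the raw points and let linkage compute Euclidean distances.
Z_from_points = hac.linkage(X, method='single', metric='euclidean')

print(np.allclose(Z_from_matrix, Z_from_points))
```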

How can I check/track a particular item, e.g. find that it is in cluster x at a cutoff of 0.1, in cluster y at a cutoff of 0.2, and so on?
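Because `fcluster` returns one label per original observation, in the same row order as the data passed to `linkage`, an item's cluster at a given cutoff is simply `labels[item_index]`. A minimal sketch, assuming the item is identified by a hypothetical row index `item_index` and using made-up 1-D toy data in place of the real 2000 samples; `track_item` is an illustrative helper, not part of SciPy:

```python
import numpy as np
from scipy.cluster import hierarchy as hac


def track_item(Z, item_index, cutoffs):
    """Return {cutoff: flat cluster label of item_index} for each cutoff."""
    history = {}
    for cutoff in cutoffs:
        # Labels are aligned with the rows originally passed to linkage.
        labels = hac.fcluster(Z, cutoff, criterion='distance')
        history[cutoff] = int(labels[item_index])
    return history


# Hypothetical toy data: two tight pairs and one outlier.
X = np.array([[0.0], [0.05], [0.3], [0.32], [1.0]])
Z = hac.linkage(X, method='single', metric='euclidean')

cutoffs = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
print(track_item(Z, item_index=0, cutoffs=cutoffs))
```

Cluster IDs are only labels and are renumbered at every cutoff, so the same number at two cutoffs does not mean the same cluster; comparing which items share the tracked item's label is more informative than the label value itself.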

I considered plotting a dendrogram, but wouldn't following one item out of 2000 samples in a dendrogram be too messy?
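Instead of reading the dendrogram, one option is to list which samples share the tracked item's cluster at each cutoff, so you can see exactly when other items join it. A sketch with hypothetical toy data; `cluster_mates` is an illustrative helper, not a SciPy function:

```python
import numpy as np
from scipy.cluster import hierarchy as hac


def cluster_mates(Z, item_index, cutoff):
    """Row indices of all observations in the same flat cluster as item_index."""
    labels = hac.fcluster(Z, cutoff, criterion='distance')
    return np.flatnonzero(labels == labels[item_index])


# Hypothetical toy data: two tight pairs and one outlier.
X = np.array([[0.0], [0.05], [0.3], [0.32], [1.0]])
Z = hac.linkage(X, method='single', metric='euclidean')

for cutoff in (0.1, 0.3, 0.7):
    print(cutoff, cluster_mates(Z, item_index=0, cutoff=cutoff).tolist())
# 0.1 [0, 1]
# 0.3 [0, 1, 2, 3]
# 0.7 [0, 1, 2, 3, 4]
```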

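Try building the set of cluster IDs with set(list(...)) to remove duplicates, then iterate over those IDs and filter your data by the cluster each point belongs to. I haven't been able to test this because you didn't provide sample data, so give it a try.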

Your code would look something like this:

import numpy as np

clusterIDs = set(list(results))  # unique cluster IDs (duplicates removed)
D = {}  # dictionary mapping clusterID -> array of points that belong to that cluster
for clusterID in clusterIDs:
    clusterItems = data[results == clusterID]  # rows of data labelled clusterID
    D[clusterID] = clusterItems
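The same dictionary idea can be extended to several cutoffs and used to look up the tracked item directly. This sketch stores row indices rather than the points themselves, so the item can be found by its index; `item_index` and the toy data are hypothetical:

```python
import numpy as np
from scipy.cluster import hierarchy as hac

# Hypothetical toy data: two tight pairs and one outlier.
X = np.array([[0.0], [0.05], [0.3], [0.32], [1.0]])
Z = hac.linkage(X, method='single', metric='euclidean')

# D[cutoff][clusterID] -> array of row indices belonging to that cluster
D = {}
for cutoff in (0.1, 0.3, 0.7):
    results = hac.fcluster(Z, cutoff, criterion='distance')
    D[cutoff] = {int(cid): np.flatnonzero(results == cid) for cid in np.unique(results)}

item_index = 0  # hypothetical: the row you want to follow
for cutoff, clusters in D.items():
    # Find the cluster whose member indices include the tracked row.
    cid = next(c for c, members in clusters.items() if item_index in members)
    print(cutoff, cid, clusters[cid].tolist())
```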
