How do I traverse the tree from sklearn AgglomerativeClustering?

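I have a numpy array stored as a text file: https://github.com/alvations/anythingyouwant/blob/master/WN_food.matrix

It is a distance matrix between terms, and my list of terms is here: http://pastebin.com/2xGt7Xjh

I generated a hierarchical clustering with the following code: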

import numpy as np
from sklearn.cluster import AgglomerativeClustering
matrix = np.loadtxt('WN_food.matrix')
n_clusters = 518
model = AgglomerativeClustering(n_clusters=n_clusters,
                                linkage="average", affinity="cosine")
model.fit(matrix)

To get the cluster for each term, I can do:

# model.labels_[i] is the cluster id of the i-th term (row i of the matrix)
for term, clusterid in enumerate(model.labels_):
    print(term, clusterid)

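But how do I traverse the tree that AgglomerativeClustering outputs?

Is it possible to convert it into a scipy dendrogram (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.dendrogram.html)? And after that, how do I traverse the dendrogram?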

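I've answered a similar question for sklearn.cluster.ward_tree: How do you visualize a ward tree from sklearn.cluster.ward_tree?

AgglomerativeClustering outputs the tree in the same way, in the children_ attribute. Here is an adaptation of the code from the ward tree question for AgglomerativeClustering. It outputs the structure of the tree in the form (node_id, left_child, right_child) for each node of the tree.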

import numpy as np
from sklearn.cluster import AgglomerativeClustering
import itertools
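# Toy data: 3 points near the origin and 2 points shifted by 100
X = np.concatenate([np.random.randn(3, 10), np.random.randn(2, 10) + 100])
# Note: newer scikit-learn releases rename the affinity parameter to metric
model = AgglomerativeClustering(linkage="average", affinity="cosine")
model.fit(X)
# Internal nodes are numbered from n_samples upward, in merge order
ii = itertools.count(X.shape[0])
[{'node_id': next(ii), 'left': x[0], 'right':x[1]} for x in model.children_]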
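
To actually walk the tree, it helps to note that row i of children_ describes the internal node with id n_samples + i, and that any child id below n_samples is a leaf (an original sample). Here is a minimal sketch of a depth-first traversal built on that. The children list is a hand-written stand-in for model.children_ on 5 samples, and walk is a hypothetical helper name, not something scikit-learn provides. It assumes the full tree was computed.

```python
# Hand-written stand-in for model.children_ with 5 samples (assumption).
# Row i describes the internal node with id n_samples + i.
children = [[0, 1], [3, 4], [2, 5], [6, 7]]
n_samples = len(children) + 1


def walk(node_id, depth=0):
    """Yield (node_id, depth, is_leaf) in depth-first pre-order."""
    is_leaf = node_id < n_samples
    yield node_id, depth, is_leaf
    if not is_leaf:
        left, right = children[node_id - n_samples]
        yield from walk(left, depth + 1)
        yield from walk(right, depth + 1)


# The last merge is the root of the tree.
root = n_samples + len(children) - 1
visited = list(walk(root))
for node_id, depth, is_leaf in visited:
    print("  " * depth + str(node_id) + (" (leaf)" if is_leaf else ""))
```

For a very deep, unbalanced tree the recursion could hit Python's recursion limit, in which case an explicit stack would be the safer choice.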

https://stackoverflow.com/a/26152118

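In addition to A.P.'s answer, here is some code that gives you a dictionary of membership. members[node_id] gives the indices of all data points under that node (indices run from 0 to n - 1).

on_split is a simple reformat of A.P.'s clusters that gives the two clusters formed when node_id is split.

up_merge tells you which node node_id merges into, and which node it has to be combined with to form that parent.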

import copy
import itertools

# data_x is the data matrix, fit_cluster the fitted AgglomerativeClustering model
ii = itertools.count(data_x.shape[0])
clusters = [{'node_id': next(ii), 'left': x[0], 'right':x[1]} for x in fit_cluster.children_]
n_points = data_x.shape[0]
# members[node_id]: indices of all data points under that node
members = {i:[i] for i in range(n_points)}
for cluster in clusters:
    node_id = cluster["node_id"]
    members[node_id] = copy.deepcopy(members[cluster["left"]])
    members[node_id].extend(copy.deepcopy(members[cluster["right"]]))
# on_split[node_id]: the two children produced by splitting node_id
on_split = {c["node_id"]: [c["left"], c["right"]] for c in clusters}
# up_merge[node_id]: the parent it merges into and the sibling it merges with
up_merge = {c["left"]: {"into": c["node_id"], "with": c["right"]} for c in clusters}
up_merge.update({c["right"]: {"into": c["node_id"], "with": c["left"]} for c in clusters})
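
The member counts also cover the dendrogram part of the question. scipy's dendrogram expects a linkage matrix whose rows are [left, right, distance, number of samples in the merged cluster]. Below is a rough sketch of building those rows. It assumes the merge distances are available (as far as I know, recent scikit-learn versions expose them as distances_ when the model is fit with compute_distances=True or with distance_threshold set). The children and distances values here are made up for illustration.

```python
# Made-up stand-ins for model.children_ and model.distances_ (assumptions).
children = [[0, 1], [3, 4], [2, 5], [6, 7]]
distances = [0.1, 0.2, 0.5, 0.9]
n_samples = len(children) + 1

# Number of original samples under each node; every leaf counts as 1.
counts = {i: 1 for i in range(n_samples)}
linkage_rows = []
for i, (left, right) in enumerate(children):
    node_id = n_samples + i
    counts[node_id] = counts[left] + counts[right]
    # One row per merge: [left, right, distance, cluster size]
    linkage_rows.append(
        [float(left), float(right), distances[i], float(counts[node_id])]
    )

for row in linkage_rows:
    print(row)
```

Converting linkage_rows with np.array(linkage_rows) should give something that scipy.cluster.hierarchy.dendrogram can plot. scipy.cluster.hierarchy.to_tree can also turn such a matrix into ClusterNode objects with left and right children, which is another way to walk the tree.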
