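I implemented the k-means algorithm in Python and am trying to compute the silhouette score of the clusters for various values of k. Below are the relevant functions and a few variables for a small part of the dataset.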
import numpy as np

def avgdist(pt, clust):
    dists = []
    for elem in clust:
        dists.append(np.linalg.norm(pt-elem))
    return np.mean(dists)

def silhouette(data, clusts):
    s = []
    print("data-")
    print(data)
    for i in range(len(clusts)):
        for j in range(len(clusts[i])):
            clusts[i][j] = clusts[i][j].tolist()
    print("Clusters")
    print(clusts)
    for elem in data:
        a = []
        b = []
        print(elem)
        for clust in clusts:
            print(clust)
            if elem in clust: #Error in this line
                b.append(avgdist(elem, clust))
            else:
                a.append(avgdist(elem, clust))
        s.append((min(b)-min(a)/(max(min(b), min(a)))))
    return np.mean(s)
The terminal output is as follows:
data-
[[ 0. 0. 5.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 7.]
[ 0. 0. 0.]
[ 0. 0. 12.]
[ 0. 0. 0.]
[ 0. 0. 7.]
[ 0. 0. 9.]
[ 0. 0. 11.]]
Clusters
[[array([ 0., 0., 5.]), array([ 0., 0., 0.]), array([ 0., 0., 0.]), array([ 0., 0., 0.]), array([ 0., 0., 0.])], [array([ 0., 0., 7.]), array([ 0., 0., 12.]), array([ 0., 0., 7.]), array([ 0., 0., 9.]), array([ 0., 0., 11.])]]
[ 0. 0. 5.]
[[0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
It is followed by this error, raised at the commented line:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
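Please help, as I'm not sure what the error means in my context. Similar questions gave me some idea of the nature of the error, but I don't think they apply here.
Edit: I solved this by changing the error line to: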
.....
if elem.tolist() in clust: #Error in this line
.....
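Your problem is that the line in question tries to evaluate whether a list of lists (clust) contains another list (elem). Because elem is a NumPy array, each comparison is done elementwise and yields an array of True/False values instead of a single boolean. The offending line is effectively evaluated along these lines: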
if [True, False, ...]: #<- error here
    code
which produces the error in question.
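To see this concretely, here is a minimal reproduction. The small `elem` and `clust` values are made up for illustration and are not the asker's full data.

```python
import numpy as np

elem = np.array([0., 0., 5.])
clust = [[0.0, 0.0, 7.0], [0.0, 0.0, 5.0]]

# `elem in clust` compares elem with each item in turn.
# Each comparison is elementwise and returns a boolean array, not a bool.
first_comparison = elem == clust[0]
print(first_comparison.tolist())  # [True, True, False]

# Python then asks for a single truth value of that array, which NumPy refuses.
try:
    elem in clust
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The exception is raised on the very first item, whether or not a matching point appears later in the list.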
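Instead of keeping lists of lists, you could convert/pack the data and the cluster elements into lists of tuples; then this evaluation would work.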
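A minimal sketch of that tuple approach, assuming the points in each cluster are exact copies of rows of the data (so exact float equality is safe). The names `data_tuples`, `clust_tuples` and `membership` are illustrative only.

```python
import numpy as np

data = np.array([[0., 0., 5.], [0., 0., 7.]])
clusts = [[np.array([0., 0., 5.]), np.array([0., 0., 0.])],
          [np.array([0., 0., 7.]), np.array([0., 0., 12.])]]

# Tuples of plain Python floats compare to a single True/False,
# so the `in` operator works without raising.
data_tuples = [tuple(row.tolist()) for row in data]
clust_tuples = [[tuple(pt.tolist()) for pt in clust] for clust in clusts]

# membership[i][j] is True when data point i belongs to cluster j.
membership = [[elem in clust for clust in clust_tuples] for elem in data_tuples]
print(membership)  # [[True, False], [False, True]]
```

The distance computations would still need arrays, so one option is to keep the arrays for `avgdist` and use the tuples only for the membership test.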
Suppose you have
import numpy as np
data = np.array([[ 0.,  0.,  5.],
                 [ 0.,  0.,  0.],
                 [ 0.,  0.,  0.],
                 [ 0.,  0.,  7.],
                 [ 0.,  0.,  0.],
                 [ 0.,  0., 12.],
                 [ 0.,  0.,  0.],
                 [ 0.,  0.,  7.],
                 [ 0.,  0.,  9.],
                 [ 0.,  0., 11.]])
clusts = [[np.array([ 0., 0., 5.]), np.array([ 0., 0., 0.]), np.array([ 0., 0., 0.]),
           np.array([ 0., 0., 0.]), np.array([ 0., 0., 0.])],
          [np.array([ 0., 0., 7.]), np.array([ 0., 0., 12.]), np.array([ 0., 0., 7.]),
           np.array([ 0., 0., 9.]), np.array([ 0., 0., 11.])]]
Then replace
[...]
if elem in clust: #Error in this line
[...]
with
[...]
if any([compa.all() for compa in elem == clust]):
[...]
which tests whether a NumPy array is present in such a list of arrays.
Tested with Python 3.6.
To summarize:
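The same test can also be written in vectorized form. This is a sketch, not part of the original answer: the helper name `contains_point` is hypothetical, and it assumes a non-empty cluster whose points all have the same length as `elem`.

```python
import numpy as np

def contains_point(clust, elem):
    """Return True if any row of `clust` equals `elem` in every coordinate."""
    rows = np.asarray(clust)  # shape (n_points, n_features)
    # Compare every row with elem, require all coordinates to match,
    # then check whether at least one row matched.
    return bool((rows == elem).all(axis=1).any())

elem = np.array([0., 0., 5.])
clust_a = [np.array([0., 0., 5.]), np.array([0., 0., 0.])]
clust_b = [np.array([0., 0., 7.]), np.array([0., 0., 12.])]

# The list-comprehension form and the vectorized form agree.
assert any([compa.all() for compa in elem == clust_a]) == contains_point(clust_a, elem)
print(contains_point(clust_a, elem), contains_point(clust_b, elem))  # True False
```

Both forms rely on broadcasting: `elem == clust` compares `elem` against each row of the cluster at once.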
def silhouette(data, clusts):
    s = []
    print("data-")
    print(data)
    for i in range(len(clusts)):
        for j in range(len(clusts[i])):
            clusts[i][j] = clusts[i][j].tolist()
    print("Clusters")
    print(clusts)
    for elem in data:
        a = []
        b = []
        print(elem)
        for clust in clusts:
            print(clust)
            # True if any point in clust matches elem in every coordinate
            condition = any([compa.all() for compa in elem == clust])
            print(condition)
            if condition: #No error anymore in this line
                b.append(avgdist(elem, clust))
            else:
                a.append(avgdist(elem, clust))
        # Score formula kept as in the question. Note that division binds
        # tighter than subtraction here, so only min(a) is divided.
        s.append((min(b)-min(a)/(max(min(b), min(a)))))
    return np.mean(s)
which will print (only one comparison is reported):
[...]
[ 0. 0. 5.]
[[0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
True
[[0.0, 0.0, 7.0], [0.0, 0.0, 12.0], [0.0, 0.0, 7.0], [0.0, 0.0, 9.0], [0.0, 0.0, 11.0]]
False
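The fix above removes the ValueError but leaves the score line as the question wrote it. In that line the division applies only to `min(a)`, and `b` holds the own-cluster distance while `a` holds the other-cluster distances, which is the reverse of the usual naming. Below is a sketch of the usual definition, s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to any other cluster. The function name `silhouette_sketch` is hypothetical. It assumes at least two clusters, scores singleton clusters as 0, and loops over the clusters directly so that no membership test is needed.

```python
import numpy as np

def silhouette_sketch(clusts):
    """Mean silhouette over all points; clusts is a list of lists of 1-D arrays."""
    arrays = [np.asarray(c, dtype=float) for c in clusts]
    scores = []
    for i, own in enumerate(arrays):
        for idx, pt in enumerate(own):
            if len(own) == 1:
                scores.append(0.0)  # convention assumed for singleton clusters
                continue
            # a: mean distance to the other members of the point's own cluster
            others = np.delete(own, idx, axis=0)
            a = np.linalg.norm(others - pt, axis=1).mean()
            # b: smallest mean distance to the points of any other cluster
            b = min(np.linalg.norm(other - pt, axis=1).mean()
                    for j, other in enumerate(arrays) if j != i)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

clusts = [[np.array([0., 0., 5.]), np.array([0., 0., 0.])],
          [np.array([0., 0., 7.]), np.array([0., 0., 12.])]]
print(silhouette_sketch(clusts))
```

If scikit-learn is available, `sklearn.metrics.silhouette_score` can serve as a cross-check on the same data and labels.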