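This is an excerpt from a K-means clustering example I am working through. Can someone help me understand what is happening in the last two lines?
Specifically, what is
class_of_points = compare_to_first_center > compare_to_second_center
doing? Does it just return a boolean? And on the next line, what is
colors_map[class_of_points + 1 - 1]
doing?
Thanks in advance, guys.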
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# data
x1 = [-4.9, -3.5, 0, -4.5, -3, -1, -1.2, -4.5, -1.5, -4.5, -1, -2, -2.5, -2, -1.5, 4, 1.8, 2, 2.5, 3, 4, 2.25, 1, 0, 1, 2.5, 5, 2.8, 2, 2]
x2 = [-3.5, -4, -3.5, -3, -2.9, -3, -2.6, -2.1, 0, -0.5, -0.8, -0.8, -1.5, -1.75, -1.75, 0, 0.8, 0.9, 1, 1, 1, 1.75, 2, 2.5, 2.5, 2.5, 2.5, 3, 6, 6.5]
# Define a function that assigns each point to its nearest center
colors_map = np.array(['b', 'r'])
def assign_members(x1, x2, centers):
    compare_to_first_center = np.sqrt(np.square(np.array(x1) - centers[0][0]) + np.square(np.array(x2) - centers[0][1]))
    compare_to_second_center = np.sqrt(np.square(np.array(x1) - centers[1][0]) + np.square(np.array(x2) - centers[1][1]))
    class_of_points = compare_to_first_center > compare_to_second_center
    colors = colors_map[class_of_points + 1 - 1]
    return colors, class_of_points
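compare_to_first_center is the distance from every point to centers[0]; likewise, compare_to_second_center is the distance from every point to centers[1]. Now, class_of_points is a boolean array the same size as your points, stating whether each point is closer to centers[0] or centers[1]. If class_of_points[i] is True, the distance to centers[0] is the larger one, so point[i] in your data is closer to centers[1]. If it is False, point[i] is closer to centers[0] (or equally far from both).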
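To make the comparison concrete, here is a minimal sketch with made-up distances. The names d0 and d1 and their values are purely illustrative and are not taken from the question's data.

```python
import numpy as np

# Hypothetical distances from three points to centers[0] and centers[1]
d0 = np.array([1.0, 5.0, 2.0])
d1 = np.array([4.0, 2.0, 2.0])

# Element-wise comparison yields one boolean per point
class_of_points = d0 > d1
print(class_of_points)        # [False  True False]
print(class_of_points.dtype)  # bool
```

Note that the third point is equally far from both centers, so the strict > gives False and it falls into the centers[0] group.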
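colors = colors_map[class_of_points + 1 - 1] assigns each point the color b or r: b if it is closer to centers[0] (False, index 0) and r if it is closer to centers[1] (True, index 1). Note that, to convert the boolean mask class_of_points into an array of indices, 1 is added and then subtracted, so that False becomes 0 and True becomes 1, which makes the values usable as indices. For example: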
np.array([True, False, True])+1-1
gives
[1, 0, 1]
Alternatively, you can simply replace it with:
colors = colors_map[class_of_points + 0]
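The conversion matters because NumPy treats a boolean array used as an index as a mask rather than as positions. The sketch below uses a made-up three-element class_of_points and assumes a reasonably recent NumPy, where a mask of the wrong length raises IndexError. It also shows astype(int) as a more explicit way of writing the same conversion.

```python
import numpy as np

colors_map = np.array(['b', 'r'])
class_of_points = np.array([True, False, True])  # illustrative values

# Arithmetic on a boolean array promotes it to integers
idx = class_of_points + 0
print(idx)              # [1 0 1]
print(colors_map[idx])  # ['r' 'b' 'r']

# astype(int) states the same conversion more explicitly
same = colors_map[class_of_points.astype(int)]

# Indexing with the raw boolean array is treated as a mask, and a
# length-3 mask does not fit the length-2 colors_map
try:
    colors_map[class_of_points]
    mask_failed = False
except IndexError:
    mask_failed = True
print(mask_failed)
```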
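It looks like a list of two centers is given. This code computes the Euclidean distance from each point to each center, then assigns blue to the points closer to the center centers[0][:] and red to the points closer to the center centers[1][:].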
def assign_members(x1, x2, centers):
    # In the following two lines, the Euclidean distances are being calculated
    compare_to_first_center = np.sqrt(np.square(np.array(x1) - centers[0][0]) + np.square(np.array(x2) - centers[0][1]))
    compare_to_second_center = np.sqrt(np.square(np.array(x1) - centers[1][0]) + np.square(np.array(x2) - centers[1][1]))
    # Check which center is closer to each point
    # So class_of_points is a boolean array that specifies the class number
    class_of_points = compare_to_first_center > compare_to_second_center
    # Depending on the class number (i.e. 0 or 1) it chooses the colour (blue or red)
    colors = colors_map[class_of_points + 1 - 1]
    return colors, class_of_points
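For comparison, the same assignment can be written for any number of centers by taking the index of the smallest distance with np.argmin. This is an alternative sketch, not the code from the question: the function name assign_members_argmin, the toy points, and the centers are all hypothetical, and it returns integer class indices instead of a boolean array.

```python
import numpy as np

colors_map = np.array(['b', 'r'])

def assign_members_argmin(x1, x2, centers):
    # Stack coordinates into an (n_points, 2) array
    points = np.column_stack([x1, x2])
    centers = np.asarray(centers, dtype=float)
    # Broadcasting gives an (n_points, n_centers) matrix of Euclidean distances
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    # Index of the nearest center for each point; ties go to the lower index
    class_of_points = np.argmin(distances, axis=1)
    return colors_map[class_of_points], class_of_points

# Hypothetical toy data and centers
x1 = [-4.0, -3.0, 2.0, 3.0]
x2 = [-3.0, -2.0, 1.0, 2.5]
centers = [[-3.0, -3.0], [2.0, 2.0]]

colors, classes = assign_members_argmin(x1, x2, centers)
print(classes)  # [0 0 1 1]
print(colors)   # ['b' 'b' 'r' 'r']
```

Because argmin already returns integers, no + 1 - 1 trick is needed before indexing colors_map. With more than two centers, colors_map would need one entry per center.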