Plotting the results of a cluster analysis in two dimensions

I'm not sure whether this is feasible, so I'm not sure my question makes sense. I ran k-means in a 20-dimensional space and obtained 17 clusters.

As a result, I have a data frame df_center containing the coordinates of the cluster centers (just an example):

cluster   x1   x2   x3  ...    x20
0     0.2  0.1 -0.1  ...  -0.1
1     ...  ...  ...  ...   ...
16     ...  ...  ...  ...   ...

I also have a data frame df_points containing the coordinates of the points and the cluster each one belongs to (just an example):

id_point  x1  x2  x3 ...  x20  cluster
0      ..  ..  .. ...  ...     0
1      ..  ..  .. ...  ...    12
2      ..  ..  .. ...  ...     6

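I would like to represent this data in a two-dimensional space using UMAP or some other tool. For example, the centers in black and the points in different colors according to the cluster they belong to.

Is it possible to do this starting from these two data frames?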

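Yes, you can use UMAP or another dimensionality-reduction technique to plot the clustering results in two dimensions. However, the centers and the points must go through the same fitted transformation, so that they end up in the same 2D coordinate system, and must then be plotted on the same axes.

For example, with UMAP in Python you could do the following: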

# Import libraries
import pandas as pd
import umap
import matplotlib.pyplot as plt
# Load data frames
df_center = pd.read_csv("df_center.csv")
df_points = pd.read_csv("df_points.csv")
# Extract the coordinates of the centers and the points
X_center = df_center.drop("cluster", axis=1)
X_points = df_points.drop(["id_point", "cluster"], axis=1)
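# Fit UMAP on the points, then project the centers with the same fitted model
# (fitting on the 17 centers alone gives UMAP too little data to learn the structure)
umap_model = umap.UMAP(n_components=2, random_state=42)
X_points_2d = umap_model.fit_transform(X_points)
X_center_2d = umap_model.transform(X_center)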
# Add the UMAP coordinates to the data frames
df_center["umap_x"] = X_center_2d[:, 0]
df_center["umap_y"] = X_center_2d[:, 1]
df_points["umap_x"] = X_points_2d[:, 0]
df_points["umap_y"] = X_points_2d[:, 1]
# Plot the centers and the points on the same axes, using different colors and markers
plt.figure(figsize=(10, 10))
plt.scatter(df_points["umap_x"], df_points["umap_y"], c=df_points["cluster"], cmap="tab20", alpha=0.5)
plt.scatter(df_center["umap_x"], df_center["umap_y"], c="black", marker="x", s=100, label="Centers")
plt.legend()
plt.xlabel("UMAP 1")
plt.ylabel("UMAP 2")
plt.title("Cluster analysis on a two-dimensional space")
plt.show()
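
If UMAP is not available, the same idea ("fit the projection on the points, then reuse it for the centers") can be sketched with a plain PCA. This is only an illustration under assumptions: the data below is synthetic, and the helper names fit_pca_2d and project are made up for this example, not part of any library.

```python
import numpy as np

def fit_pca_2d(points):
    """Fit a 2-component PCA on `points`; return (mean, components)."""
    mean = points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:2]

def project(data, mean, components):
    """Apply an already-fitted projection to any data in the original space."""
    return (data - mean) @ components.T

# Synthetic stand-in for df_points / df_center (hypothetical data).
rng = np.random.default_rng(0)
n_clusters, n_dims, n_per_cluster = 17, 20, 50
true_centers = rng.normal(scale=5.0, size=(n_clusters, n_dims))
labels = np.repeat(np.arange(n_clusters), n_per_cluster)
points = true_centers[labels] + rng.normal(size=(labels.size, n_dims))

# Assumption: as for converged k-means, each center is the mean of its assigned points.
centers = np.vstack([points[labels == k].mean(axis=0) for k in range(n_clusters)])

# Fit once on the points, then reuse the same mean and components for the centers.
mean, components = fit_pca_2d(points)
points_2d = project(points, mean, components)
centers_2d = project(centers, mean, components)
```

points_2d and centers_2d can then be passed to the same two plt.scatter calls as above. Because PCA is a linear map, each projected center is exactly the mean of its cluster's projected points; UMAP is non-linear and gives no such guarantee, so a center may appear slightly off the middle of its cluster there.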
