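I am trying to apply the K-Means algorithm to a binary classification task, but I cannot work out how to draw a scatter plot of the two resulting clusters.
My dataset simply has the following form: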
# size, class
312, 1
319, 1
227, 0
A minimal example:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
X = {'size': [312,319,227,301,273,311,277,291,303,381], 'class': [1,1,0,1,0,1,0,0,1,1]}
X = pd.DataFrame(data=X)
X_train, X_test, y_train, y_test = train_test_split(X['size'], X['class'], test_size=0.4)
X_train = X_train.values.reshape(-1,1)
X_test = X_test.values.reshape(-1,1)
kmeans = KMeans(init="k-means++", n_clusters=2, n_init=10, max_iter=300, random_state=42)
kmeans.fit(X_train)
preds = kmeans.predict(X_test)
How can I draw a scatter plot that shows the two clusters for X_test, with each point colored according to its predicted label (0 or 1) in preds?
Since you only have one feature, all of your data lies on a single line. You can create a scatter plot like this:
import matplotlib.pyplot as plt
color = ["blue", "red"]  # color[0] for cluster 0, color[1] for cluster 1
plt.scatter(X_test.flatten(), [0]*len(X_test), c=[color[p] for p in preds])
plt.show()
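For reference, the one-feature case can be sketched end to end as below. This is only an illustration under a few assumptions: the fixed train/test slice replaces the question's random train_test_split so the result is reproducible, and the black crosses marking kmeans.cluster_centers_ are an optional extra that the answer above does not include.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Sizes from the question, shaped as a single-column array (one feature).
sizes = np.array([312, 319, 227, 301, 273, 311, 277, 291, 303, 381]).reshape(-1, 1)

# Fixed split for illustration only (assumption; the question splits randomly).
X_train, X_test = sizes[:6], sizes[6:]

kmeans = KMeans(init="k-means++", n_clusters=2, n_init=10, max_iter=300, random_state=42)
kmeans.fit(X_train)
preds = kmeans.predict(X_test)

color = ["blue", "red"]
fig, ax = plt.subplots()

# Every point gets y = 0 because there is only one feature to plot.
ax.scatter(X_test.flatten(), np.zeros(len(X_test)), c=[color[p] for p in preds])

# Optional extra: mark the two learned cluster centres with black crosses.
centers = kmeans.cluster_centers_.flatten()
ax.scatter(centers, np.zeros(len(centers)), c="black", marker="x", s=100)

ax.set_xlabel("size")
ax.set_yticks([])  # the y axis carries no information here
```

Call plt.show() afterwards to display the figure. Keep in mind that K-Means numbers its clusters arbitrarily, so cluster 0 does not necessarily correspond to class 0.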
If you want to have two features, you can modify your data:
X = {
    'size_1': [312,319,227,301,273,311,277,291,303,381],
    'size_2': [152,165,301,145,310,145,315,156,160,165],
    'class': [1,1,0,1,0,1,0,0,1,1],
}
X = pd.DataFrame(data=X)
X_train, X_test, y_train, y_test = train_test_split(X[['size_1', 'size_2']], X['class'], test_size=0.4)
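Then refit the model on the two-feature data and modify the scatter plot:
preds = kmeans.fit(X_train).predict(X_test)  # refit, since the data now has two columns
plt.scatter(X_test.iloc[:,0],X_test.iloc[:,1], c=[color[p] for p in preds])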
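Put together, the two-feature variant might look like the following sketch. It assumes the model is refitted on the new two-column training data before predicting. The random_state passed to train_test_split and the variable name data are additions for reproducibility and readability, not part of the original snippets.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "size_1": [312, 319, 227, 301, 273, 311, 277, 291, 303, 381],
    "size_2": [152, 165, 301, 145, 310, 145, 315, 156, 160, 165],
    "class": [1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
})

# random_state is an added assumption so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    data[["size_1", "size_2"]], data["class"], test_size=0.4, random_state=0
)

# Refit on the two-feature training data, then predict clusters for the test set.
kmeans = KMeans(init="k-means++", n_clusters=2, n_init=10, max_iter=300, random_state=42)
kmeans.fit(X_train)
preds = kmeans.predict(X_test)

color = ["blue", "red"]
fig, ax = plt.subplots()

# One axis per feature; the color encodes the predicted cluster.
ax.scatter(X_test.iloc[:, 0], X_test.iloc[:, 1], c=[color[p] for p in preds])
ax.set_xlabel("size_1")
ax.set_ylabel("size_2")
```

As before, plt.show() displays the figure. With only four test points the plot is sparse, so a larger dataset would give a clearer picture of the two clusters.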