Pyplot cannot plot regression results

I am trying to reproduce a very simple example:

import numpy as np
import matplotlib.pyplot as plt

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # point radii from 0 to 15
print(type(x),type(y))
print('training samples ',len(x),len(y))
plt.scatter(x, y, c=colors, alpha=0.5)
plt.show()

This prints

<class 'numpy.ndarray'> <class 'numpy.ndarray'>
training samples  50 50

and the plot appears as expected. Now I am trying to plot the results of a GradientBoostingRegressor like this:

from sklearn.ensemble import GradientBoostingRegressor

base_regressor = GradientBoostingRegressor()
base_regressor.fit(X_train, y_train)
y_pred_base = base_regressor.predict(X_test)
print(type(X_train),type(y_train))
print('training samples ',len(X_train),len(y_train))
print(type(X_test),type(y_pred_base))
print('base samples ',len(X_test),len(y_pred_base))
plt.figure()
plt.scatter(X_train, y_train, c="k", label="training samples")
plt.plot(X_test, y_pred_base, c="g", label="n_estimators=1", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Base Regression")
plt.legend()
plt.show()

Note that X_train, y_train and X_test are all numpy arrays. For the code above, I get

<class 'numpy.ndarray'> <class 'numpy.ndarray'>
training samples  74067 74067
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
base samples  166693 166693

but no plot is shown, and I get the error

ValueError: x and y must be the same size

at the line

plt.scatter(X_train, y_train, c="k", label="training samples")

even though the output shows that x and y have the same length and type. What am I doing wrong?

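The X_train array is two-dimensional, with 163 columns per sample. You cannot plot y_train against the whole X_train array, because y_train is only one-dimensional. The same applies to plotting y_pred_base against X_test.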

You have to pick one column of the X array to plot against, by editing your code like this:

plt.scatter(X_train[:, 17], y_train, c="k", label="training samples")
plt.plot(X_test[:, 17], y_pred_base, c="g", label="n_estimators=1", linewidth=2)
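
The mismatch is easy to miss because len() reports only the number of rows, while scatter compares the total number of elements. A minimal sketch of the difference, using small synthetic arrays as stand-ins for the real data (the 100-sample shape is an assumption; only the 163 columns come from the question):

```python
import numpy as np

# Synthetic stand-ins for the arrays in the question (sizes are assumptions).
rng = np.random.default_rng(0)
X_train = rng.random((100, 163))   # 100 samples, 163 features
y_train = rng.random(100)          # one target value per sample

# len() only counts rows, so the two arrays look compatible...
print(len(X_train), len(y_train))

# ...but the shapes and total element counts differ.
print(X_train.shape, y_train.shape)
print(X_train.size, y_train.size)

# A single column is 1-D and has exactly as many elements as y_train.
column = X_train[:, 17]
print(column.shape)
```

Printing .shape instead of len() in the original debugging output would have exposed the problem right away.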

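Your independent variables (X) live in a 163-dimensional space. Each y value depends on the corresponding x values along all of those dimensions. A simple two-dimensional scatter or line plot cannot show all of that information at once.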

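One thing you can do is find out which x variables your y values depend on most. You can get this from the base_regressor.feature_importances_ attribute; the scikit-learn documentation has an example. You can then plot against the most important features. For several dimensions at once you could use a 3D scatter plot, or corner.py for even higher dimensions.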
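
As a rough sketch of that idea, the snippet below fits the regressor, picks the column with the largest importance score, and plots against it. The data is synthetic (5 features, with the target deliberately driven by column 2) and stands in for the real 163-column arrays; the variable names top and order are illustrative, not from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data standing in for the question's arrays (an assumption):
# 200 training samples, 5 features, target driven mostly by column 2.
rng = np.random.default_rng(0)
X_train = rng.random((200, 5))
y_train = 3.0 * X_train[:, 2] + 0.1 * rng.standard_normal(200)
X_test = rng.random((50, 5))

base_regressor = GradientBoostingRegressor(random_state=0)
base_regressor.fit(X_train, y_train)
y_pred_base = base_regressor.predict(X_test)

# Index of the feature with the largest importance score.
top = int(np.argmax(base_regressor.feature_importances_))

# Sort the test points by that feature so the line is drawn left to right.
order = np.argsort(X_test[:, top])

fig, ax = plt.subplots()
ax.scatter(X_train[:, top], y_train, c="k", label="training samples")
ax.plot(X_test[order, top], y_pred_base[order], c="g",
        label="prediction", linewidth=2)
ax.set_xlabel(f"feature {top}")
ax.set_ylabel("target")
ax.set_title("Base Regression")
ax.legend()
plt.show()
```

Sorting by the chosen column matters for plt.plot, which connects points in the order given; without it the line zigzags across the figure. Keep in mind that the predictions still depend on all features, so the line will look noisy when other columns also carry weight.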
