How is scikit-learn GridSearchCV best_score_ calculated?

I've been trying to figure out how the best_score_ attribute of GridSearchCV is calculated (or, in other words, what it means). The documentation says:

Score of best_estimator on the left out data.

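So I tried to translate that into something I understand: I calculated the r2_score of the actual "y"s against the predicted ys from each kfold, and got a different result (using this piece of code):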

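import numpy as np
from sklearn.metrics import r2_score
# clf is the fitted GridSearchCV; kfold yields (train_ind, test_ind) index pairs
test_pred = np.full(y.shape, np.nan)
for train_ind, test_ind in kfold:
    clf.best_estimator_.fit(X[train_ind, :], y[train_ind])
    test_pred[test_ind] = clf.best_estimator_.predict(X[test_ind])
# a single R^2 computed over all out-of-fold predictions pooled together
r2_test = r2_score(y, test_pred)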

I've looked everywhere for a more meaningful explanation of best_score_ and couldn't find anything. Would anyone care to explain?

Thanks

It's the mean cross-validation score of the best estimator. Let's make some data and fix the cross-validation's division of the data.

>>> y = np.linspace(-5, 5, 200)
>>> X = (y + np.random.randn(200)).reshape(-1, 1)
>>> threefold = list(KFold(len(y)))

Now run cross_val_score and GridSearchCV, both with these fixed folds.

>>> cross_val_score(LinearRegression(), X, y, cv=threefold)
array([-0.86060164,  0.2035956 , -0.81309259])
>>> gs = GridSearchCV(LinearRegression(), {}, cv=threefold, verbose=3).fit(X, y) 
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV]  ................................................................
[CV] ...................................... , score=-0.860602 -   0.0s
[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.0s
[CV]  ................................................................
[CV] ....................................... , score=0.203596 -   0.0s
[CV]  ................................................................
[CV] ...................................... , score=-0.813093 -   0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s finished

Note the score=-0.860602, score=0.203596 and score=-0.813093 in the GridSearchCV output; these are exactly the values returned by cross_val_score.
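The same check can be sketched against the newer sklearn.model_selection API, where KFold takes n_splits and the folds come from .split(X). This is only a minimal sketch: the fixed seed and the one-candidate grid {"fit_intercept": [True]} (which matches the LinearRegression default) are my own choices for illustration, not part of the session above, so the numbers will differ from the ones shown there.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Seed chosen arbitrarily for this sketch so that reruns give the same data.
rng = np.random.RandomState(0)
y = np.linspace(-5, 5, 200)
X = (y + rng.randn(200)).reshape(-1, 1)

# Materialise the splits once so both calls see identical folds.
threefold = list(KFold(n_splits=3).split(X))

# Per-fold scores from the estimator's default scorer (R^2 for regressors).
fold_scores = cross_val_score(LinearRegression(), X, y, cv=threefold)

# One-candidate grid (assumption for illustration): the only candidate is
# therefore also the best one.
gs = GridSearchCV(LinearRegression(), {"fit_intercept": [True]}, cv=threefold)
gs.fit(X, y)

print("per-fold scores:", fold_scores)
print("mean of folds:  ", fold_scores.mean())
print("best_score_:    ", gs.best_score_)
```

If the two printed means agree, best_score_ is just the average of the per-fold test scores for the winning parameter setting.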

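Note that the "mean" is really a macro-average over the folds. The iid parameter to GridSearchCV can be used to get a micro-average over the samples instead.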
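To make the difference between these averages concrete, here is a rough sketch that computes three numbers by hand: the plain mean over folds, the mean weighted by test-fold size (which, as I understand it, is what iid=True used to report; as far as I know that parameter was deprecated in scikit-learn 0.22 and removed in 0.24), and a single R^2 over all pooled out-of-fold predictions, which is what the loop in the question computes. The seed and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Seed chosen arbitrarily for this sketch.
rng = np.random.RandomState(0)
y = np.linspace(-5, 5, 200)
X = (y + rng.randn(200)).reshape(-1, 1)

fold_r2, fold_sizes = [], []
pooled_pred = np.full(y.shape, np.nan)
for train_ind, test_ind in KFold(n_splits=3).split(X):
    model = LinearRegression().fit(X[train_ind], y[train_ind])
    pred = model.predict(X[test_ind])
    pooled_pred[test_ind] = pred                       # keep for the pooled score
    fold_r2.append(r2_score(y[test_ind], pred))        # R^2 on this fold only
    fold_sizes.append(len(test_ind))

macro_mean = np.mean(fold_r2)                            # plain mean over folds
weighted_mean = np.average(fold_r2, weights=fold_sizes)  # folds weighted by size
pooled_r2 = r2_score(y, pooled_pred)                     # one R^2 over all samples

print("macro mean over folds:", macro_mean)
print("size-weighted mean:   ", weighted_mean)
print("pooled R^2:           ", pooled_r2)
```

The pooled R^2 is not an average of per-fold R^2 values at all: each fold's R^2 is measured against the variance of y within that fold, while the pooled version is measured against the variance of y over the whole data set. With unshuffled folds on sorted y, as here, the two can differ a lot, which would explain a mismatch between r2_test and best_score_.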
