GridSearchCV on a Pipeline: best_params_ seems to be wrong

I am running a grid search over a pipeline made up of StandardScaler, SelectKBest and Lasso. The best_params_ I get from the code below do not match the parameter combination that minimizes grid_scores_:

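import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso
from sklearn.grid_search import GridSearchCV as gridCV  # assumed alias (scikit-learn 0.14 module path)

numComponents = np.arange(20, 220, 20)
alphas = np.logspace(-6, 0, 15)
pipe = Pipeline(steps=[('normalize', StandardScaler()), ('selectK', SelectKBest(f_regression)), ('lasso', Lasso())])
gsObj = gridCV(pipe, dict(selectK__k=numComponents.tolist(), lasso__alpha=alphas.tolist()), scoring='mean_squared_error', cv=10, n_jobs=3, pre_dispatch=3)
gsObj.fit(X_train, y_train)
# score[1] is mean_validation_score (a negated MSE), so flip the sign
cvMse = np.array([-score[1] for score in gsObj.grid_scores_]).reshape(len(numComponents), len(alphas))
optNumComponents = gsObj.best_params_['selectK__k']
optAlpha = gsObj.best_params_['lasso__alpha']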

The lowest cvMse occurs at numComponents index 5 and alpha index 7, whereas gsObj.best_params_ corresponds to numComponents index 2 and alpha index 9.

Am I wrong to reshape grid_scores_ as len(numComponents) x len(alphas), thereby assuming the scores are ordered that way?

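Quoting the documentation, grid_scores_ in scikit-learn 0.14.1 is a list of named tuples:

Contains scores for all parameter combinations in param_grid. Each entry corresponds to one parameter setting. Each named tuple has the attributes: parameters, a dict of parameter settings; mean_validation_score, the mean score over the cross-validation folds; cv_validation_scores, the list of scores for each fold.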

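So, no, you should not rely on any ordering in the list. Instead, inspect each item to find out which parameters it contains. You seem to assume the parameters are ordered the way they appear in your code, but the parameter grid is a dict, which does not guarantee any order of its keys.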
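
As a minimal sketch of that advice, the MSE table can be filled by looking up each entry's parameters instead of reshaping the flat list. The `Entry` tuple below is only a stand-in that mirrors the attribute names quoted from the documentation, the helper name `mse_table` is hypothetical, and the scores are made-up toy values chosen so the example runs without a fitted search object.

```python
from collections import namedtuple
from itertools import product

# Stand-in for the items of grid_scores_ (attribute names from the quoted docs).
Entry = namedtuple(
    "Entry", ["parameters", "mean_validation_score", "cv_validation_scores"]
)


def mse_table(grid_scores, ks, alphas):
    """Return table[i][j] = MSE for ks[i] and alphas[j], located by parameter value."""
    k_pos = {k: i for i, k in enumerate(ks)}
    a_pos = {a: j for j, a in enumerate(alphas)}
    table = [[None] * len(alphas) for _ in ks]
    for entry in grid_scores:
        i = k_pos[entry.parameters["selectK__k"]]
        j = a_pos[entry.parameters["lasso__alpha"]]
        # The scorer reports negated MSE, so flip the sign back.
        table[i][j] = -entry.mean_validation_score
    return table


# Toy input: made-up scores, listed with alpha as the OUTER loop to mimic
# an ordering that differs from the one assumed by a plain reshape.
ks = [20, 40, 60]
alphas = [0.001, 0.01]
toy_scores = [
    Entry({"selectK__k": k, "lasso__alpha": a}, -(k * a), [])
    for a, k in product(alphas, ks)
]

table = mse_table(toy_scores, ks, alphas)
```

With the real object, passing `gsObj.grid_scores_`, `numComponents.tolist()` and `alphas.tolist()` should work the same way, assuming the parameter values stored in each entry are the same floats and ints that were put into the grid.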
