Why does GridSearchCV have no best_estimator_ even after fitting?

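I am learning multiclass classification with scikit-learn. My goal is to write code that includes all the metrics needed to evaluate the classification. This is my code: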

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

param_grid = [
    {'estimator__randomforestclassifier__n_estimators': [3, 10], 'estimator__randomforestclassifier__max_features': [2]},
#    {'estimator__randomforestclassifier__bootstrap': [False], 'estimator__randomforestclassifier__n_estimators': [3, 10], 'estimator__randomforestclassifier__max_features': [2, 3, 4]}
]
rf_classifier = OneVsRestClassifier(
    make_pipeline(RandomForestClassifier(random_state=42))
)
scoring = {'accuracy': make_scorer(accuracy_score),
           'precision_macro': make_scorer(precision_score, average='macro'),
           'recall_macro': make_scorer(recall_score, average='macro'),
           'f1_macro': make_scorer(f1_score, average='macro'),
           'precision_micro': make_scorer(precision_score, average='micro'),
           'recall_micro': make_scorer(recall_score, average='micro'),
           'f1_micro': make_scorer(f1_score, average='micro'),
           'f1_weighted': make_scorer(f1_score, average='weighted')}
grid_search = GridSearchCV(rf_classifier, param_grid=param_grid, cv=2,
                           scoring=scoring, refit=False)
grid_search.fit(X_train_prepared, y_train)

However, when I try to find the best estimator, I get the following error message:

print(grid_search.best_params_)
print(grid_search.best_estimator_)
AttributeError: 'GridSearchCV' object has no attribute 'best_params_'

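Question: How is it possible that I do not get the best estimator even after fitting the model? I noticed that if I set refit="some_of_the_metrics", I do get an estimator, but I don't understand why I should use it, since it fits the model to optimize one metric rather than all of them. So how can I get the best estimator for all scores? And what is the point of refit?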

Note: I tried to read the documentation, but it still doesn't make sense to me.

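The point of refit is that the model is fitted once more, using the best parameter set found during the search and the entire dataset. The best parameters are found by cross-validation, which means the dataset is always split into a training set and a validation set, so no model in the search is trained on the whole dataset.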

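When you define multiple metrics, you have to tell scikit-learn how it should decide what "best" means for you. For convenience, you can name any one of your scorers as the decider. In that case, the parameter set that maximizes that metric is used for refitting.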
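As a minimal sketch of that simple case: the snippet below swaps in the iris dataset and a bare RandomForestClassifier (both are assumptions made for illustration, not the pipeline from the question) and names "f1_macro" as the deciding scorer.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for X_train_prepared / y_train (an assumption).
X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [3, 10]},
    scoring={"accuracy": "accuracy", "f1_macro": "f1_macro"},
    refit="f1_macro",  # the scorer that decides which parameter set wins
    cv=2,
)
search.fit(X, y)

# Both attributes exist now, because refit names one of the scorers.
print(search.best_params_)
print(search.best_estimator_)
```

All scorers are still evaluated and stored in cv_results_; the refit argument only picks which one ranks the parameter sets.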

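If you want something more sophisticated, such as taking the parameter set with the highest mean over all scorers, you have to pass a function to refit. Given all the computed metrics, it must return the index of the best parameter set. That parameter set is then used to refit the model.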

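The metrics are passed as a dictionary with strings as keys and NumPy arrays as values. Each array has as many entries as there are evaluated parameter sets. You will find a lot of things in there; the most relevant are probably the mean_test_*scorer-name* entries. These arrays contain, for each tested parameter set, the mean of *scorer-name* computed across the cv splits.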
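To see what that dictionary looks like, here is a small sketch that runs a search and lists the mean_test_* entries of cv_results_. The dataset, estimator, and grid are assumptions chosen to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the question's training set (an assumption).
X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [3, 10]},  # two parameter sets
    scoring={"accuracy": "accuracy", "f1_macro": "f1_macro"},
    refit=False,  # no refit is needed just to look at the scores
    cv=2,
)
search.fit(X, y)

results = search.cv_results_
# One mean_test_<scorer-name> array per scorer, one entry per parameter set.
mean_keys = sorted(k for k in results if k.startswith("mean_test_"))
for key in mean_keys:
    print(key, results[key])
```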

In code, to get the index of the parameter set with the highest mean over all scorers, you can do the following:


import numpy as np

def find_best_index(eval_results: dict[str, np.ndarray]) -> int:
    # array of shape (n_scorers, n_parameter_sets)
    means_of_splits = np.array(
        [values for name, values in eval_results.items() if name.startswith('mean_test')]
    )
    # vector of length n_parameter_sets: the mean over all scorers
    mean_of_all_scores = np.mean(means_of_splits, axis=0)
    # index of the maximum value, which corresponds to the best parameter set
    return int(np.argmax(mean_of_all_scores))

grid_search = GridSearchCV(
    rf_classifier, param_grid=param_grid, cv=2, scoring=scoring, refit=find_best_index
)
grid_search.fit(X_train_prepared, y_train)
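To check the selection logic in isolation, the function can be run on a hand-made dictionary. The scores below are made-up numbers for three hypothetical parameter sets, not results from a real search.

```python
import numpy as np

def find_best_index(eval_results):
    # Stack only the mean_test_* arrays: shape (n_scorers, n_parameter_sets).
    means_of_splits = np.array(
        [v for k, v in eval_results.items() if k.startswith("mean_test")]
    )
    # Average over scorers, then take the best parameter set.
    return int(np.argmax(means_of_splits.mean(axis=0)))

# Hypothetical scores for three parameter sets (made-up numbers).
fake_results = {
    "mean_test_accuracy": np.array([0.90, 0.80, 0.85]),
    "mean_test_f1_macro": np.array([0.60, 0.80, 0.70]),
    "std_test_accuracy": np.array([0.01, 0.02, 0.03]),  # skipped by the filter
}

best = find_best_index(fake_results)
print(best)  # 1: the averages are 0.75, 0.80 and 0.775
```

Accuracy alone would have picked index 0 here, so the two strategies can disagree. When refit is a callable, best_index_, best_params_ and best_estimator_ are set after fitting, but best_score_ is not available.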
