GridSearchCV scoring and grid_scores_



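I am trying to understand how to get at the scorer values that GridSearchCV computes. The example code below sets up a small pipeline for text data.

It then sets up a grid search over different n-gram ranges.

Scoring is done with the F1 measure: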

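# imports (sklearn.grid_search is the legacy module; grid_scores_ is not
# available in current scikit-learn releases)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

# set up the pipeline: TF-IDF features followed by a linear SVM
tfidf_vec = TfidfVectorizer(analyzer='word', min_df=0.05, max_df=0.95)
linearsvc = LinearSVC()
clf = Pipeline([('tfidf_vec', tfidf_vec), ('linearsvc', linearsvc)])
# set up the grid search over unigrams vs. unigrams + bigrams
parameters = {'tfidf_vec__ngram_range': [(1, 1), (1, 2)]}
gs_clf = GridSearchCV(clf, parameters, n_jobs=-1, scoring='f1')
# docs_train / y_train: the training documents and their labels
gs_clf = gs_clf.fit(docs_train, y_train)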

Now I can print the scores with:

print gs_clf.grid_scores_

[mean: 0.81548, std: 0.01324, params: {'tfidf_vec__ngram_range': (1, 1)},
 mean: 0.82143, std: 0.00538, params: {'tfidf_vec__ngram_range': (1, 2)}]

print gs_clf.grid_scores_[0].cv_validation_scores

array([ 0.83234714,  0.8       ,  0.81409002])
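As a quick sanity check, the three per-fold values above can be reduced by hand to see how they relate to the summary line printed for (1, 1). This is only a sketch using the numbers shown in this post. It assumes a plain unweighted mean and a population standard deviation, which is what happens to reproduce the printed figures here; the library may weight folds differently in other settings.

```python
from statistics import mean, pstdev

# Per-fold scores copied from the cv_validation_scores output above.
fold_scores = [0.83234714, 0.8, 0.81409002]

fold_mean = mean(fold_scores)
fold_std = pstdev(fold_scores)  # population std (divides by n, not n - 1)

print("mean: %.5f, std: %.5f" % (fold_mean, fold_std))
# prints: mean: 0.81548, std: 0.01324
```

Both numbers agree with the first entry of grid_scores_, which suggests the array holds one score per cross-validation fold.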

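What is not clear to me from the documentation:

  1. Is gs_clf.grid_scores_[0].cv_validation_scores an array of the scores defined by the scoring parameter, one per fold (in this case, the F1 measure for each fold)? If not, what is it?

  2. If I choose a different metric, for example scoring='f1_micro', will each array in gs_clf.grid_scores_[i].cv_validation_scores then contain the f1_micro score of each fold for that particular choice of grid search parameters?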

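I wrote the following function to convert a grid_scores_ object into a pandas.DataFrame. Hopefully the DataFrame view will help clear up your confusion, since it is a more intuitive format: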

def grid_scores_to_df(grid_scores):
    """
    Convert a sklearn.grid_search.GridSearchCV.grid_scores_ attribute to a tidy
    pandas DataFrame where each row is a hyperparameter-fold combination.
    """
    rows = list()
    for grid_score in grid_scores:
        for fold, score in enumerate(grid_score.cv_validation_scores):
            row = grid_score.parameters.copy()
            row['fold'] = fold
            row['score'] = score
            rows.append(row)
    df = pd.DataFrame(rows)
    return df

For this to work you need the following import: import pandas as pd
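To show what the resulting table looks like without running a grid search, here is a small sketch that feeds the function a stand-in object. GridScore is a hypothetical namedtuple made up for this illustration; it only mimics the two attributes the function reads (parameters and cv_validation_scores), and the fold scores are the ones printed in the question.

```python
from collections import namedtuple

import pandas as pd


def grid_scores_to_df(grid_scores):
    """Same conversion as above: one row per hyperparameter-fold combination."""
    rows = list()
    for grid_score in grid_scores:
        for fold, score in enumerate(grid_score.cv_validation_scores):
            row = grid_score.parameters.copy()
            row['fold'] = fold
            row['score'] = score
            rows.append(row)
    return pd.DataFrame(rows)


# Hypothetical stand-in for one grid_scores_ entry (not a scikit-learn class).
GridScore = namedtuple('GridScore', ['parameters', 'cv_validation_scores'])

fake_grid_scores = [
    GridScore(
        parameters={'tfidf_vec__ngram_range': (1, 1)},
        cv_validation_scores=[0.83234714, 0.8, 0.81409002],
    ),
]

df = grid_scores_to_df(fake_grid_scores)
print(df)
```

Each fold becomes its own row, so something like df.groupby('fold') or a mean over the score column is then straightforward with the usual pandas tools.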
