How to tune the parameters of nested Pipelines with GridSearchCV in scikit-learn



Is it possible to tune the parameters of nested pipelines in scikit-learn? For example:

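from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

svm = Pipeline([
    ('chi2', SelectKBest(chi2)),
    # was class_weight='auto', which newer scikit-learn releases no longer accept
    ('cls', LinearSVC(class_weight='balanced'))
])
classifier = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('ova_svm', OneVsRestClassifier(svm))
])
parameters = ?
GridSearchCV(classifier, parameters)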

If this cannot be done directly, what would be a workaround?

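scikit-learn has a double-underscore notation for this, shown below. It works recursively and extends to OneVsRestClassifier, with the caveat that the wrapped estimator must be addressed explicitly as __estimator. Each level of nesting is joined with two underscores, so the path reads step name, then estimator, then inner step name, then parameter name: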

parameters = {'ova_svm__estimator__cls__C': [1, 10, 100],
              'ova_svm__estimator__chi2__k': [200, 500, 1000]}
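As a minimal end-to-end sketch of the notation above, the following builds the same nested pipeline and runs a grid search over it. The toy corpus, the labels, the 2-fold split and the small values of `k` and `C` are assumptions chosen only so the example runs quickly on a dozen documents; they are not from the original question. `class_weight` is left at its default here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: three topics, four documents each.
docs = [
    "the cat sat on the mat", "cats chase mice at night",
    "my cat purrs loudly", "a kitten is a young cat",
    "stocks fell sharply today", "the market rallied on earnings",
    "investors bought more shares", "bond yields rose again",
    "the team won the match", "he scored two goals",
    "the coach praised the players", "fans cheered in the stadium",
]
labels = ["pets"] * 4 + ["finance"] * 4 + ["sport"] * 4

# Inner pipeline: feature selection followed by a linear SVM.
svm = Pipeline([
    ("chi2", SelectKBest(chi2)),
    ("cls", LinearSVC()),
])
# Outer pipeline: vectorizer, then one-vs-rest wrapper around the inner pipeline.
classifier = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("ova_svm", OneVsRestClassifier(svm)),
])

# Path = outer step name + "estimator" + inner step name + parameter name,
# each joined by a double underscore.
parameters = {
    "ova_svm__estimator__cls__C": [0.1, 1, 10],
    "ova_svm__estimator__chi2__k": [5, 10],  # kept small for the toy vocabulary
}

search = GridSearchCV(classifier, parameters, cv=2)
search.fit(docs, labels)
print(search.best_params_)
```

On a real corpus the values of `k` would be much larger, as in the grid above, since `k` should stay within the number of features the vectorizer produces.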

For an estimator you have already created, you can list its parameters and their names as follows.

import pprint as pp
pp.pprint(sorted(classifier.get_params().keys()))

['ova_svm',
 'ova_svm__estimator',
 'ova_svm__estimator__chi2',
 'ova_svm__estimator__chi2__k',
 'ova_svm__estimator__chi2__score_func',
 'ova_svm__estimator__cls',
 'ova_svm__estimator__cls__C',
 'ova_svm__estimator__cls__class_weight',
 'ova_svm__estimator__cls__dual',
 'ova_svm__estimator__cls__fit_intercept',
 'ova_svm__estimator__cls__intercept_scaling',
 'ova_svm__estimator__cls__loss',
 'ova_svm__estimator__cls__max_iter',
 'ova_svm__estimator__cls__multi_class',
 'ova_svm__estimator__cls__penalty',
 'ova_svm__estimator__cls__random_state',
 'ova_svm__estimator__cls__tol',
 'ova_svm__estimator__cls__verbose',
 'ova_svm__estimator__steps',
 'ova_svm__n_jobs',
 'steps',
 'vectorizer',
 'vectorizer__analyzer',
 'vectorizer__binary',
 'vectorizer__decode_error',
 'vectorizer__dtype',
 'vectorizer__encoding',
 'vectorizer__input',
 'vectorizer__lowercase',
 'vectorizer__max_df',
 'vectorizer__max_features',
 'vectorizer__min_df',
 'vectorizer__ngram_range',
 'vectorizer__norm',
 'vectorizer__preprocessor',
 'vectorizer__smooth_idf',
 'vectorizer__stop_words',
 'vectorizer__strip_accents',
 'vectorizer__sublinear_tf',
 'vectorizer__token_pattern',
 'vectorizer__tokenizer',
 'vectorizer__use_idf',
 'vectorizer__vocabulary']

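From this list you can pick the parameters you want GridSearchCV to search over. The exact names in the list can vary between scikit-learn versions, so it is worth printing it for your own installation.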
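One way to use that list, sketched here as an illustration rather than an established recipe, is to check a candidate grid against `get_params()` before handing it to GridSearchCV. The grid below deliberately contains a hypothetical typo (a single underscore before `k`) to show how it gets caught; the variable names `candidate_grid`, `valid_names` and `unknown` are made up for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

svm = Pipeline([
    ("chi2", SelectKBest(chi2)),
    ("cls", LinearSVC()),
])
classifier = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("ova_svm", OneVsRestClassifier(svm)),
])

# Every name that can legally appear as a key in the parameter grid.
valid_names = set(classifier.get_params().keys())

# Hypothetical grid with one intentional typo: "chi2_k" instead of "chi2__k".
candidate_grid = {
    "ova_svm__estimator__cls__C": [1, 10, 100],
    "ova_svm__estimator__chi2_k": [200, 500, 1000],
}

# Collect the keys that do not correspond to any parameter of the estimator.
unknown = [name for name in candidate_grid if name not in valid_names]
print(unknown)
```

An empty `unknown` list means every key in the grid maps to a real parameter of the nested estimator.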
