Scikit Learn - Training a new model with GridSearchCV



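If I use GridSearchCV with a pipeline to find the best parameters, is there a way to save the trained model so that later I can call the whole pipeline on new data and generate predictions for it? For example, I have the following pipeline, followed by a GridSearchCV over its parameters: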

from pprint import pprint
from time import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# X (raw text documents) and y (labels) are assumed to be defined already
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(SVC(probability=True))),
])
parameters = {
    'vect__ngram_range': ((1, 1), (1, 2), (1, 3)),  # unigrams, bigrams or trigrams
    'clf__estimator__kernel': ('rbf', 'linear'),
    'clf__estimator__C': tuple([10**i for i in range(-10, 11)]),
}
grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1)
print("Performing grid search...")
print("pipeline:", [name for name, _ in pipeline.steps])
print("parameters:")
pprint(parameters)
t0 = time()
# Conduct the grid search
grid_search.fit(X, y)
print("done in %0.3fs" % (time() - t0))
print()
print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")
# Obtain the top-performing parameters
best_parameters = grid_search.best_estimator_.get_params()
# Print the results
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))
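To make the setup above easy to try, here is a minimal runnable sketch of the same pipeline on a small hypothetical corpus. The documents, labels, the reduced parameter grid, `cv=3`, and dropping `probability=True` are all assumptions made so it runs in seconds. It shows that after fitting, `grid_search.best_estimator_` is the whole fitted `Pipeline` (vectorizer, tf-idf transformer and classifier), which accepts raw text directly.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: two topics, six short documents each
X = [
    "the team scored a late goal",
    "the striker missed a penalty",
    "our team won the league match",
    "the coach praised the goalkeeper",
    "fans cheered the winning goal",
    "the match ended in a draw",
    "the server crashed after the update",
    "we fixed the bug in the python code",
    "the compiler reported a syntax error",
    "deploy the new code to the server",
    "debug the python script locally",
    "the update broke the build server",
]
y = ["sports"] * 6 + ["tech"] * 6

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(SVC())),
])
# Reduced grid (an assumption) so the search finishes quickly
parameters = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__estimator__kernel": ["linear"],
    "clf__estimator__C": [0.1, 1, 10],
}
grid_search = GridSearchCV(pipeline, parameters, cv=3)
grid_search.fit(X, y)

# With the default refit=True, best_estimator_ is the whole pipeline
# refit on all of X, y using the best parameter combination
best_pipeline = grid_search.best_estimator_
print(type(best_pipeline).__name__)  # Pipeline
print(grid_search.best_params_)
print(best_pipeline.predict(["the goalkeeper saved the penalty"]))
```

Because the vectorizer and transformer live inside the pipeline, new documents go through exactly the same fitted vocabulary and tf-idf weights as the training data; no separate preprocessing step is needed.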

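Now I want to save all of these steps into a single flow, so that I can apply it to a new, unseen dataset and have it use the same parameters, vectorizer and transformer to transform the data, make predictions and report the results. Is that possible?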

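You can pickle the GridSearchCV object to save it, then unpickle it when you want to use it to predict on new data. With the default `refit=True`, GridSearchCV refits the best pipeline on the full training data, so the saved object already contains the fitted vectorizer, transformer and classifier.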

import pickle

# Fit the model, then pickle the fitted GridSearchCV object
grid_search.fit(X, y)
# pickle writes bytes, so the file must be opened in binary mode ('wb')
with open('/model/path/model_pickle_file', 'wb') as fp:
    pickle.dump(grid_search, fp)
# Load the model from file (binary mode 'rb')
with open('/model/path/model_pickle_file', 'rb') as fp:
    grid_search_load = pickle.load(fp)
# Predict new data (X_new, assumed defined) with the model loaded from disk
y_new = grid_search_load.best_estimator_.predict(X_new)
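A variation worth considering, sketched below under stated assumptions: save only `grid_search.best_estimator_` (the refit pipeline, which is all that prediction needs) and use `joblib` instead of `pickle`. `joblib` is installed as a scikit-learn dependency and its `dump`/`load` handle objects holding large NumPy arrays efficiently. The toy data, the single-parameter grid and the temporary file path are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical training data (an assumption, not from the question)
X = ["goal match team", "team won the match", "late goal by the striker",
     "python code bug", "server crashed again", "debug the python code"]
y = ["sports", "sports", "sports", "tech", "tech", "tech"]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(SVC(kernel="linear"))),
])
grid_search = GridSearchCV(pipeline, {"clf__estimator__C": [0.1, 1, 10]}, cv=3)
grid_search.fit(X, y)

# Save only the refit best pipeline (hypothetical temporary path)
path = os.path.join(tempfile.mkdtemp(), "best_pipeline.joblib")
joblib.dump(grid_search.best_estimator_, path)

# Later, e.g. in another process: load it and predict on raw, unseen text
loaded = joblib.load(path)
X_new = ["the striker scored a goal", "the python server has a bug"]
print(loaded.predict(X_new))
```

Whichever format you use, load the file with the same scikit-learn version that saved it, and only load files from trusted sources, since unpickling can execute arbitrary code.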
