Unable to save model after running GridSearchCV with a scikit-learn pipeline



I have the following toy example that reproduces the problem:

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
X, y = make_regression(n_samples=30, n_features=5, noise=0.2)
reg = xgb.XGBRegressor(tree_method='hist', eval_metric='mae', n_jobs= 4)
steps = list()
steps.append(('reg', reg))
pipeline = Pipeline(steps=steps)
param_grid = {'reg__max_depth': [2, 4, 6],}
cv = 3
model = GridSearchCV(pipeline, param_grid, cv=cv, scoring='neg_mean_absolute_error')
best_model = model.fit(X = X, y = y)

The following four approaches all fail to save the fitted model:

model.save_model('test_1.json')
# AttributeError: 'GridSearchCV' object has no attribute 'save_model'
best_model.save_model('test2.json')
# AttributeError: 'GridSearchCV' object has no attribute 'save_model'
best_model.best_estimator_.save_model('test3.json')
# AttributeError: 'Pipeline' object has no attribute 'save_model'
model.best_estimator_.save_model('test4.json')
# AttributeError: 'Pipeline' object has no attribute 'save_model'

But both of these approaches work:

import joblib
joblib.dump(model.best_estimator_, 'naive_model.joblib')
joblib.dump(best_model.best_estimator_, 'naive_best_model.joblib')

Can anyone tell me whether I have constructed my pipeline incorrectly, in a way that breaks the methods for saving the best model?

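Only the xgboost object has a save_model attribute. Once you use GridSearchCV, you are dealing with a different object that wraps xgboost, and the same is true of the Pipeline. That is why all four calls raise AttributeError: neither wrapper defines save_model. You need to call model.best_estimator_['reg'].save_model. Note, however, that this saves only the xgboost model, without any of the data transformations from the pipeline. joblib or pickle is the more general solution, imho.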
