Custom regressor: GridSearchCV says 'get_params' is not implemented when inheriting from BaseEstimator



Hello,

Thank you for taking the time to read this post.

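I am trying to implement a scikit-learn API version of this blog post; the data is available here. My custom class reproduces the author's results, but it does not work with GridSearchCV.

Essentially, the author performs partial least squares (PLS) regression on some spectral data, where the optimal number of components is the one that yields the lowest MSE. My attempt is shown below. I can reproduce the author's MSE for the best calibration, and the default arguments of __init__ below are set to those values. Note that I inherit from BaseEstimator and RegressorMixin.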

import pandas as pd
from scipy.signal import savgol_filter
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.utils.validation import check_X_y, check_array

# Download the .csv from the GitHub repo linked in the blog post.
# Create df, shuffle it, then create `X` and `y`.
df = pd.read_csv("nirpyresearch/data/peach_spectra+brixvalues.csv")
df = df.sample(replace=False, frac=1).copy()
y = df['Brix'].values
X = df[[i for i in list(df.columns) if 'wl' in i]].values

class SavgolPLS(BaseEstimator, RegressorMixin):
    """My Regressor"""
    def __init__(self, savgol_window=17, savgol_polyorder=2, savgol_deriv=2, pls_components=7):
        self.savgol_window = savgol_window
        self.savgol_polyorder = savgol_polyorder
        self.savgol_deriv = savgol_deriv
        self.pls_components = pls_components

    def fit(self, X, y):
        # Check that X and y have correct shape
        X, y = check_X_y(X, y)

        self.X_ = X
        self.y_ = y
        # Smooth/differentiate the spectra, then fit PLS on the filtered data
        self.X_savgol_ = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        self.pls_ = PLSRegression(n_components=self.pls_components).fit(self.X_savgol_, self.y_)
        # Return the fitted regressor
        return self

    def predict(self, X, apply_savgol=True):
        # Check that fit has been called
        #check_is_fitted(self)
        # Input validation
        X = check_array(X)
        if apply_savgol:
            X = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        pred_y = self.pls_.predict(X)
        return pred_y

    def score(self, y_pred):
        # MSE of y_pred against the training targets stored by fit()
        mse = mean_squared_error(y_true=self.y_, y_pred=y_pred)
        return mse

I can now instantiate the model and call .get_params() to get a dict containing the 4 parameters from __init__.

s_pls = SavgolPLS(pls_components=7)
s_pls.get_params()

So get_params() does seem to exist, which makes sense because it is inherited from BaseEstimator. I can also use the fit() method to reproduce the author's results.

s_pls = s_pls.fit(X = X, y = y)
y_pred = s_pls.predict(X)
#This should be ~0.6566
s_pls.score(y_pred)

So why does applying GridSearchCV in the code below produce the error shown?

parameters  ={'savgol_window':[3,30], 'savgol_polyorder':[2,4], 'savgol_deriv':[1,3], 'pls_components':[2,15]}
clf = GridSearchCV(SavgolPLS, parameters, cv = 10)
clf.fit(X, y)

This produces:

TypeError                                 Traceback (most recent call last)
<ipython-input-22-e20c1eabb4fa> in <module>
----> 1 clf.fit(X, y.ravel())

C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    631         n_splits = cv.get_n_splits(X, y, groups)
    632
--> 633         base_estimator = clone(self.estimator)
    634
    635         parallel = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,

C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\base.py in clone(estimator, safe)
     58                             "it does not seem to be a scikit-learn estimator "
     59                             "as it does not implement a 'get_params' methods."
---> 60                             % (repr(estimator), type(estimator)))
     61     klass = estimator.__class__
     62     new_object_params = estimator.get_params(deep=False)

TypeError: Cannot clone object '<class '__main__.SavgolPLS'>' (type <class 'type'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

Thanks for your help.

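GridSearchCV expects an estimator instance, not the class itself. You passed the class SavgolPLS (note the `type <class 'type'>` in the error message), so instantiate it first: clf = GridSearchCV(SavgolPLS(), parameters, cv = 10)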
