Defining the kernel for scikit-learn's GaussianProcessRegressor with BayesSearchCV



Question: How do I define the kernel of a Gaussian process regressor when using BayesSearchCV?

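I am trying to use BayesSearchCV from skopt to tune the hyperparameters of a Gaussian process model. The way I define the kernel seems to be wrong, and I get a TypeError: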

TypeError: Cannot clone object ''rbf'' (type <class 'str'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.

Code to reproduce:

from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern

X, y = make_regression(100, 10)
estimator = GaussianProcessRegressor()
param = {
    'kernel': ['rbf', 'matern'],  # string aliases: this entry triggers the TypeError
    'n_restarts_optimizer': (5, 10),
    'alpha': (1e-5, 1e-2, 'log-uniform'),
}
opt = BayesSearchCV(
    estimator=estimator,
    search_spaces=param,
    cv=3,
    scoring="r2",
    random_state=42,
    n_iter=3,
    verbose=1,
)
opt.fit(X, y)

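First, GPR does not appear to support string aliases for kernels, at least in the current version; the kernel parameter expects a kernel object such as RBF(). Fixing that raises a second problem, however: if you supply a list of kernel objects for the kernel parameter, skopt cannot handle it (unhashable type). As far as I know this is still a long-standing open issue, although a workaround is proposed at the bottom of the issue page.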
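The two failure modes described above can be probed directly with scikit-learn alone. The following is a minimal sketch, not a diagnosis of skopt internals: the helper names `can_clone` and `is_hashable` are made up for illustration, and whether kernel objects turn out to be hashable may depend on the installed scikit-learn version, so the script reports the result instead of assuming it.

```python
from sklearn.base import clone
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern


def can_clone(obj):
    """Return True if sklearn.base.clone accepts obj (hypothetical helper)."""
    try:
        clone(obj)
        return True
    except TypeError:
        # Same error class as in the question: the object has no get_params().
        return False


def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


# Candidate values one might try to put into the search space.
candidates = {
    "'rbf' (string alias)": "rbf",
    "RBF()": RBF(),
    "Matern()": Matern(),
    "GaussianProcessRegressor(kernel=RBF())": GaussianProcessRegressor(kernel=RBF()),
}

report = {
    name: {"clonable": can_clone(obj), "hashable": is_hashable(obj)}
    for name, obj in candidates.items()
}

for name, row in report.items():
    print(name, row)
```

A value is only a plausible entry for a categorical dimension if it is both clonable by scikit-learn and hashable, which is the motivation for wrapping each kernel in its own estimator.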

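Another possible workaround is to build separate base estimators, each with a specific kernel, and search over the estimators instead of the kernels: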

from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
from sklearn.gaussian_process.kernels import RBF, DotProduct, Matern
from sklearn.pipeline import Pipeline

X, y = make_regression(100, 10)
estimator_list = [
    GaussianProcessRegressor(kernel=RBF()),
    GaussianProcessRegressor(kernel=Matern()),
]
pipe = Pipeline([('estimator', GaussianProcessRegressor())])
param = {
    'estimator': Categorical(estimator_list),
    'estimator__n_restarts_optimizer': (5, 10),
    'estimator__alpha': (1e-5, 1e-2, 'log-uniform'),
}
opt = BayesSearchCV(
    estimator=pipe,
    search_spaces=param,
    cv=3,
    scoring="r2",
    random_state=42,
    n_iter=3,
    verbose=1,
)
opt.fit(X, y)
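Because the kernel is now hidden inside the pipeline step, it helps to have a way to read back which kernel a fitted pipeline ended up with. The sketch below assumes the step is named 'estimator' as in the code above; `describe_gpr` is a hypothetical helper, and the fixed `alpha=1e-3` and `random_state=0` are illustrative values, not part of the original answer. To stay independent of skopt it fits each candidate pipeline directly instead of running the search.

```python
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern
from sklearn.pipeline import Pipeline


def describe_gpr(fitted_pipeline, step="estimator"):
    """Summarize the fitted GPR inside a pipeline (hypothetical helper)."""
    gpr = fitted_pipeline.named_steps[step]
    return {
        # kernel_ is the kernel after hyperparameter optimization during fit().
        "kernel": str(gpr.kernel_),
        "log_marginal_likelihood": float(gpr.log_marginal_likelihood_value_),
    }


X, y = make_regression(n_samples=100, n_features=10, random_state=0)

summaries = []
for kernel in (RBF(), Matern()):
    # Same one-step pipeline layout as in the workaround above.
    pipe = Pipeline([("estimator", GaussianProcessRegressor(kernel=kernel, alpha=1e-3))])
    pipe.fit(X, y)
    summaries.append(describe_gpr(pipe))

for summary in summaries:
    print(summary)
```

After the search above has been fitted, `describe_gpr(opt.best_estimator_)` should report the selected kernel in the same way, assuming `refit` is left at its default so that `best_estimator_` is available.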
