How do I make sure GridSearchCV splits the data for cross-validation first and imputes afterwards?



I have a GridSearchCV whose pipeline looks like this:

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('scaler', StandardScaler())
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
])

clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(solver='lbfgs'))
])

My GridSearchCV looks like this:

search = GridSearchCV(clf, param_grid, cv=5, scoring="roc_auc", error_score=0.0)

That is, with 5-fold cross-validation (cv=5).

So how do I make sure the data is split first, and only then imputed with the most frequent value?

For each parameter combination, GridSearchCV does roughly the following:

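for train_index, val_index in StratifiedKFold(n_splits=5).split(X, y):
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]
    clf = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('classifier', LogisticRegression(solver='lbfgs'))
    ])
    # The imputer and scaler are fit on the training fold only
    clf.fit(X_train, y_train)
    # The fitted preprocessing is applied (not refit) to the validation fold;
    # GridSearchCV evaluates the configured scoring at this point
    clf.score(X_val, y_val)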

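So you can be sure that SimpleImputer and StandardScaler are fit (.fit()) on the training part of every fold, and then only applied (.transform()) to the validation part of that fold.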
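If you want to check this yourself, here is a minimal sketch. It assumes synthetic data and a hypothetical LoggingImputer subclass (neither is in the question) that records how many rows each imputer fit sees. It also leaves out the ColumnTransformer for brevity and uses an illustrative param_grid.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical log: number of rows seen by each call to the imputer's fit
fit_sizes = []


class LoggingImputer(SimpleImputer):
    """SimpleImputer that records the size of the data it is fit on (illustration only)."""

    def fit(self, X, y=None):
        fit_sizes.append(X.shape[0])
        return super().fit(X, y)


# Synthetic data (assumption): 100 rows, 3 numeric columns, about 10% missing values
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(100, 3)).astype(float)
X[rng.rand(100, 3) < 0.1] = np.nan
y = np.array([0, 1] * 50)

clf = Pipeline(steps=[
    ("imputer", LoggingImputer(strategy="most_frequent")),
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(solver="lbfgs")),
])

# Illustrative grid (assumption): two values of C
param_grid = {"classifier__C": [0.1, 1.0]}

# Default n_jobs keeps everything in one process, so the module-level log is filled in
search = GridSearchCV(clf, param_grid, cv=5, scoring="roc_auc", error_score=0.0)
search.fit(X, y)

print(fit_sizes)
```

With 100 rows and cv=5, every fit during the search should see only the 80 training rows of its fold (2 parameter values x 5 folds = 10 fits). The single fit on all 100 rows at the end is the final refit of the best estimator (refit=True by default), which happens after model selection.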
