How can I predict confidence intervals for the test set with a tuned model in tidymodels in R?



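I am fitting a random forest model with tidymodels in R. When I try to predict on the test set with the tuned model, I get this error: Each element of `splits` must be an `rsplit` object.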

library(tidymodels)

# Data splitting
data(Sacramento, package = "modeldata")
set.seed(123)
data_split <- initial_split(Sacramento, prop = 0.75, strata = price)
Sac_train <- training(data_split)
Sac_test <- testing(data_split)

# Build the model
rf_mod <- rand_forest(mtry = tune(), min_n = tune(), trees = 1000) %>%
  set_engine("ranger", importance = "permutation") %>%
  set_mode("regression")

# Create the recipe
Sac_recipe <- recipe(price ~ ., data = Sac_train) %>%
  step_rm(zip, latitude, longitude) %>%
  step_corr(all_numeric_predictors(), threshold = 0.85) %>%
  step_zv(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

# Create the workflow
rf_workflow <- workflow() %>%
  add_model(rf_mod) %>%
  add_recipe(Sac_recipe)

# Train and tune the model
set.seed(123)
Sac_folds <- vfold_cv(Sac_train, v = 10, repeats = 2, strata = price)
rf_res <- rf_workflow %>%
  tune_grid(grid = 2*2,
            resamples = Sac_folds,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(rmse))

# Extract the best model
rf_best <- rf_res %>%
  select_best(metric = "rmse")

# Last fit
last_rf_workflow <- rf_workflow %>%
  finalize_workflow(rf_best)
last_rf_fit <- last_rf_workflow %>%
  last_fit(Sac_train)
# Error: Each element of `splits` must be an `rsplit` object.

predict(last_rf_fit, Sac_test, type = "conf_int")

The error is raised by these lines:

last_rf_fit <- last_rf_workflow %>%
  last_fit(Sac_train)

The documentation for last_fit gives this signature:

# S3 method for workflow
last_fit(object, split, ..., metrics = NULL, control = control_last_fit())

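So the workflow object is piped in by %>% as the first argument (object) of last_fit, and Sac_train ends up in the split argument.

But according to the documentation, the split argument must be:

An rsplit object created from rsample::initial_split()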

Sac_train is a plain data frame, not an rsplit, so pass the split object itself instead:

last_rf_fit <- last_rf_workflow %>%
  last_fit(data_split)

Then collect the test set predictions, as described in the documentation:

collect_predictions(last_rf_fit)
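
Note that collect_predictions() returns point predictions for the test set, not intervals. For the intervals asked about in the question, one option is to pull the fitted workflow out of the last_fit() result with extract_workflow() and call predict() on that. The following is a minimal sketch rather than a drop-in fix. It assumes the ranger engine is fitted with keep.inbag = TRUE, which ranger needs for its standard-error estimates. It also uses a simplified formula and fixed, illustrative hyperparameter values (mtry = 2, min_n = 5, trees = 500) in place of the recipe and the tuned values.

```r
library(tidymodels)

data(Sacramento, package = "modeldata")
set.seed(123)
data_split <- initial_split(Sacramento, prop = 0.75, strata = price)
Sac_test <- testing(data_split)

# Illustrative fixed hyperparameters (assumption); use the tuned values in practice.
# keep.inbag = TRUE is passed through to ranger so standard errors can be estimated.
rf_mod <- rand_forest(mtry = 2, min_n = 5, trees = 500) %>%
  set_engine("ranger", keep.inbag = TRUE) %>%
  set_mode("regression")

# Simplified formula (assumption) standing in for the recipe in the question
rf_workflow <- workflow() %>%
  add_model(rf_mod) %>%
  add_formula(price ~ beds + baths + sqft + type)

# last_fit() takes the rsplit object: it fits on the training portion
# and evaluates on the test portion
last_rf_fit <- last_fit(rf_workflow, data_split)

# The last_fit() result is not a model object, so extract the fitted workflow first
fitted_wf <- extract_workflow(last_rf_fit)
ci <- predict(fitted_wf, new_data = Sac_test, type = "conf_int")
```

If this runs as expected, ci should be a tibble with .pred_lower and .pred_upper columns, one row per test observation. These are approximate intervals built from ranger's standard-error estimate, so treat them with some caution.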
