r - Classification with random forest and tidymodels: issue with the outcome variable



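I am using tidymodels and ranger to fit a very basic random forest model on simulated data. For now I just want to get it running; I will deal with the validity of the model later. When I run the model, I get the following error: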

"unique notes: ───────────────────────────────────────────────────────────── Error in `check_outcome()`: ! For a classification model, the outcome should be a factor."

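I am confused, because the simulated data clearly has the outcome as a factor. What am I missing? Could vfold_cv be selecting a part of the outcome that does not contain all factor levels? How can I check this?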

Here is the code I am using:
library(tidyverse)
library(tidymodels)
library(themis)
library(dplyr)
library(vip)
library(forcats)
set.seed(987)
n = 1500
#simulated data
v1 = sample(c(0,1), size = n, replace = TRUE)
v2 = round(runif(n, 18, 80))
v3 = sample(c(0,1), size = n, replace = TRUE)
v4 = sample(c(0,1), size = n, replace = TRUE)
v5 = sample(c(0,1), size = n, replace = TRUE)
v6 = rbinom(n = n, size = 1, prob = .50)
xb = -9 + 3.5*v1 + 0.2*v2 + 0*v3 + 0*v4 +0*v5 + 0*v6  
p = 1/(1 + exp(-xb))
y1 = rbinom(n = n, size = 1, prob = p)
y1 = 
dplyr::recode_factor(y1, `1` = "yes", `2` =  "no")
dat_set = as.data.frame(cbind(v1, v2, v3, v4, v5, v6, y1))
dat_set |> 
count(y1)
class(y1)
#splitting into training and testing data
set.seed(123)
rf_split = initial_split(dat_set, strata = y1)
rf_train = training(rf_split)
rf_test = testing(rf_split)
#developing the model
set.seed(234)
rf_folds = vfold_cv(rf_train, strata = y1)
ranger_recipe = 
recipe(formula = y1 ~ ., data = rf_train)
ranger_spec = 
rand_forest(mtry = tune(), min_n = tune(), trees = 1000)  |>  
set_mode("classification")  |>  
set_engine("ranger", importance = "impurity") 
ranger_workflow = 
workflow() |> 
add_recipe(ranger_recipe)  |>  
add_model(ranger_spec) 
set.seed(93186)
ranger_tune =
tune_grid(ranger_workflow, 
resamples = rf_folds, 
control = control_resamples(save_pred = TRUE))

I am relatively new to R and tidymodels. Please point out my mistakes; I am open to all suggestions.

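The problem is that cbind() builds a matrix, and a matrix can hold only one type, so the factor y1 is replaced by its numeric codes. In dat_set, y1 is therefore a numeric column. You checked it with class(y1), but that inspects the stand-alone vector y1 from before the cbind(), not the column in dat_set. Note also that recode_factor() is given replacements for `1` and `2` while y1 only takes the values 0 and 1, so every 0 becomes NA (see the warning and the count() output below); this is why the run then stops with "Missing data in dependent variable" instead of the original error. To get past the check_outcome() error, convert the column back to a factor like this: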

library(tidyverse)
library(tidymodels)
library(themis)
library(dplyr)
library(vip)
library(forcats)
set.seed(987)
n = 1500
#simulated data
v1 = sample(c(0,1), size = n, replace = TRUE)
v2 = round(runif(n, 18, 80))
v3 = sample(c(0,1), size = n, replace = TRUE)
v4 = sample(c(0,1), size = n, replace = TRUE)
v5 = sample(c(0,1), size = n, replace = TRUE)
v6 = rbinom(n = n, size = 1, prob = .50)
xb = -9 + 3.5*v1 + 0.2*v2 + 0*v3 + 0*v4 +0*v5 + 0*v6  
p = 1/(1 + exp(-xb))
y1 = rbinom(n = n, size = 1, prob = p)
y1 = 
dplyr::recode_factor(y1, `1` = "yes", `2` =  "no")
#> Warning: Unreplaced values treated as NA as `.x` is not compatible.
#> Please specify replacements exhaustively or supply `.default`.
dat_set = as.data.frame(cbind(v1, v2, v3, v4, v5, v6, y1))
# check type
str(dat_set)
#> 'data.frame':    1500 obs. of  7 variables:
#>  $ v1: num  0 0 0 0 1 0 1 0 0 0 ...
#>  $ v2: num  53 23 25 70 68 78 62 48 46 56 ...
#>  $ v3: num  1 0 1 0 1 1 1 1 1 0 ...
#>  $ v4: num  1 1 0 1 1 0 1 0 0 1 ...
#>  $ v5: num  1 0 0 0 1 1 0 0 1 1 ...
#>  $ v6: num  0 1 0 0 0 1 0 0 1 1 ...
#>  $ y1: num  1 NA NA 1 1 1 1 NA NA 1 ...
# Convert to factor
dat_set$y1 <- as.factor(dat_set$y1)
dat_set |> 
count(y1)
#>     y1    n
#> 1    1 1015
#> 2 <NA>  485
class(dat_set$y1)
#> [1] "factor"
#splitting into training and testing data
set.seed(123)
rf_split = initial_split(dat_set, strata = y1)
rf_train = training(rf_split)
rf_test = testing(rf_split)
#developing the model
set.seed(234)
rf_folds = vfold_cv(rf_train, strata = y1)
ranger_recipe = 
recipe(formula = y1 ~ ., data = rf_train)
ranger_spec = 
rand_forest(mtry = tune(), min_n = tune(), trees = 1000)  |>  
set_mode("classification")  |>  
set_engine("ranger", importance = "impurity") 
ranger_workflow = 
workflow() |> 
add_recipe(ranger_recipe)  |>  
add_model(ranger_spec) 
set.seed(93186)
ranger_tune =
tune_grid(ranger_workflow, 
resamples = rf_folds, 
control = control_resamples(save_pred = TRUE))
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> x Fold01: preprocessor 1/1, model 1/10: Error: Missing data in dependent variable.
#> (the same "Missing data in dependent variable" error repeats for the remaining models of Fold01 and for all ten models of Fold02 through Fold10)
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
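
The coercion that shows up in the str() output above can be reproduced in isolation. The following is a minimal base R sketch; the vectors x and y and the objects m and d are made up for illustration and are not part of the original code. It shows that cbind() replaces a factor with its integer codes, while data.frame() keeps the factor:

```r
# Toy data (hypothetical, for illustration only)
y <- factor(c("yes", "no", "yes"))
x <- c(10, 20, 30)

# cbind() builds a matrix, which can hold only one type, so the factor
# is replaced by its integer codes and the labels are lost.
m <- cbind(x, y)
is.matrix(m)            # TRUE
is.numeric(m[, "y"])    # TRUE

# data.frame() keeps each column's own type.
d <- data.frame(x, y)
is.factor(d$y)          # TRUE

# Inspect the column inside the data frame, not the free-standing vector.
str(d)
```

Checking str(dat_set) or class(dat_set$y1), rather than class(y1), is what reveals the problem in the original code.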

Created on 2012-12-08 with reprex v2.0.2
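
Since the run above still ends with "Missing data in dependent variable", here is a sketch of one way the data construction step might be rewritten so that the outcome is a complete factor. It assumes the intended coding is 0 = "no" and 1 = "yes" (the question does not say so explicitly), uses base R factor() in place of recode_factor(), and introduces the helper name y_raw, which is not in the original code:

```r
set.seed(987)
n <- 1500

# Same simulation as in the question
v1 <- sample(c(0, 1), size = n, replace = TRUE)
v2 <- round(runif(n, 18, 80))
v3 <- sample(c(0, 1), size = n, replace = TRUE)
v4 <- sample(c(0, 1), size = n, replace = TRUE)
v5 <- sample(c(0, 1), size = n, replace = TRUE)
v6 <- rbinom(n = n, size = 1, prob = 0.50)
xb <- -9 + 3.5 * v1 + 0.2 * v2
p  <- 1 / (1 + exp(-xb))
y_raw <- rbinom(n = n, size = 1, prob = p)   # takes the values 0 and 1

# Map both observed values to labels so that nothing becomes NA
# (assumed coding: 0 = "no", 1 = "yes").
y1 <- factor(y_raw, levels = c(0, 1), labels = c("no", "yes"))

# data.frame() instead of cbind(), so y1 stays a factor.
dat_set <- data.frame(v1, v2, v3, v4, v5, v6, y1)

# Check the column inside the data frame.
str(dat_set)
table(dat_set$y1, useNA = "ifany")
```

With dat_set built this way, the rest of the pipeline (initial_split(), vfold_cv(), the recipe and the workflow) should be usable unchanged, because the outcome column is a factor with no missing values.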
