Random forest error in na.fail.default: missing values in object



I am running an RF model that runs without errors for most variables; however, when I include one variable, duration_in_program, with the following code:

```{r Random Forest Model}
## Run a Random Forest model
library(caret)  # assumed source of train(); method = "ranger" also needs the ranger package

mod_rf <-
  train(
    left_school ~ job_title + gender + marital_status + age_at_enrollment +
      monthly_wage + educational_qualification + cityD + cityC + cityB +
      cityA + duration_in_program, # Equation (outcome and everything else)
    data = train_data,             # Training data
    method = "ranger",             # Random forest (ranger is much faster than rf)
    metric = "ROC",                # Area under the ROC curve
    trControl = control_conditions,
    tuneGrid = tune_mtry
  )
mod_rf
```

I get the following error:

Error in na.fail.default(list(left_welfare = c(1L, 2L, 2L, 2L, 2L, 2L, : missing values in object

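Assuming train() is from caret, you can use the na.action argument to specify a function that handles NAs. The default is na.fail. A very common choice is na.omit. The randomForest library has na.roughfix, which will "impute missing values by median/mode".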
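To see what the two na.action choices do, here is a minimal base-R sketch. It assumes the formula interface builds its data through model.frame(), which is where na.action is applied. The toy_data frame and its values are made up for illustration and only stand in for train_data.

```r
# Toy stand-in for train_data; the values are invented for illustration.
toy_data <- data.frame(
  left_school         = factor(c("yes", "no", "no", "yes", "no")),
  age_at_enrollment   = c(23, 31, 27, 45, 38),
  duration_in_program = c(12, NA, 7, NA, 20)
)

# Which columns contain missing values, and how many?
na_counts <- colSums(is.na(toy_data))
print(na_counts)

f <- left_school ~ age_at_enrollment + duration_in_program

# na.fail stops with an error as soon as any value is missing;
# tryCatch() captures the error message instead of aborting.
fail_result <- tryCatch(
  model.frame(f, data = toy_data, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(fail_result)

# na.omit drops every row that has an NA in a variable used by the formula.
kept <- model.frame(f, data = toy_data, na.action = na.omit)
print(nrow(kept))
```

Running colSums(is.na(train_data)) on the real data should point to the column responsible for the error, most likely duration_in_program here. Keep in mind that na.omit removes whole rows, so it is worth comparing the row count before and after.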

```r
mod_rf <-
  train(
    left_school ~ job_title + gender + marital_status + age_at_enrollment +
      monthly_wage + educational_qualification + cityD + cityC + cityB +
      cityA + duration_in_program, # Equation (outcome and everything else)
    data = train_data,             # Training data
    method = "ranger",             # Random forest (ranger is much faster than rf)
    metric = "ROC",                # Area under the ROC curve
    trControl = control_conditions,
    tuneGrid = tune_mtry,
    na.action = na.omit            # Drop rows that contain missing values
  )
mod_rf
```
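If dropping rows loses too much data, imputing is the alternative. The sketch below is a hand-written stand-in for the median/mode idea, not the randomForest implementation of na.roughfix. The helper name impute_median_mode and the toy data are hypothetical.

```r
# Hypothetical helper (not part of caret or randomForest): fill NAs with the
# column median for numeric columns and with the most frequent value for
# factor or character columns.
impute_median_mode <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next                      # nothing to fill in this column
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else {
      counts <- table(x)                     # table() ignores NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

# Toy data, invented for illustration.
toy_data <- data.frame(
  gender              = factor(c("f", "m", NA, "f", "m", "f")),
  duration_in_program = c(12, NA, 7, NA, 20, 9)
)

imputed <- impute_median_mode(toy_data)
print(imputed)
```

The imputed data frame could then be passed as data = to train() in place of the original, which keeps every row at the cost of replacing unknown values with a typical one.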
