r - Combining rpart hyperparameter tuning with down-sampling in mlr3



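I am working through the excellent examples for the mlr3 package (mlr3gallery: imbalanced data examples), and I was hoping to find an example that combines hyperparameter tuning with imbalance correction.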

From the link above, as a description of what I am trying to achieve:

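To keep the runtime low, we define the search space only for the imbalance correction methods. However, it is also possible to jointly tune the learner's hyperparameters together with the imbalance correction method by extending the search space with the learner's hyperparameters.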

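Here is an example that comes close: mlr3 PipeOps: Create branches with different data transformations and benchmark different learners within and between branches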

So we can (mis)use the nice example by missuse as a walk-through:

# packages
library(paradox)
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)

# set up an rpart learner
learner <- lrn("classif.rpart", predict_type = "prob")
learner$param_set$values <- list(
  cp = 0,
  maxdepth = 21,
  minbucket = 12,
  minsplit = 24
)

# Create the three graphs:
# graph 1: just imputehist
graph_nop <- po("imputehist") %>>%
  learner

# graph 2: imputehist and undersample the majority class (ratio relative to majority class)
graph_down <- po("imputehist") %>>%
  po("classbalancing", id = "undersample", adjust = "major",
     reference = "major", shuffle = FALSE, ratio = 1/2) %>>%
  learner

# graph 3: imputehist and oversample the minority class (ratio relative to minority class)
graph_up <- po("imputehist") %>>%
  po("classbalancing", id = "oversample", adjust = "minor",
     reference = "minor", shuffle = FALSE, ratio = 2) %>>%
  learner

# Convert graphs to learners and set predict_type
graph_nop <- GraphLearner$new(graph_nop)
graph_nop$predict_type <- "prob"
graph_down <- GraphLearner$new(graph_down)
graph_down$predict_type <- "prob"
graph_up <- GraphLearner$new(graph_up)
graph_up$predict_type <- "prob"

# define the resampling and instantiate it so that the same split is always used:
hld <- rsmp("holdout")
set.seed(123)
hld$instantiate(tsk("sonar"))

# Benchmark
bmr <- benchmark(
  design = benchmark_grid(
    task = tsk("sonar"),
    learner = list(graph_nop, graph_up, graph_down),
    hld
  ),
  store_models = TRUE # only needed if you want to inspect the models
)

# check the result using different measures:
bmr$aggregate(msr("classif.auc"))
bmr$aggregate(msr("classif.ce"))

# This can also be performed within one pipeline with branching, but one would
# need to define the param set and use a tuner:
graph2 <-
  po("imputehist") %>>%
  po("branch", c("nop", "classbalancing_up", "classbalancing_down")) %>>%
  gunion(list(
    po("nop", id = "nop"),
    po("classbalancing", id = "classbalancing_up", ratio = 2, reference = "major"),
    po("classbalancing", id = "classbalancing_down", ratio = 2, reference = "minor")
  )) %>>%
  po("unbranch") %>>%
  learner
graph2$plot()

# Note that the unbranch happens before the learner, since one (always the same)
# learner is being used. Convert the graph to a learner and set predict_type
graph2 <- GraphLearner$new(graph2)
graph2$predict_type <- "prob"

# Define the param set. In this case just the different branch options.
# (No trailing comma after the last list element: R would reject an empty argument.)
ps <- ParamSet$new(
  list(
    ParamFct$new("branch.selection", levels = c("nop", "classbalancing_up", "classbalancing_down"))
  ))

#In general you would want to add also learner hyper parameters like cp and minsplit for rpart as well as the ratio of over/undersampling.

So, at this point, how do we add the learner hyperparameters such as cp and minsplit?

# perhaps by adding them to the param list?
ps = ParamSet$new(list(
  ParamFct$new("branch.selection", levels = c("nop", "classbalancing_up", "classbalancing_down")),
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))

# Create a tuning instance and grid search with resolution 1 since no other
# parameters are tuned. The tuner will iterate through the different pipeline
# branches as defined in the param set.
instance <- TuningInstance$new(
  task = tsk("sonar"),
  learner = graph2,
  resampling = hld,
  measures = msr("classif.auc"),
  param_set = ps,
  terminator = term("none")
)

tuner <- tnr("grid_search", resolution = 1)
set.seed(321)
tuner$tune(instance)

But this results in:

Error in (function (xs)  : 
Assertion on 'xs' failed: Parameter 'cp' not available..

I feel I may be missing a branching layer that combines these two things (the rpart hyperparameters minsplit and cp, and the down-/up-sampling). Thanks for any help.

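Once you construct a pipeline learner, the IDs of the underlying parameters change because a prefix is added to them (the ID of the PipeOp that owns the parameter, here classif.rpart). You can always check the learner's param_set; in your example that is graph2$param_set. There you will see that the parameters you are looking for are named as follows: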

ps = ParamSet$new(list(
  ParamFct$new("branch.selection", levels = c("nop", "classbalancing_up", "classbalancing_down")),
  ParamDbl$new("classif.rpart.cp", lower = 0.001, upper = 0.1),
  ParamInt$new("classif.rpart.minsplit", lower = 1, upper = 10)
))
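
As a rough sketch of how the prefixed IDs fit into a complete tuning run, the following rebuilds the branching pipeline, lists the prefixed rpart parameters, and tunes the branch together with cp and minsplit. It assumes a recent release of mlr3tuning and paradox, where `ps()`, `p_fct()`, `p_dbl()`, `p_int()`, `ti()` and `trm()` are used in place of the `ParamSet$new()`, `TuningInstance$new()` and `term()` calls from the question. The names `graph_learner` and `search_space` and the grid resolution of 3 are illustrative choices, not part of the original post.

```r
library(paradox)
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)

learner <- lrn("classif.rpart", predict_type = "prob")

# Same branching pipeline as in the question
graph <- po("imputehist") %>>%
  po("branch", c("nop", "classbalancing_up", "classbalancing_down")) %>>%
  gunion(list(
    po("nop", id = "nop"),
    po("classbalancing", id = "classbalancing_up", ratio = 2, reference = "major"),
    po("classbalancing", id = "classbalancing_down", ratio = 2, reference = "minor")
  )) %>>%
  po("unbranch") %>>%
  learner

graph_learner <- GraphLearner$new(graph)
graph_learner$predict_type <- "prob"

# Inspect the prefixed IDs: rpart's parameters start with "classif.rpart."
grep("^classif\\.rpart\\.", graph_learner$param_set$ids(), value = TRUE)

# Search space keyed by the prefixed IDs (assumed newer paradox syntax)
search_space <- ps(
  branch.selection = p_fct(c("nop", "classbalancing_up", "classbalancing_down")),
  classif.rpart.cp = p_dbl(lower = 0.001, upper = 0.1),
  classif.rpart.minsplit = p_int(lower = 1, upper = 10)
)

instance <- ti(
  task = tsk("sonar"),
  learner = graph_learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.auc"),
  terminator = trm("none"),
  search_space = search_space
)

# A resolution above 1 is needed so that cp and minsplit actually vary
tuner <- tnr("grid_search", resolution = 3)
set.seed(321)
tuner$optimize(instance)

instance$result
```

With a resolution of 1, as in the question, the grid would hold only a single value for each numeric parameter, so only the branch would effectively be compared.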
