r - Feature importance of a benchmark experiment using nested cross-validation



I am using the mlr package in R to compare two learners, a random forest and a lasso classifier, on a binary classification task. I would like to extract the feature importances of the best classifier (in this case the random forest), similar to what caret::varImp() does. I came across getBMRFeatSelResults(), getFeatureImportance() and generateFeatureImportanceData(), but none of them seem to do the trick. Below is the code I use to run the benchmark experiment with nested resampling. Ideally I would like the mean decrease in Gini. Thanks.

library(easypackages)
libraries("mlr", "purrr", "glmnet", "parallelMap", "parallel")

data = read.table("data_past.txt", header = TRUE)
set.seed(123)
task = makeClassifTask(id = "past_history", data = data, target = "DIAG", positive = "BD")

# Search spaces and random-search controls for the two learners
ps_rf = makeParamSet(makeIntegerParam("mtry", lower = 4, upper = 16),
                     makeDiscreteParam("ntree", values = 1000))
ps_lasso = makeParamSet(makeNumericParam("s", lower = .01, upper = 1),
                        makeDiscreteParam("alpha", values = 1))
ctrl_rf = makeTuneControlRandom(maxit = 10L)
ctrl_lasso = makeTuneControlRandom(maxit = 100L)

# Inner resampling for hyperparameter tuning
inner = makeResampleDesc("RepCV", folds = 10, reps = 3, stratify = TRUE)
lrn_rf = makeLearner("classif.randomForest", predict.type = "prob", fix.factors.prediction = TRUE)
lrn_rf = makeTuneWrapper(lrn_rf, resampling = inner, par.set = ps_rf, control = ctrl_rf,
                         measures = auc, show.info = FALSE)
lrn_lasso = makeLearner("classif.glmnet", predict.type = "prob", fix.factors.prediction = TRUE)
lrn_lasso = makeTuneWrapper(lrn_lasso, resampling = inner, par.set = ps_lasso, control = ctrl_lasso,
                            measures = auc, show.info = FALSE)

# Outer resampling; models = TRUE keeps the fitted models of each outer fold
outer = makeResampleDesc("CV", iters = 10, stratify = TRUE)
lrns = list(lrn_rf, lrn_lasso)
parallelStartMulticore(36)
res = benchmark(lrns, task, outer, measures = list(auc, ppv, npv, fpr, tpr, mmce),
                show.info = FALSE, models = TRUE)
saveRDS(res, file = "res.rds")
parallelStop()
models <- getBMRModels(res, drop = TRUE)

Since you are talking about CV,

extract the feature importances of the best classifier

is not really clear about what you want to do. There is no single "best model" in CV, and importance is usually not measured within CV.

CV is meant to estimate/compare predictive performance, not to compute/interpret feature importance.
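
If what you are after is a single importance vector, a common route is to tune and fit the random forest once on the full task and query that final model. A minimal sketch, reusing lrn_rf and task from the question; type = 2 in randomForest::importance() corresponds to the mean decrease in Gini:

final_rf = train(lrn_rf, task)                          # tune and fit once on all data
rf_fit = getLearnerModel(final_rf, more.unwrap = TRUE)  # strip the tuning wrapper
sort(randomForest::importance(rf_fit, type = 2)[, 1], decreasing = TRUE)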

Here is an answer to a similar question that might help.

I came across getBMRFeatSelResults(), getFeatureImportance(), generateFeatureImportanceData() but none of them seem to do the trick.

When you make a statement like this, it would help to explain in detail why these functions do not do what you want, instead of just stating it :)
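
That said, because the benchmark stored the fitted models (models = TRUE), you can inspect the mean decrease in Gini of each outer-fold random forest and average over the folds; keep in mind these are ten different models, not one "best" model. A rough sketch, assuming the tuned learner keeps mlr's default id "classif.randomForest.tuned":

rf_models = getBMRModels(res)[["past_history"]][["classif.randomForest.tuned"]]

# One MeanDecreaseGini vector per outer fold, then the fold-wise mean
imp_per_fold = sapply(rf_models, function(m) {
  fit = getLearnerModel(m, more.unwrap = TRUE)   # underlying randomForest object
  randomForest::importance(fit, type = 2)[, 1]   # type = 2: mean decrease in Gini
})
sort(rowMeans(imp_per_fold), decreasing = TRUE)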
