How to combine makeFeatSelWrapper and resample in mlr

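I am fitting classification models for a binary problem with the mlr package in R. For each model, I perform cross-validation with embedded feature selection using the "selectFeatures" function, and as output I retrieve the mean AUC over the test sets together with the predictions. To do this, following some advice (Get predictions on test sets in MLR), I use the "makeFeatSelWrapper" function in combination with the "resample" function. The goal seems to be achieved, but the results are strange. With logistic regression as the classifier I get an AUC of 0.5, which suggests that no variable was selected. This is unexpected, because with the same classifier and the method described in the linked question I get an AUC of 0.9824432. With a neural network as the classifier I get the error message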

Error in sum(x) : invalid 'type' (list) of argument

What is wrong?

Here is the example code:

```r
# 1. Find a synthetic dataset for supervised learning (two classes)
###################################################################
install.packages("mlbench")
library(mlbench)
data(BreastCancer)
# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable
p<-mlbench.waveform(1000)
# convert list into dataframe
dataset<-as.data.frame(p)
# drop third class to get 2 classes
dataset2  = subset(dataset, classes != 3)
# 2. Perform cross validation with embedded feature selection using logistic regression
#######################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. logistic regression
mL <- makeLearner("classif.logreg", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of hold-out sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
# 3. Perform cross validation with embedded feature selection using neural network
##################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.nnet", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of hold-out sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
```

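If you run the logistic regression part of the code a few times, you should also get the `Error in sum(x) : invalid 'type' (list) of argument` error. What I find strange, however, is that fixing a seed before resampling (e.g. `set.seed(1)`) does not determine whether the error appears or not.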

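The error occurs in internal mlr code that prints the feature selection output to the console. A very simple workaround is to avoid printing that output by setting `show.info = FALSE` in `makeFeatSelWrapper` (see the code below). This removes the error, but whatever causes it might have other consequences, although I think the bug probably only affects the printing code.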
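If you would rather not pass `show.info = FALSE` to every wrapper, mlr also has a global option for this. The sketch below assumes that `makeFeatSelWrapper()` and `resample()` take their `show.info` default from mlr's global `show.info` setting, so calling `configureMlr(show.info = FALSE)` before building the wrapper should have the same effect as the per-wrapper argument. It only uses `configureMlr()` and `getMlrOptions()`; the place where the wrapper is built is left as a comment.

```r
# Assumes the mlr package is installed.
library(mlr)

# Set mlr's global default for show.info to FALSE. Functions whose
# show.info argument defaults to this global option (assumed here for
# makeFeatSelWrapper() and resample()) use it when they are called afterwards.
configureMlr(show.info = FALSE)
print(getMlrOptions()$show.info)

# ... build the feature selection wrapper and call resample() here ...

# Switch the console output back on afterwards
configureMlr(show.info = TRUE)
```

The per-wrapper argument is more explicit and does not affect other code in the session, which is why it is used in the code below; the global option is mainly convenient when several wrappers are built in a row.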

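When running your code, I only get AUCs above 0.90. Below is your logistic regression code, slightly reorganized and with the workaround applied. I added `droplevels()` to `dataset2` to remove the now-empty level 3 from the `classes` factor, although this is unrelated to the workaround.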

```r
library(mlbench)
library(mlr)
data(BreastCancer)
p <- mlbench.waveform(1000)
dataset <- as.data.frame(p)
dataset2 <- subset(dataset, classes != 3)
dataset2 <- droplevels(dataset2)
mCT <- makeClassifTask(data = dataset2, target = "classes")
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA, alpha = 0.001)
mL <- makeLearner("classif.logreg", predict.type = "prob")
inner = makeResampleDesc("Holdout", stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl, show.info = FALSE)
# uncomment this for the error to appear again. Might need to run the code a couple of times to see the error
# lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
outer = makeResampleDesc("CV", iters = 10, stratify = TRUE)
r = resample(lrn, mCT, outer, extract = getFeatSelResult, measures = list(mlr::auc, mlr::acc, mlr::brier), models = TRUE)
```
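To check whether an AUC of 0.5 really corresponds to an empty feature set, you can inspect the resample result directly. The sketch below repeats the setup above in condensed form so that it runs on its own (assuming mlr and mlbench are installed), then prints the aggregated test measures (`r$aggr`), the number of features selected in each outer fold (from the feature selection results stored via `extract = getFeatSelResult`, whose `$x` element holds the selected feature names), and the first pooled test-set predictions (`r$pred$data`). The `set.seed(1)` call is an addition and only makes the simulated data repeatable.

```r
# Assumes the mlr and mlbench packages are installed.
library(mlbench)
library(mlr)

# set.seed() is an addition here; it only makes the simulated data repeatable
set.seed(1)
dataset <- as.data.frame(mlbench.waveform(1000))
# Keep classes 1 and 2 and drop the now-empty factor level 3
dataset2 <- droplevels(subset(dataset, classes != 3))
mCT <- makeClassifTask(data = dataset2, target = "classes")

# Same wrapped learner as above, with the show.info = FALSE workaround
lrn <- makeFeatSelWrapper(
  makeLearner("classif.logreg", predict.type = "prob"),
  resampling = makeResampleDesc("Holdout", stratify = TRUE),
  control = makeFeatSelControlSequential(method = "sffs", maxit = NA, alpha = 0.001),
  show.info = FALSE
)
outer <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
r <- resample(lrn, mCT, outer,
              extract = getFeatSelResult,
              measures = list(mlr::auc, mlr::acc, mlr::brier),
              show.info = FALSE)

# Mean AUC, accuracy and Brier score over the 10 outer test folds
print(r$aggr)

# Number of features selected in each outer fold (0 would mean the
# logistic regression in that fold was fitted without any predictor)
n.selected <- lengths(lapply(r$extract, function(res) res$x))
print(n.selected)

# First rows of the pooled test-set predictions (truth, probabilities, fold)
print(head(r$pred$data))
```

If every entry of `n.selected` is greater than zero while the AUC is still near 0.5, the problem lies elsewhere than in the feature selection step.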

Edit: I reported an issue and created a pull request with a fix.
