R caret: combining rfe() and train()



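I want to combine recursive feature elimination via rfe() with hyperparameter tuning and model selection via trainControl(), using the method rf (random forest). Instead of the standard summary statistics, I would like to get the MAPE (mean absolute percentage error). So I tried the following code with the ChickWeight dataset: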

library(caret)
library(randomForest)
library(MLmetrics)
# Compute MAPE instead of the default metrics
mape <- function(data, lev = NULL, model = NULL) {
  mape <- MAPE(y_pred = data$pred, y_true = data$obs)
  c(MAPE = mape)
}
# specify trainControl
trc <- trainControl(method = "repeatedcv", number = 10, repeats = 3, search = "grid",
                    savePredictions = TRUE, summaryFunction = mape)
# set up grid
tunegrid <- expand.grid(.mtry = c(1:3))
# specify rfeControl
rfec <- rfeControl(functions = rfFuncs, method = "cv", number = 10, saveDetails = TRUE)
set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet,
               sizes = c(1:3), # subset sizes (numbers of predictors) to evaluate
               data = ChickWeight,
               method = "rf",
               ntree = 250,
               metric = "RMSE",
               tuneGrid = tunegrid,
               rfeControl = rfec,
               trControl = trc)

The code runs without errors. But where do I find the MAPE that I defined as summaryFunction in trainControl? Is trainControl executed or ignored?

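How can I rewrite the code so that rfe performs the recursive feature elimination and the hyperparameter mtry is then tuned with trainControl inside rfe, while an additional error measure (MAPE) is computed as well?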

trainControl is ignored, as its description

"Control the computational nuances of the train function"

suggests. To use MAPE, you need

rfec$functions$summary <- mape

and then

rfe(weight ~ Time + Chick + Diet,
    sizes = c(1:3),
    data = ChickWeight,
    method = "rf",
    ntree = 250,
    metric = "MAPE",  # Modified
    maximize = FALSE, # Modified
    rfeControl = rfec)
#
# Recursive feature selection
#
# Outer resampling method: Cross-Validated (10 fold) 
#
# Resampling performance over subset size:
#
#  Variables   MAPE  MAPESD Selected
#          1 0.1903 0.03190         
#          2 0.1029 0.01727        *
#          3 0.1326 0.02136         
#         53 0.1303 0.02041         
#
# The top 2 variables (out of 2):
#    Time, Chick.L
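
The question also asks how to tune mtry inside rfe(). One approach that may work is to swap rfFuncs for caret's caretFuncs, whose fit step calls train(), so that arguments such as tuneGrid and trControl given to rfe() are forwarded to train(). The following is only a sketch under assumptions: the 5-fold settings and ntree = 50 are illustrative values chosen to keep the run short, the fit function is overridden here so that train() also selects mtry by MAPE, and mape is written in base R with the same formula as the MLmetrics version used in the question.

```r
library(caret)
library(randomForest)

# MAPE in the form caret expects from a summary function
# (assumption: base-R equivalent of the question's mape())
mape <- function(data, lev = NULL, model = NULL) {
  c(MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

# Inner resampling: used by train() to tune mtry for each candidate subset
trc <- trainControl(method = "cv", number = 5, summaryFunction = mape)

# Outer resampling: used by rfe(); start from caretFuncs and adapt it
funcs <- caretFuncs
funcs$summary <- mape
# Illustrative override: tell train() explicitly to minimise MAPE
funcs$fit <- function(x, y, first, last, ...) {
  caret::train(x, y, metric = "MAPE", maximize = FALSE, ...)
}
rfec <- rfeControl(functions = funcs, method = "cv", number = 5)

set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet,
               data = ChickWeight,
               sizes = 1:3,
               metric = "MAPE",
               maximize = FALSE,
               rfeControl = rfec,
               # the remaining arguments are passed on to train()
               method = "rf",
               ntree = 50,
               tuneGrid = expand.grid(mtry = 1:3),
               trControl = trc)
results
```

With this setup, results$results should hold the outer-resampling MAPE per subset size, and results$fit should be the train object for the selected subset, including the mtry it chose. Expect warnings from randomForest when mtry exceeds the number of predictors in a small subset.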
