R - How to use caret to compare different models and tune their different parameters



I am trying to implement some functions to compare five different machine learning models that predict values in a regression problem.

My intention is to develop a set of functions that train the different models and collect their results. The models I have chosen are: lasso, random forest, SVM, linear model and neural network. To tune some of them I intend to use Max Kuhn's reference: https://topepo.github.io/caret/available-models.html. However, since each model requires different tuning parameters, I am not sure how to set them up:

First, I set up a grid for tuning the "nnet" model, choosing different numbers of hidden-layer nodes and different decay values:

my.grid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                       decay = seq(from = 0.1, to = 0.5, by = 0.1))
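
For reference (a small check added here, not in the original post), this grid is an ordinary data frame with one row per size/decay combination, 50 candidate models in total:

head(my.grid, 3)
#   size decay
# 1    1   0.1
# 2    2   0.1
# 3    3   0.1
nrow(my.grid)   # 10 sizes x 5 decay values = 50 rows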

Then I built the function that will run the five models with 6-fold cross-validation repeated 5 times:

my_list_model <- function(model) {
  set.seed(1)
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                repeats = 5,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(ST1 ~ .,
                 data = train,        # my original data frame, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,          # linear activation function for the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = my.grid)  # here is where I pass the 'nnet' tuning grid
  return(fit_m)
}

Finally, I run the five models:

lapply(list(Lass = "lasso",
            RF = "rf",
            SVM = "svmLinear",
            OLS = "lm",
            NN = "nnet"),
       my_list_model) -> model_list

However, when I run this, it shows:

Error: The tuning parameter grid should have columns fraction

As far as I can tell, I don't know how to specify the tuning parameters properly. If I drop the "nnet" model and change it to, for example, an XGBoost model in the next-to-last line, it seems to run fine and produces results. In other words, the problem seems to lie in the "nnet" tuning parameters.

So I think my real question is: how do I configure these different parameters for the models, in particular for the "nnet" model? And since I don't need to set parameters for lasso, random forest, svmLinear and the linear model, how does the caret package tune them?
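
One way to see what is going on (a check added here, not part of the original post): the same size/decay grid is passed to every method in the list, but each method expects tuning columns with its own names. caret's modelLookup() shows, for instance, that "lasso" wants a single column named fraction, so a grid with columns size and decay is rejected for it; the output below is approximate:

library(caret)
modelLookup("lasso")
#   model parameter                     label forReg forClass probModel
# 1 lasso  fraction Fraction of Full Solution   TRUE    FALSE     FALSE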

my_list_model <- function(model, grd = NULL) {
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(Y ~ .,
                 data = df,           # my original data frame, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,          # linear activation function for the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = grd)      # the model-specific tuning grid is passed in here
  return(fit_m)
}

First, run the following code to see all the tuning parameters of a given model:

modelLookup('rf')
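
The lookup for "rf" should report mtry as the only tunable parameter, and the same call for "svmLinear" reports C; a sketch of the expected output (caret's exact labels may differ slightly by version):

modelLookup('rf')
#   model parameter                         label forReg forClass probModel
# 1    rf      mtry #Randomly Selected Predictors   TRUE     TRUE      TRUE

modelLookup('svmLinear')
#       model parameter label forReg forClass probModel
# 1 svmLinear         C  Cost   TRUE     TRUE      TRUE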

Now, based on the lookup above, generate the grids for all the models:

svmGrid <- expand.grid(C = c(3, 2, 1))
rfGrid  <- expand.grid(mtry = c(5, 10, 15))

Create a list holding the grids for all the models, making sure each element is named after its model:

grd_all <- list(svmLinear = svmGrid,
                rf = rfGrid)
model_list <- lapply(c("rf", "svmLinear"),
                     function(x) my_list_model(x, grd_all[[x]]))
model_list
[[1]]
Random Forest 

17 samples
 3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

  mtry  RMSE      Rsquared   MAE     
   5    63.54864  0.5247415  55.72074
  10    63.70247  0.5255311  55.35263
  15    62.13805  0.5765130  54.53411

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 15.

[[2]]
Support Vector Machines with Linear Kernel 

17 samples
 3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

  C  RMSE      Rsquared   MAE     
  1  59.83309  0.5879396  52.26890
  2  66.45247  0.5621379  58.74603
  3  67.28742  0.5576000  59.55334

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 1.
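
To fold the original "nnet" model into the same scheme, the grid from the question can simply be added to the list under the name "nnet" (a sketch built on the function above, reusing the names from this thread; its results are not part of the output shown, and linout, trace and maxit are already passed inside my_list_model):

nnetGrid <- expand.grid(size  = seq(1, 10, by = 1),
                        decay = seq(0.1, 0.5, by = 0.1))
grd_all <- list(svmLinear = svmGrid,
                rf        = rfGrid,
                nnet      = nnetGrid)
model_list <- lapply(c("rf", "svmLinear", "nnet"),
                     function(x) my_list_model(x, grd_all[[x]]))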
