Cross-validating multiple neural networks with different single-hidden-layer sizes in R



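I have to use cross-validation to find out how many neurons the single hidden layer of my model should contain (using the nnet package). I have to write a function in R that takes the data, the model, and a parameter n as input, and computes the accuracy of the model on a randomly split training and test set using a neural network with a hidden layer of size n. I then have to call this function in a loop to compute the performance of neural networks with hidden-layer sizes n = 1, 2, 3, ..., 20. My main goal is to understand the effect of the hidden-layer size, because in the end I have to draw a plot showing the relationship between accuracy and model complexity. For that reason, I would ideally like to have all accuracy measures for both the test set and the training set.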

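I get the error: object 'accNN' not found. accNN is meant to be the empty vector that stores the results. I want to compare these 20 models, so inside the loop I also have to create 20 empty vectors to store the 20 different results (accNN1, accNN2, accNN3, etc.). Any help with coding the loop correctly would be great.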

Thanks a lot!

set.seed(1)
df <- data.frame(
  X = sample(1:100),
  Y = sample(1:100),
  Z = sample(1:100),
  target = sample(c("yes", "no"), 10, replace = TRUE))

# Create K folds with equal size for cross validation.
nFolds  <- 5
myFolds <- cut(seq(1, nrow(df)),
               breaks = nFolds,
               labels = FALSE)
table(myFolds)

# Create object for number of neurons
sizehiddenlayer <- 3

# Define the model
mdl <- target ~ X + Y + Z

for (j in 1:sizehiddenlayer) {
  # Initialize empty vectors to collect results
  accNN[j] <- rep(NA, nFolds)
  for (i in 1:nFolds) {
    cat("Analysis of fold", i, "\n")
    # 1: Define training and test sets
    testObs  <- which(myFolds == i, arr.ind = TRUE)
    dfTest   <- df[ testObs, ]
    dfTrain  <- df[-testObs, ]
    # 2: Train the models on the training sets
    rsltNN[j] <- nnet(mdlB, data = df, size = j)
    # 3: Predict values for the test sets
    predNN[j] <- predict(rsltNN[j], type = "class")
    # 4: Measure accuracy and store the results
    accNN[j] <- mean(df$target == predNN[j])
  }
}

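You need to create the object that stores the results before the loop starts. Assigning with the arrow to an element of an object that does not exist yet (accNN[j] <- ...) raises an error; it neither creates the object nor appends to an existing vector or list. Something like this will work (note that you train on dfTrain and predict on dfTest):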

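library(nnet)

# In R >= 4.0 data.frame() no longer turns strings into factors,
# and nnet needs a factor response for classification
df$target <- factor(df$target)

# One list element per hidden-layer size, created before the loop
results <- vector("list", sizehiddenlayer)
for (j in 1:sizehiddenlayer) {
  results[[j]]$accNN  <- rep(NA, nFolds)
  results[[j]]$rsltNN <- vector("list", nFolds)
  results[[j]]$predNN <- vector("list", nFolds)
  for (i in 1:nFolds) {
    testObs  <- which(myFolds == i, arr.ind = TRUE)
    dfTest   <- df[ testObs, ]
    dfTrain  <- df[-testObs, ]
    # Fit on the training folds, predict and score on the held-out fold
    results[[j]]$rsltNN[[i]] <- nnet(mdl, data = dfTrain, size = j)
    results[[j]]$predNN[[i]] <- predict(results[[j]]$rsltNN[[i]], dfTest, type = "class")
    results[[j]]$accNN[i]    <- mean(dfTest$target == results[[j]]$predNN[[i]])
  }
}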

The results are organized in a list, with one element per hidden-layer size:

head(results[[1]],2)
$accNN
[1] 0.6 0.6 0.6 0.6 0.6
$rsltNN
$rsltNN[[1]]
a 3-1-1 network with 6 weights
inputs: X Y Z 
output(s): target 
options were - entropy fitting 
$rsltNN[[2]]
a 3-1-1 network with 6 weights
inputs: X Y Z 
output(s): target 
options were - entropy fitting 
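Since the question asks for both training and test accuracy and for a plot of accuracy against model complexity, the loop above can be wrapped in a function and its per-fold accuracies averaged. The following is only a minimal sketch under a few assumptions: the helper name cv_accuracy and the object accBySize are made up for illustration, the toy data is rebuilt with 100 target values converted to a factor (so it is not identical to the question's data), and trace = FALSE is passed only to silence nnet's fitting log.

```r
library(nnet)

set.seed(1)
# Toy data shaped like the question's; target is made a factor explicitly
df <- data.frame(
  X = sample(1:100),
  Y = sample(1:100),
  Z = sample(1:100),
  target = factor(sample(c("yes", "no"), 100, replace = TRUE))
)
mdl     <- target ~ X + Y + Z
nFolds  <- 5
myFolds <- cut(seq_len(nrow(df)), breaks = nFolds, labels = FALSE)

# Hypothetical helper: mean train/test accuracy over the folds
# for a single hidden layer with n units
cv_accuracy <- function(data, model, n, folds) {
  response <- all.vars(model)[1]          # name of the outcome column
  k   <- max(folds)
  acc <- matrix(NA_real_, nrow = k, ncol = 2,
                dimnames = list(NULL, c("train", "test")))
  for (i in seq_len(k)) {
    test  <- data[folds == i, ]
    train <- data[folds != i, ]
    fit   <- nnet(model, data = train, size = n, trace = FALSE)
    acc[i, "train"] <- mean(predict(fit, train, type = "class") == train[[response]])
    acc[i, "test"]  <- mean(predict(fit, test,  type = "class") == test[[response]])
  }
  colMeans(acc)
}

# One row per hidden-layer size, columns "train" and "test"
sizes     <- 1:3
accBySize <- t(sapply(sizes, function(n) cv_accuracy(df, mdl, n, myFolds)))
rownames(accBySize) <- sizes
print(accBySize)

# Accuracy versus model complexity
matplot(sizes, accBySize, type = "b", pch = 1:2, lty = 1:2, col = 1:2,
        xlab = "Hidden layer size", ylab = "Mean accuracy")
legend("bottomright", legend = colnames(accBySize), pch = 1:2, lty = 1:2, col = 1:2)
```

Changing sizes to 1:20 would cover the range mentioned in the question. The response column is looked up from the formula so that the helper is not tied to a column named target.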

Another option is to let the caret package handle the CV and so on, or you can try something like this with purrr:

library(nnet)
library(purrr)
library(dplyr)

# Helpers: fit on all folds except i, predict on fold i, score fold i
fit  = function(dat, Folds, i, j){nnet(mdl, data = dat[Folds != i, ], size = j)}
pred = function(dat, Folds, mdl, i){predict(mdl, dat[Folds == i, ], type = "class")}
accr = function(dat, Folds, prediction, i){mean(dat$target[Folds == i] == prediction)}

# One row per (hidden-layer size, fold) combination
results = expand.grid(hiddenlayer = 1:sizehiddenlayer, fold = 1:nFolds) %>%
  tibble() %>%
  mutate(
    mdl      = map2(.x = fold, .y = hiddenlayer, ~fit(dat = df, Folds = myFolds, i = .x, j = .y)),
    pred     = map2(.x = fold, .y = mdl, ~pred(dat = df, Folds = myFolds, mdl = .y, i = .x)),
    accuracy = map2(.x = fold, .y = pred, ~accr(dat = df, Folds = myFolds, prediction = .y, i = .x))
  )
results
# A tibble: 15 x 5
  hiddenlayer  fold mdl        pred       accuracy 
        <int> <int> <list>     <list>     <list>   
1           1     1 <nnt.frml> <chr [20]> <dbl [1]>
2           2     1 <nnt.frml> <chr [20]> <dbl [1]>
3           3     1 <nnt.frml> <chr [20]> <dbl [1]>
4           1     2 <nnt.frml> <chr [20]> <dbl [1]>
5           2     2 <nnt.frml> <chr [20]> <dbl [1]>
6           3     2 <nnt.frml> <chr [20]> <dbl [1]>
7           1     3 <nnt.frml> <chr [20]> <dbl [1]>

You can access them like this:

> results$mdl[[1]]
a 3-1-1 network with 6 weights
inputs: X Y Z 
output(s): target 
options were - entropy fitting 
> results$pred[[1]]
[1] "no" "no" "no" "no" "no" "no" "no" "no" "no" "no" "no" "no" "no" "no" "no"
[16] "no" "no" "no" "no" "no"
> results$accuracy[[1]]
[1] 0.6
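For the caret route mentioned above, a minimal sketch might look as follows. It assumes the caret package is installed and uses its nnet method, whose tuning parameters are size and decay. The object names ctrl, grid and fitCaret are illustrative, decay is fixed at 0 here only to mirror the plain nnet calls, and the toy data is rebuilt with a factor target. Note that caret draws its own random folds, so the numbers will not match the contiguous folds used earlier.

```r
library(caret)

set.seed(1)
# Toy data shaped like the question's; target is made a factor explicitly
df <- data.frame(
  X = sample(1:100),
  Y = sample(1:100),
  Z = sample(1:100),
  target = factor(sample(c("yes", "no"), 100, replace = TRUE))
)

# 5-fold cross-validation over hidden-layer sizes 1 to 3
ctrl <- trainControl(method = "cv", number = 5)
grid <- expand.grid(size = 1:3, decay = 0)

fitCaret <- train(target ~ X + Y + Z, data = df,
                  method    = "nnet",
                  trControl = ctrl,
                  tuneGrid  = grid,
                  trace     = FALSE)   # passed on to nnet to silence its log

# Cross-validated accuracy for each hidden-layer size
print(fitCaret$results[, c("size", "Accuracy")])
```

The Accuracy column here is the cross-validated (held-out) accuracy only. Training-set accuracy would still have to be computed separately, for example with the manual loop shown earlier.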
