R: How to collect the accuracy of different models in one list using the caret package



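I am trying to test model performance using the caret package. I get results for each model separately, but I would like a single list that contains the accuracy and ROC of all models. How can I do this? Here is my toy data and two models: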

dat <- read.table(text = " target birds    wolfs     snakes
        0        3        9         7
        1        3        8         4
        1        1        2         8
        0        1        2         3
        0        1        8         3
        1        6        1         2
        0        6        7         1
        1        6        1         5
        0        5        9         7
        1        3        8         7
        1        4        2         7
        0        1        2         3
        0        7        6         3
        1        6        1         1
        0        6        3         9
        1        6        1         1   ",header = TRUE)

Here are the two models:

library(caret)
svmRadial <- train(target ~ ., data = dat, method='svmRadial')
glm <- train(target ~ ., data = dat, method='glm')

I would like to get a table like this as output:

ModelName  Accuracy  ROC
svmRadial   0.95     0.74
glm         0.93     0.7

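This is essentially a question about writing a custom summaryFunction. A similar question has been asked before. Below is a function that combines the defaultSummary and twoClassSummary functions.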

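mySummary <- function(data, lev = NULL, model = NULL)
{
    # pROC is needed for the AUC; fail early if it is not installed
    if (!requireNamespace("pROC", quietly = TRUE))
        stop("package 'pROC' is required")
    if (!all(levels(data[, "pred"]) == levels(data[, "obs"]))) 
        stop("levels of observed and predicted data do not match")
    # AUC from the predicted probabilities of the first class level;
    # the roc() generic is called rather than the roc.default method
    rocObject <- try(pROC::roc(data$obs, data[, lev[1]]), 
                     silent = TRUE)
    rocAUC <- if (inherits(rocObject, "try-error")) {
        NA
    } else {
        as.numeric(rocObject$auc)
    }
    if (!is.factor(data$obs)) 
        data$obs <- factor(data$obs, levels = lev)
    # accuracy is the first element returned by caret's postResample
    Acc <- postResample(data[, "pred"], data[, "obs"])[1]
    out <- c(Acc, rocAUC)
    names(out) <- c("Accuracy", "ROC")
    out
}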

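# classProbs = TRUE is needed so that class probabilities are available for the ROC
fitControl <- trainControl(classProbs = TRUE,
                           summaryFunction = mySummary)
set.seed(123)
# Note: with classProbs = TRUE, recent caret versions may reject class levels
# such as "0" and "1" that are not valid R variable names; if so, relabel the
# target first, e.g. factor(target, labels = c("no", "yes"))
svmRadial_acc_roc <- train(as.factor(target) ~ ., data = dat, method='svmRadial', trControl=fitControl)
glm_acc_roc <- train(as.factor(target) ~ ., data = dat, method='glm', trControl=fitControl)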
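
To make the two numbers returned by mySummary more concrete, here is a minimal base-R sketch that computes an accuracy and an AUC by hand. The vectors obs and prob_a are hypothetical toy values, not output from the models above, and the AUC uses the rank-based (Mann-Whitney) definition rather than pROC, so treat it as an illustration only.

```r
# Hypothetical toy inputs (NOT taken from the fitted models above)
obs    <- factor(c("a", "a", "b", "b", "a", "b"), levels = c("a", "b"))
prob_a <- c(0.9, 0.6, 0.4, 0.7, 0.8, 0.2)  # predicted probability of class "a"

# Hard class predictions from a 0.5 cut-off on the probability of "a"
pred <- factor(ifelse(prob_a > 0.5, "a", "b"), levels = levels(obs))

# Accuracy: share of cases where the predicted class equals the observed one
accuracy <- mean(pred == obs)

# AUC (Mann-Whitney form): probability that a randomly chosen "a" case
# gets a higher score than a randomly chosen "b" case, ties counted as 1/2
pos <- prob_a[obs == "a"]
neg <- prob_a[obs == "b"]
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

print(c(Accuracy = accuracy, ROC = auc))
```

Accuracy depends on the chosen cut-off while the AUC depends only on how the scores rank the cases, which is why the two metrics can disagree about which model is better.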

I would argue that it is better practice to look at the distribution of the results across resamples. For that, use the resamples function.

results <- resamples(list(svm=svmRadial_acc_roc, glm=glm_acc_roc))
summary(results)
Call:
summary.resamples(object = results)
Models: svm, glm 
Number of resamples: 25 
Accuracy 
      Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
svm 0.2500  0.5000  0.625 0.6034  0.6667 1.0000    0
glm 0.1667  0.4286  0.500 0.4993  0.6000 0.7143    0
ROC 
      Min. 1st Qu. Median   Mean 3rd Qu. Max. NA's
svm 0.4444  0.5608 0.6667 0.7422     1.0    1    1
glm 0.4444  0.6250 0.6667 0.7108     0.8    1    0

That said, if you really want that simple table:

# svm had some cross-validation so pull 'best tune'
svm_result <- svmRadial_acc_roc$results[
    svmRadial_acc_roc$results$C == svmRadial_acc_roc$bestTune$C,
    c("Accuracy", "ROC")]
glm_result <- glm_acc_roc$results[,c("Accuracy", "ROC")]
# make data.frame
data.frame(ModelName = c("svmRadial", "glm"),
           Accuracy = c(svm_result$Accuracy, glm_result$Accuracy),
           ROC = c(svm_result$ROC, glm_result$ROC)
)
  ModelName  Accuracy       ROC
1 svmRadial 0.6034444 0.7421875
2       glm 0.4993333 0.7107778
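
With more than two models, repeating the subsetting by hand gets tedious. The sketch below generalizes the same idea. The helper names best_metrics and model_table are hypothetical (they are not part of caret), and the code assumes only that each fitted object has the results and bestTune data frames used above.

```r
# Hypothetical helper: pick the Accuracy/ROC row that belongs to the best tune.
best_metrics <- function(fit, metrics = c("Accuracy", "ROC")) {
  # merge() joins on the columns shared by both data frames (the tuning
  # parameters), so only the row of fit$results matching bestTune survives
  best <- merge(fit$results, fit$bestTune)
  best[1, metrics, drop = FALSE]
}

# Hypothetical helper: stack the best-tune metrics of a named list of models.
model_table <- function(fits) {
  out <- do.call(rbind, lapply(fits, best_metrics))
  rownames(out) <- NULL
  data.frame(ModelName = names(fits), out, stringsAsFactors = FALSE)
}

# Usage with the models fitted above (requires those objects to exist):
# model_table(list(svmRadial = svmRadial_acc_roc, glm = glm_acc_roc))
```

Unlike the manual version, this matches on every tuning parameter in bestTune rather than on C alone, so it should also work for methods whose tuning grid has different columns.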