R-多级混淆矩阵,可以使用类别的类误差来计算



我尝试使用玩具数据进行K-NN分类,并得到以下预测:

actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12), rep('B1',11), rep('B2',17), rep('C1',15))
prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
                'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
                'A1','A2','A3','A3','A3','A3','B2',
                'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
                'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
                'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
                'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')

可以使用table()作为:

来实现有关预测的基本思想
table(actual, prediction)
#       prediction
# actual A1 A2 A3 A4 B1 B2 C1
#     A1  5  0  1  2  1  1  2
#     A2  0  5  1  3  2  0  1
#     A3  1  1  4  0  0  1  0
#     A4  2  3  0  6  1  0  0
#     B1  1  2  0  1  3  4  0
#     B2  1  0  1  0  4  9  2
#     C1  2  1  0  0  0  2 10

有很多信息函数caret::confusionMatrix()

caret::confusionMatrix(prediction, actual)
# Confusion Matrix and Statistics
# 
# Reference
# Prediction A1 A2 A3 A4 B1 B2 C1
# A1  5  0  1  2  1  1  2
# A2  0  5  1  3  2  0  1
# A3  1  1  4  0  0  1  0
# A4  2  3  0  6  1  0  0
# B1  1  2  0  1  3  4  0
# B2  1  0  1  0  4  9  2
# C1  2  1  0  0  0  2 10
# 
# Overall Statistics
# 
# Accuracy : 0.4884         
# 95% CI : (0.379, 0.5986)
# No Information Rate : 0.1977         
# P-Value [Acc > NIR] : 1.437e-09      
# 
# Kappa : 0.3975         
# Mcnemar's Test P-Value : NA             
# 
# Statistics by Class:
# 
#                      Class: A1 Class: A2 Class: A3 Class: A4 Class: B1 Class: B2 Class: C1
# Sensitivity            0.41667   0.41667   0.57143   0.50000   0.27273    0.5294    0.6667
# Specificity            0.90541   0.90541   0.96203   0.91892   0.89333    0.8841    0.9296
# Pos Pred Value         0.41667   0.41667   0.57143   0.50000   0.27273    0.5294    0.6667
# Neg Pred Value         0.90541   0.90541   0.96203   0.91892   0.89333    0.8841    0.9296
# Prevalence             0.13953   0.13953   0.08140   0.13953   0.12791    0.1977    0.1744
# Detection Rate         0.05814   0.05814   0.04651   0.06977   0.03488    0.1047    0.1163
# Detection Prevalence   0.13953   0.13953   0.08140   0.13953   0.12791    0.1977    0.1744
# Balanced Accuracy      0.66104   0.66104   0.76673   0.70946   0.58303    0.7067    0.7981

我观察到有许多子类属于另一个类。例如,A1A2A3A4属于A类。同样,B1B2属于B类。我想在将一类中的所有子类相等的情况下计算统计数据。是否有任何功能可以生成类似的统计信息,类似于上面的类和外部错误?

注意:请不要提出包含从子类删除数字的解决方案,因为实际应用程序与此不相似。出于简单目的,我给出了这个示例。

如果给出了类和子类关系,是否可以采用解决方案?

如何通过删除子类后缀手动定义类:

    actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12), rep('B1',11), rep('B2',17), rep('C1',15))
    prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
                    'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
                    'A1','A2','A3','A3','A3','A3','B2',
                    'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
                    'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
                    'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
                    'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')
    actual = gsub("\d", "", actual)
    prediction = gsub("\d", "", prediction)
    caret::confusionMatrix(prediction, actual)
#output
Confusion Matrix and Statistics
          Reference
Prediction  A  B  C
         A 34  6  3
         B  6 20  2
         C  3  2 10
Overall Statistics
               Accuracy : 0.7442          
                 95% CI : (0.6387, 0.8322)
    No Information Rate : 0.5             
    P-Value [Acc > NIR] : 3.272e-06       
                  Kappa : 0.5831          
 Mcnemar's Test P-Value : 1               
Statistics by Class:
                     Class: A Class: B Class: C
Sensitivity            0.7907   0.7143   0.6667
Specificity            0.7907   0.8621   0.9296
Pos Pred Value         0.7907   0.7143   0.6667
Neg Pred Value         0.7907   0.8621   0.9296
Prevalence             0.5000   0.3256   0.1744
Detection Rate         0.3953   0.2326   0.1163
Detection Prevalence   0.5000   0.3256   0.1744
Balanced Accuracy      0.7907   0.7882   0.7981

最新更新