I tried K-NN classification on toy data and got the following predictions:
actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12), rep('B1',11), rep('B2',17), rep('C1',15))
prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
'A1','A2','A3','A3','A3','A3','B2',
'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')
These can be tabulated with table() as:
table(actual, prediction)
#       prediction
# actual A1 A2 A3 A4 B1 B2 C1
#     A1  5  0  1  2  1  1  2
#     A2  0  5  1  3  2  0  1
#     A3  1  1  4  0  0  1  0
#     A4  2  3  0  6  1  0  0
#     B1  1  2  0  1  3  4  0
#     B2  1  0  1  0  4  9  2
#     C1  2  1  0  0  0  2 10
There is also a very informative function, caret::confusionMatrix():
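# Recent caret versions require factors with identical levels, not character vectors
caret::confusionMatrix(factor(prediction), factor(actual))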
# Confusion Matrix and Statistics
#
#           Reference
# Prediction A1 A2 A3 A4 B1 B2 C1
#         A1  5  0  1  2  1  1  2
#         A2  0  5  1  3  2  0  1
#         A3  1  1  4  0  0  1  0
#         A4  2  3  0  6  1  0  0
#         B1  1  2  0  1  3  4  0
#         B2  1  0  1  0  4  9  2
#         C1  2  1  0  0  0  2 10
#
# Overall Statistics
#
#                Accuracy : 0.4884
#                  95% CI : (0.379, 0.5986)
#     No Information Rate : 0.1977
#     P-Value [Acc > NIR] : 1.437e-09
#
#                   Kappa : 0.3975
#  Mcnemar's Test P-Value : NA
#
# Statistics by Class:
#
#                      Class: A1 Class: A2 Class: A3 Class: A4 Class: B1 Class: B2 Class: C1
# Sensitivity            0.41667   0.41667   0.57143   0.50000   0.27273    0.5294    0.6667
# Specificity            0.90541   0.90541   0.96203   0.91892   0.89333    0.8841    0.9296
# Pos Pred Value         0.41667   0.41667   0.57143   0.50000   0.27273    0.5294    0.6667
# Neg Pred Value         0.90541   0.90541   0.96203   0.91892   0.89333    0.8841    0.9296
# Prevalence             0.13953   0.13953   0.08140   0.13953   0.12791    0.1977    0.1744
# Detection Rate         0.05814   0.05814   0.04651   0.06977   0.03488    0.1047    0.1163
# Detection Prevalence   0.13953   0.13953   0.08140   0.13953   0.12791    0.1977    0.1744
# Balanced Accuracy      0.66104   0.66104   0.76673   0.70946   0.58303    0.7067    0.7981
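I notice that many of the subclasses belong to a common parent class. For example, A1, A2, A3 and A4 belong to class A. Similarly, B1 and B2 belong to class B. I want to compute the statistics with all subclasses of a class treated as equal. Is there a function that produces statistics like the ones above, but for within-class and outside-class errors?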
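Note: please do not suggest solutions that strip the digits from the subclass names, because the real application does not look like this. I only used this example for simplicity.
Is a solution possible if the class/subclass relationship is given?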
How about defining the classes manually by removing the subclass suffix:
actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12), rep('B1',11), rep('B2',17), rep('C1',15))
prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
'A1','A2','A3','A3','A3','A3','B2',
'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')
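# "\\d" matches a digit; the backslash must be escaped inside an R string
actual <- gsub("\\d", "", actual)
prediction <- gsub("\\d", "", prediction)
# caret expects factors with identical levels
caret::confusionMatrix(factor(prediction), factor(actual))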
# output
# Confusion Matrix and Statistics
#
#           Reference
# Prediction  A  B  C
#          A 34  6  3
#          B  6 20  2
#          C  3  2 10
#
# Overall Statistics
#
#                Accuracy : 0.7442
#                  95% CI : (0.6387, 0.8322)
#     No Information Rate : 0.5
#     P-Value [Acc > NIR] : 3.272e-06
#
#                   Kappa : 0.5831
#  Mcnemar's Test P-Value : 1
#
# Statistics by Class:
#
#                      Class: A Class: B Class: C
# Sensitivity            0.7907   0.7143   0.6667
# Specificity            0.7907   0.8621   0.9296
# Pos Pred Value         0.7907   0.7143   0.6667
# Neg Pred Value         0.7907   0.8621   0.9296
# Prevalence             0.5000   0.3256   0.1744
# Detection Rate         0.3953   0.2326   0.1163
# Detection Prevalence   0.5000   0.3256   0.1744
# Balanced Accuracy      0.7907   0.7882   0.7981
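Since the question asks for something that works when the class/subclass relationship is given rather than encoded in the names, here is a minimal sketch of that variant using only base R. The lookup vector class_of and the names actual_class, prediction_class, class_table, exact, within_class_error and outside_class_error are made up for illustration; in a real application the lookup would come from whatever defines the hierarchy. It also splits the predictions into exactly right, wrong subclass inside the right class, and wrong class, which is one possible reading of "within-class and outside-class error".

```r
# Toy data from the question
actual <- c(rep('A1',12), rep('A2',12), rep('A3',7), rep('A4',12),
            rep('B1',11), rep('B2',17), rep('C1',15))
prediction <- c('A1','A1','A1','A1','A1','A3','A4','A4','B1','B2','C1','C1',
                'A2','A2','A2','A2','A2','A3','A4','A4','A4','B1','B1','C1',
                'A1','A2','A3','A3','A3','A3','B2',
                'A1','A1','A2','A2','A2','A4','A4','A4','A4','A4','A4','B1',
                'A1','A2','A2','A4','B1','B1','B1','B2','B2','B2','B2',
                'A1','A3','B1','B1','B1','B1','B2','B2','B2','B2','B2','B2','B2','B2','B2','C1','C1',
                'A1','A1','A2','B2','B2','C1','C1','C1','C1','C1','C1','C1','C1','C1','C1')

# Hypothetical lookup: names are subclasses, values are their parent classes.
# Nothing here depends on how the subclass names are spelled.
class_of <- c(A1 = 'A', A2 = 'A', A3 = 'A', A4 = 'A',
              B1 = 'B', B2 = 'B', C1 = 'C')

# Fail early if a label is missing from the lookup
stopifnot(all(c(actual, prediction) %in% names(class_of)))

# Translate every subclass label into its parent class
actual_chr     <- unname(class_of[actual])
prediction_chr <- unname(class_of[prediction])

# Shared levels keep both factors aligned even if a class is never predicted
class_levels     <- unique(class_of)
actual_class     <- factor(actual_chr, levels = class_levels)
prediction_class <- factor(prediction_chr, levels = class_levels)

# Class-level confusion table (rows: predicted class, columns: actual class)
class_table <- table(prediction_class, actual_class)

# Three-way breakdown of the predictions
exact               <- mean(prediction == actual)        # right subclass
within_class_error  <- mean(prediction != actual &
                            prediction_chr == actual_chr) # right class, wrong subclass
outside_class_error <- mean(prediction_chr != actual_chr) # wrong class

print(class_table)
print(c(exact = exact,
        within_class = within_class_error,
        outside_class = outside_class_error))
```

The two factors can be passed on to caret::confusionMatrix(prediction_class, actual_class) to get the full set of class-level statistics. On the toy data, class_table should match the 3x3 table above.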