Why can't the macro F1 score be computed from macro precision and macro recall?



I am interested in computing the macro F1 score manually from the macro precision and macro recall, but the results do not match. What is the difference between the final formulas for f1 and f1_new in the code below?

from sklearn.metrics import precision_score, recall_score, f1_score
y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 0]
p = precision_score(y_true, y_pred, average='macro')
r = recall_score(y_true, y_pred, average='macro')
f1_new = (2 * p * r) / (p + r)  # 0.6291390728476821
f1 = f1_score(y_true, y_pred, average='macro')  # 0.6190476190476191
print(f1_new == f1)
# False

In scikit-learn, f1_score with average='macro' is computed as follows:

all_positives = 4
all_negatives = 4
true_positives = 2
true_negatives = 3
true_positive_rate = true_positives/all_positives = 2/4
true_negative_rate = true_negatives/all_negatives = 3/4
pred_positives = 3
pred_negatives = 5
positive_predicted_value = true_positives/pred_positives = 2/3
negative_predicted_value = true_negatives/pred_negatives = 3/5
f1_score_pos = 2 * true_positive_rate * positive_predicted_value / (true_positive_rate + positive_predicted_value)
             = 2 * 2/4 * 2/3 / (2/4 + 2/3)
             = 4/7
f1_score_neg = 2 * true_negative_rate * negative_predicted_value / (true_negative_rate + negative_predicted_value)
             = 2 * 3/4 * 3/5 / (3/4 + 3/5)
             = 2/3
f1 = average(f1_score_pos, f1_score_neg)
   = (4/7 + 2/3) / 2
   = 0.6190476190476191
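This per-label decomposition can be checked directly: passing average=None makes f1_score return one score per label, and their unweighted mean reproduces the 'macro' value. A minimal sketch, reusing y_true and y_pred from the question:

from sklearn.metrics import f1_score

y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 0]

# average=None returns the per-label F1 scores, in label order [0, 1].
per_class_f1 = f1_score(y_true, y_pred, average=None)
print(per_class_f1)         # [0.66666667 0.57142857], i.e. 2/3 and 4/7
print(per_class_f1.mean())  # 0.6190476190476191, identical to average='macro'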

This matches the definition given in the scikit-learn documentation for the 'macro' option of f1_score: calculate metrics for each label, and find their unweighted mean. The same definition applies to precision_score and recall_score.
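The same check works for precision and recall; the macro scores are again just the unweighted means of the per-label scores (continuing the snippet above):

from sklearn.metrics import precision_score, recall_score

# Per-label precision and recall, label order [0, 1].
p_per_class = precision_score(y_true, y_pred, average=None)  # [0.6, 2/3]
r_per_class = recall_score(y_true, y_pred, average=None)     # [0.75, 0.5]
print(p_per_class.mean())  # 0.6333333333333333, same as average='macro'
print(r_per_class.mean())  # 0.625, same as average='macro'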

Your manual F1 calculation, by contrast, works as follows:

precision = average(positive_predicted_value, negative_predicted_value)
          = average(2/3, 3/5)
          = 19/30
recall = average(true_positive_rate, true_negative_rate)
       = average(2/4, 3/4)
       = 5/8
f1_new = 2 * precision * recall / (precision + recall)
       = 2 * 19/30 * 5/8 / (19/30 + 5/8)
       = 0.6291390728476821
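The discrepancy is therefore an order-of-operations issue: macro F1 takes the harmonic mean within each class first and the arithmetic mean across classes second, while f1_new swaps the two steps, and the two orders do not commute. A short sketch with exact fractions (per-class values taken from the counts above; the helper f1 is mine) makes this explicit:

from fractions import Fraction

# Per-class precision and recall as exact fractions, from the counts above.
p_neg, p_pos = Fraction(3, 5), Fraction(2, 3)
r_neg, r_pos = Fraction(3, 4), Fraction(2, 4)

def f1(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# average='macro': arithmetic mean of the per-class F1 scores.
macro_f1 = (f1(p_neg, r_neg) + f1(p_pos, r_pos)) / 2
print(macro_f1, float(macro_f1))  # 13/21 0.6190476190476191

# f1_new: F1 of the macro-averaged precision and recall.
p_macro = (p_neg + p_pos) / 2
r_macro = (r_neg + r_pos) / 2
print(f1(p_macro, r_macro), float(f1(p_macro, r_macro)))  # 95/151 0.6291390728476821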

In fact, the general formula F1 = 2 * (precision * recall) / (precision + recall) given in the documentation only holds for average='binary' and average='micro', not for average='macro' or average='weighted'. In that sense, as currently presented in scikit-learn, the formula is misleading, because it suggests that it holds regardless of the chosen average parameter, which is not the case.
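A quick sanity check of that claim on the example data (assuming the imports and arrays from the question, plus numpy):

import numpy as np

# The identity F1 = 2PR / (P + R) holds for 'binary' and 'micro' ...
for avg in ('binary', 'micro'):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    f = f1_score(y_true, y_pred, average=avg)
    print(avg, np.isclose(2 * p * r / (p + r), f))  # True, True

# ... but not for 'macro' and 'weighted':
for avg in ('macro', 'weighted'):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    f = f1_score(y_true, y_pred, average=avg)
    print(avg, np.isclose(2 * p * r / (p + r), f))  # False, False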
