I ran a Random Forest classifier for my multi-class, multi-label output variable. The output I got is shown below.
My y_test values:
Degree Nature
762721 1 7
548912 0 6
727126 1 12
14880 1 12
189505 1 12
657486 1 12
461004 1 0
31548 0 6
296674 1 7
121330 0 17
Predicted output:
[[ 1. 7.]
[ 0. 6.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 0.]
[ 0. 6.]
[ 1. 7.]
[ 0. 17.]]
Now I want to check the performance of my classifier. I found that Hamming loss or jaccard_similarity_score is a good metric for the multi-class, multi-label case. I tried to compute it, but I am getting a ValueError.
Error:
ValueError: multiclass-multioutput is not supported
The lines I tried are:
print hamming_loss(y_test, RF_predicted)
print jaccard_similarity_score(y_test, RF_predicted)
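For reference, a minimal self-contained sketch that reproduces the same error; the synthetic X and y here (a binary column and a multi-class column) are stand-ins, not my real data:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import hamming_loss

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
# two output columns: a binary one and a multi-class one
y = np.column_stack((rng.randint(0, 2, 200), rng.randint(0, 18, 200)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

hamming_loss(y_test, y_pred)  # ValueError: multiclass-multioutput is not supported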
Thanks,
To calculate the unsupported Hamming loss for multiclass / multilabel, you could:
import numpy as np
y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])
np.sum(np.not_equal(y_true, y_pred))/float(y_true.size)
0.75
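An equivalent route (a sketch, not strictly necessary) is to apply sklearn's hamming_loss column by column, since 1-d binary and multiclass targets are supported, and average the results; on the toy arrays above this gives the same 0.75:

from sklearn.metrics import hamming_loss
import numpy as np

y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])

# hamming_loss accepts 1-d targets, so score each output column separately
per_column = [hamming_loss(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])]
print(per_column)           # [1.0, 0.5]
print(np.mean(per_column))  # 0.75, same as the manual computation above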
You can also get the confusion_matrix for each of the two labels, like so:
from sklearn.metrics import confusion_matrix, precision_score
np.random.seed(42)
y_true = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[0 4]
[1 4]
[0 4]
[0 4]
[0 2]
[1 4]
[0 3]
[0 2]
[0 3]
[1 3]]
y_pred = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[1 2]
[1 2]
[1 4]
[1 4]
[0 4]
[0 3]
[1 4]
[1 3]
[1 3]
[0 4]]
confusion_matrix(y_true[:, 0], y_pred[:, 0])
[[1 6]
[2 1]]
confusion_matrix(y_true[:, 1], y_pred[:, 1])
[[0 1 1]
[0 1 2]
[2 1 2]]
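If you have more output columns, a small loop over them (a sketch along the same lines, reusing the arrays above) prints one confusion matrix per column; passing labels= explicitly fixes the row/column order:

for col in range(y_true.shape[1]):
    # collect the label set of this column so the matrix rows/columns are in a known order
    labels = np.unique(np.concatenate((y_true[:, col], y_pred[:, col])))
    print("column", col, "labels:", labels)
    print(confusion_matrix(y_true[:, col], y_pred[:, col], labels=labels))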
You can also compute the precision_score (or the recall_score in a similar way) like this:
precision_score(y_true[:, 0], y_pred[:, 0])
0.142857142857
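The call above works because the first column is binary; for the second, multi-class column precision_score needs an explicit average argument (a sketch, with 'macro' chosen only as an example):

precision_score(y_true[:, 1], y_pred[:, 1], average='macro')
# ~0.2444 for the toy arrays above (mean of per-class precisions 0, 1/3 and 2/5)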