sklearn log_loss: different number of classes

I am using log_loss from sklearn:

from sklearn.metrics import log_loss
print log_loss(true, pred,normalize=False)

I get the following error:

ValueError: y_true and y_pred have different number of classes 38, 2

This seems really strange to me, because the arrays look valid:

print pred.shape
print np.unique(pred)
print np.unique(pred).size
(19191L,)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37]
38
print true.shape
print np.unique(true)
print np.unique(true).size
(19191L,)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37]
38

What is wrong with log_loss? Why does it throw this error?

Sample data:

pred: array([ 0,  1,  2, ...,  3, 12, 16], dtype=int64)
true: array([ 0,  1,  2, ...,  3, 12, 16])

It is simple: you are passing predictions, not predicted probabilities. Your pred variable contains

[ 1 2 1 3 .... ] #Classes : 1, 2 or 3

but to use log_loss it should contain something like this:

 #each element is an array with probability of each class
 [[ 0.1, 0.8, 0.1] [ 0.0, 0.79 , 0.21] .... ]

To get these probabilities, use the predict_proba function:

pred = model.predict_proba(x_test)  # shape (n_samples, n_classes)
loss = log_loss(y_true, pred)       # named loss so the builtin eval is not shadowed
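As a fuller illustration of the two lines above, here is a minimal sketch on a toy dataset. The iris data, the LogisticRegression model, and the variable names are assumptions chosen for the example; they are not from the question, which has 38 classes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Toy 3-class data standing in for the 38-class problem in the question.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

labels = model.predict(X_test)       # shape (n_samples,): hard class labels
proba = model.predict_proba(X_test)  # shape (n_samples, n_classes): probabilities

# Columns of proba follow model.classes_; passing labels= makes that explicit.
loss = log_loss(y_test, proba, labels=model.classes_)
print(labels.shape, proba.shape, loss)
```

The only change from the failing call is which array is passed: the output of predict_proba has one column per class, while the output of predict is the 1-D label array that triggers the error.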

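Inside the log_loss method, the true array is fitted and transformed by a LabelBinarizer, which changes its dimensions. So checking that true and pred have the same shape does not mean log_loss will work, because the shape of true changes: with 38 classes it becomes a 38-column indicator matrix, while a 1-D pred is read as positive-class probabilities, that is, as 2 classes. That is where the "38, 2" in the error comes from. If you only have binary classes, I suggest you use this log_loss cost function; with multiple classes, it does not work on a 1-D array of predicted labels and needs one probability per class instead.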
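The shape change can be seen directly with LabelBinarizer. The sketch below uses made-up data (38 integer classes, 5 samples each) as a stand-in for the question's arrays, and the exact error text may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import log_loss
from sklearn.preprocessing import LabelBinarizer

# Hypothetical stand-in for the question's data: 38 integer classes.
y_true = np.arange(38).repeat(5)
y_pred_labels = y_true.copy()  # hard labels with the same 1-D shape as y_true

# Both arrays have the same shape going in ...
print(y_true.shape, y_pred_labels.shape)

# ... but one-hot encoding the true labels yields one column per class.
onehot = LabelBinarizer().fit_transform(y_true)
print(onehot.shape)  # (190, 38)

# A 1-D y_pred is interpreted as positive-class probabilities (a 2-class
# problem), so passing hard labels for 38 classes is rejected.
try:
    log_loss(y_true, y_pred_labels)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # message wording depends on the scikit-learn version
```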

From the log_loss documentation:

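y_pred : array-like of float, shape = (n_samples, n_classes) or (n_samples,)

Predicted probabilities, as returned by a classifier's predict_proba method. If y_pred.shape = (n_samples,) the probabilities provided are assumed to be that of the positive class. The labels in y_pred are assumed to be ordered alphabetically, as done by preprocessing.LabelBinarizer.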
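To make the (n_samples,) case concrete, here is a small sketch for a binary problem. The labels and probabilities are invented for illustration. It compares the 1-D form, the equivalent two-column form, and the log loss computed by hand.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 1, 0, 1])
p_pos = np.array([0.2, 0.9, 0.6, 0.3, 0.8])  # P(class 1) for each sample

# 1-D input: interpreted as the probability of the positive class.
loss_1d = log_loss(y_true, p_pos)

# Equivalent 2-D input: columns ordered as [P(class 0), P(class 1)].
loss_2d = log_loss(y_true, np.column_stack([1 - p_pos, p_pos]))

# Binary cross-entropy written out by hand for comparison.
manual = -np.mean(y_true * np.log(p_pos) + (1 - y_true) * np.log(1 - p_pos))

print(loss_1d, loss_2d, manual)
```

All three values should agree up to floating-point error. The 1-D shortcut only has this meaning for two classes, which is why a 1-D array of labels from a 38-class problem is rejected.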

You need to pass probabilities, not predicted labels.
