Does it matter whether arrays are 1D or 2D when fitting and predicting with ML models?



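I developed a text classification model in which X_test and X_train are 2D arrays, while y_test and y_train are 1D arrays. I get no errors when training, fitting, or predicting with the ML models, but I don't know why I run into trouble when generating the ROC score. It says AxisError: axis 1 is out of bounds for array of dimension 1!!

I can't find a way to solve this, so I'm curious whether 1D and 2D arrays are related in ML models, or whether it has to be one or the other: 1D or 2D arrays.

Can someone explain?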

Sample code from the text classification model (used to generate the ROC score):

from sklearn.metrics import roc_curve, roc_auc_score
r_auc = roc_auc_score(y_test, r_probs, multi_class='OVO')

Before calculating the AUROC, I did the following:

# Predict probabilities

r_probs = [0 for _ in range(len(y_test))]
rf_probs = RFClass.predict_proba(X_test)
dt_probs = DTClass.predict_proba(X_test)
sgdc_probs = sgdc_model.predict_proba(X_test)

# Keep the probabilities for the positive outcome.

dt_probs = dt_probs[:, 1]
sgdc_probs = sgdc_probs[:, 1]
rf_probs = rf_probs[:, 1]

Sample output of y_test:

Covid19 - Form
Covid19 - Phone
Covid19 - Email
Covid19 - Email
Covid19 - Phone

Sample output of r_probs:

[0,0,0,0,0,…]

This is the error:

---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
/tmp/ipykernel_14270/1310904144.py in <module>
4 from sklearn.metrics import roc_curve, roc_auc_score
5 
----> 6 r_auc = roc_auc_score(y_test, r_probs, multi_class='OVO')
7 #rf_auc = roc_auc_score(y_test, rf_probs, multi_class='ovr')
8 #dt_auc = roc_auc_score(y_test, dt_probs, multi_class='ovr')
packages/sklearn/metrics/_ranking.py in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
559         if multi_class == "raise":
560             raise ValueError("multi_class must be in ('ovo', 'ovr')")
--> 561         return _multiclass_roc_auc_score(
562             y_true, y_score, labels, multi_class, average, sample_weight
563         )

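The shapes of your y_test and r_probs don't seem to match. Also, you seem to have assigned all zeros to r_probs and never updated them. Note that for roc_auc_score to work, you need some 1s in both the ground truth and the predictions.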
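To make the shape mismatch concrete, here is a minimal sketch, assuming three-class string labels like the ones shown in the question. The label array and the uniform "no-skill" baseline are made up for illustration; they are not the asker's data. On the multiclass path, roc_auc_score expects scores of shape (n_samples, n_classes) whose rows sum to 1, so a flat list of zeros is rejected, while a 2-D baseline is accepted.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical three-class labels, mirroring the question's y_test.
classes = np.array(["Covid19 - Email", "Covid19 - Form", "Covid19 - Phone"])
y_test = np.tile(classes, 10)  # 30 labels, every class present

# 1-D scores, as in the question: rejected on the multiclass path.
flat_scores = np.zeros(len(y_test))
try:
    roc_auc_score(y_test, flat_scores, multi_class="ovo")
    flat_failed = False
except Exception as exc:  # exact exception type depends on library versions
    flat_failed = True
    print(f"1-D scores failed: {type(exc).__name__}")

# 2-D no-skill baseline: one column per class, each row sums to 1.
baseline = np.full((len(y_test), len(classes)), 1.0 / len(classes))
baseline_auc = roc_auc_score(y_test, baseline, multi_class="ovo")
print(f"baseline shape: {baseline.shape}, AUC: {baseline_auc}")
```

Note the lowercase "ovo": the uppercase 'OVO' written in the question is not one of the accepted values for multi_class.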

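First, some background:

y_test and the predictions can each be 1-D or 2-D, depending on whether you formulate the task as a binary, multiclass, or multilabel problem. Read more under the y_true and multi_class parameters of roc_auc_score.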

y_true:

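True labels or binary label indicators. The binary and multiclass cases expect labels with shape (n_samples,), while the multilabel case expects binary label indicators with shape (n_samples, n_classes).

multi_class:

Only used for multiclass targets. Determines the type of configuration to use. The default value raises an error, so either "ovr" or "ovo" must be passed explicitly.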
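If it is unclear which of these cases a label array falls into, one option is to ask scikit-learn directly. The sketch below uses type_of_target from sklearn.utils.multiclass on three small, made-up label arrays (the variable names are illustrative only) and prints each array's shape next to the detected target type.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative label arrays, one per problem formulation.
y_binary = np.array([0, 1, 1, 0])                          # shape (4,)
y_multiclass = np.array(["Email", "Form", "Phone", "Email"])  # shape (4,)
y_multilabel = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # shape (4, 2)

detected = {}
for name, y in [
    ("binary", y_binary),
    ("multiclass", y_multiclass),
    ("multilabel", y_multilabel),
]:
    detected[name] = type_of_target(y)
    print(f"{name}: shape={np.shape(y)}, type_of_target={detected[name]}")
```

A 1-D label array with more than two distinct values, like the y_test in the question, is treated as multiclass, which is why the multi_class argument matters there.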

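I would print the shapes of y_test and r_probs before calling the roc_auc_score function. The following shows it working for both binary (1-D labels) and multilabel (2-D labels) cases.

Binary (1-D) class labels: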

import numpy as np
from sklearn.metrics import roc_auc_score
np.random.seed(42)
n = 100
y_test = np.random.randint(0, 2, (n,))
r_probs = np.random.randint(0, 2, (n,))
r_auc = roc_auc_score(y_test, r_probs)
print(f'Shape of y_test: {y_test.shape}')
print(f'Shape of r_probs: {r_probs.shape}')
print(f'y_test: {y_test}')
print(f'r_probs: {r_probs}')
print(f'r_auc: {r_auc}')

Output:

Shape of y_test: (100,)
Shape of r_probs: (100,)
y_test: [0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1  0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0]
r_probs: [0 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 1 0 0 1 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 1 1 0 0]
r_auc: 0.5073051948051948

Multilabel (2-D) class labels:

y_test = np.random.randint(0, 2, (n, 4))
r_probs = np.random.randint(0, 2, (n, 4))
r_auc = roc_auc_score(y_test, r_probs, multi_class='ovr')
print(f'Shape of y_test: {y_test.shape}')
print(f'Shape of r_probs: {r_probs.shape}')
print(f'y_test: {y_test}')
print(f'r_probs: {r_probs}')
print(f'r_auc: {r_auc}')

Output:

Shape of y_test: (100, 4)
Shape of r_probs: (100, 4)
y_test: [[0 1 0 0] [1 0 1 1] ... [1 0 0 0] [0 0 1 1]]
r_probs: [[0 1 1 1] [0 0 0 1] ... [1 1 0 1] [1 1 1 0]]
r_auc: 0.5270015526313198
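The question itself is a third case: multiclass, with 1-D labels and more than two classes. A rough sketch of that case follows, assuming a synthetic three-class dataset from make_classification and a RandomForestClassifier standing in for the asker's models (the data and hyperparameters are placeholders). The key difference from the question's code is that the full predict_proba output is kept, one column per class, instead of slicing out [:, 1], which only makes sense for binary problems.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic three-class data (placeholder for the real text features).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

# Keep every column: shape (n_samples, n_classes), rows sum to 1.
rf_probs = clf.predict_proba(X_test)
rf_auc = roc_auc_score(y_test, rf_probs, multi_class="ovo")

print(f"Shape of y_test: {y_test.shape}")
print(f"Shape of rf_probs: {rf_probs.shape}")
print(f"rf_auc: {rf_auc}")
```

The columns of predict_proba follow the order of clf.classes_, which is the sorted label order that roc_auc_score assumes when no labels argument is given.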
