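Does it matter whether arrays are 1D or 2D when fitting and predicting with ML models?

I developed a text classification model in which X_test and X_train are 2D arrays and y_test and y_train are 1D arrays. I got no errors while training, fitting, or predicting with the model, but I don't understand why I have trouble generating the ROC score. It says AxisError: axis 1 is out of bounds for array of dimension 1!

I couldn't find a solution, so I'd like to know whether 1D vs. 2D arrays matter for ML models at all, or whether everything should be one or the other: 1D or 2D.

Can anyone explain this?

Sample code from the text classification model (for generating the ROC score):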

from sklearn.metrics import roc_curve, roc_auc_score
r_auc = roc_auc_score(y_test, r_probs, multi_class='OVO')

Before computing the AUROC, I did the following:

# Predict probabilities

r_probs = [0 for _ in range(len(y_test))]
rf_probs = RFClass.predict_proba(X_test)
dt_probs = DTClass.predict_proba(X_test)
sgdc_probs = sgdc_model.predict_proba(X_test)

# Keep only the probabilities of the positive outcome

dt_probs = dt_probs[:, 1]
sgdc_probs = sgdc_probs[:, 1]
rf_probs = rf_probs[:, 1]

Sample output of y_test:

Covid19 - Form
Covid19 - Phone
Covid19 - Email
Covid19 - Email
Covid19 - Phone

Sample output of r_probs:

[0, 0, 0, 0, 0, …]

This is the error:

---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
/tmp/ipykernel_14270/1310904144.py in <module>
4 from sklearn.metrics import roc_curve, roc_auc_score
5 
----> 6 r_auc = roc_auc_score(y_test, r_probs, multi_class='OVO')
7 #rf_auc = roc_auc_score(y_test, rf_probs, multi_class='ovr')
8 #dt_auc = roc_auc_score(y_test, dt_probs, multi_class='ovr')
packages/sklearn/metrics/_ranking.py in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
559         if multi_class == "raise":
560             raise ValueError("multi_class must be in ('ovo', 'ovr')")
--> 561         return _multiclass_roc_auc_score(
562             y_true, y_score, labels, multi_class, average, sample_weight
563         )

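The shapes of your y_test and r_probs don't seem to match. Also, you seem to have set r_probs to all zeros and never updated it. Note that for roc_auc_score to give a meaningful result, the ground truth needs both classes (some 1s as well as 0s), and the scores need to vary: a constant score such as all zeros cannot rank anything, so in the binary case it just yields the no-skill AUC of 0.5.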
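A minimal sketch of what goes wrong, assuming y_test holds string labels like the ones in the question (the six sample rows below are made up for illustration). Because y_test has more than two classes, roc_auc_score takes its multiclass path, which sums the scores along axis 1; a 1-D list such as r_probs has no axis 1, hence the AxisError. A multiclass "no-skill" baseline instead needs one column per class, for example uniform probabilities:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical sample labels modeled on the question's y_test output
y_test = np.array([
    "Covid19 - Form", "Covid19 - Phone", "Covid19 - Email",
    "Covid19 - Email", "Covid19 - Phone", "Covid19 - Form",
])

# 1-D all-zero "probabilities", built the same way as in the question
r_probs = [0 for _ in range(len(y_test))]

try:
    roc_auc_score(y_test, r_probs, multi_class="ovo")
    error_name = None
except ValueError as exc:  # numpy's AxisError is a ValueError subclass
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")

# Multiclass baseline: shape (n_samples, n_classes), each row sums to 1
n_classes = len(np.unique(y_test))
baseline_probs = np.full((len(y_test), n_classes), 1.0 / n_classes)
baseline_auc = roc_auc_score(y_test, baseline_probs, multi_class="ovo")
print(f"Shape of baseline_probs: {baseline_probs.shape}")
print(f"Baseline AUC: {baseline_auc}")
```

Since every class gets the same score, the baseline cannot rank samples and its AUC comes out at 0.5, which is what the all-zero r_probs was presumably meant to represent.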

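First, some background:

Depending on whether you formulate your task as a binary, multiclass, or multilabel problem, y_test and the predictions can be 1-D or 2-D. Judging by the sample output, your y_test has (at least) three string labels, which makes it a multiclass problem: y_true stays 1-D, but the scores must be 2-D with one column of probabilities per class, shape (n_samples, n_classes). Slicing predict_proba with [:, 1] throws those other columns away. Read more under the y_true and multi_class parameters of roc_auc_score: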
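To check which case applies, scikit-learn's helper sklearn.utils.multiclass.type_of_target reports how it interprets a label array. The arrays below are made-up examples; the string labels only mimic the question's y_test:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Binary: 1-D array with two distinct labels, shape (n_samples,)
binary_y = np.array([0, 1, 1, 0])
# Multiclass: 1-D array with 3+ distinct labels (strings are fine), shape (n_samples,)
multiclass_y = np.array(
    ["Covid19 - Form", "Covid19 - Phone", "Covid19 - Email", "Covid19 - Email"]
)
# Multilabel: 2-D 0/1 indicator matrix, shape (n_samples, n_classes)
multilabel_y = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]])

targets = {}
for name, y in [("binary", binary_y), ("multiclass", multiclass_y),
                ("multilabel", multilabel_y)]:
    targets[name] = type_of_target(y)
    print(f"{name}: shape={y.shape}, type_of_target={targets[name]!r}")
```

Running type_of_target(y_test) on your own labels before scoring tells you which shape roc_auc_score will expect for the scores.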

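y_true:

True labels or binary label indicators. The binary and multiclass cases expect labels with shape (n_samples,), while the multilabel case expects binary label indicators with shape (n_samples, n_classes).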
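Applied to the multiclass case, a sketch of the fix is to keep the 1-D labels and pass the full predict_proba output. This assumes synthetic numeric features in place of the question's vectorized text, and swaps in LogisticRegression for the question's RFClass, DTClass, and sgdc_model; any classifier exposing predict_proba and classes_ should work the same way:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's text features (assumption)
rng = np.random.default_rng(0)
class_names = np.array(["Covid19 - Email", "Covid19 - Form", "Covid19 - Phone"])
class_idx = rng.integers(0, 3, size=300)
y = class_names[class_idx]                            # 1-D string labels
X = rng.normal(size=(300, 4)) + class_idx[:, None]    # shift features by class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)
clf = LogisticRegression().fit(X_train, y_train)

# Keep ALL columns: shape (n_samples, n_classes), ordered like clf.classes_
probs = clf.predict_proba(X_test)
print(f"Shape of y_test: {y_test.shape}")
print(f"Shape of probs: {probs.shape}")

# labels= ties each probability column to its class name
auc = roc_auc_score(y_test, probs, multi_class="ovr", labels=clf.classes_)
print(f"AUC (one-vs-rest): {auc:.3f}")
```

Passing labels=clf.classes_ is optional when every class appears in y_test, but it keeps the column order explicit and avoids a mismatch if a class is missing from a small test set.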

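multi_class:

Only used for multiclass targets. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.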
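A short sketch of the two settings on made-up probabilities: 'ovr' scores each class against all others, 'ovo' averages over every pair of classes. Note that the question passes multi_class='OVO'; scikit-learn only accepts the lowercase strings, so once the shape problem is fixed that call would most likely still fail, this time with a ValueError:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and class-probability rows (each row sums to 1)
y_test = np.array(["Covid19 - Email", "Covid19 - Form", "Covid19 - Phone",
                   "Covid19 - Email", "Covid19 - Form", "Covid19 - Phone"])
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])  # columns in sorted label order: Email, Form, Phone

auc_ovr = roc_auc_score(y_test, probs, multi_class="ovr")
auc_ovo = roc_auc_score(y_test, probs, multi_class="ovo")
print(f"ovr: {auc_ovr:.3f}, ovo: {auc_ovo:.3f}")

# The option strings are case-sensitive: "OVO" is rejected
try:
    roc_auc_score(y_test, probs, multi_class="OVO")
    upper_rejected = False
except ValueError:
    upper_rejected = True
print(f"'OVO' rejected: {upper_rejected}")
```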

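Before calling roc_auc_score, I would print the shapes of y_test and r_probs. The following shows this for the binary (1-D labels) and multilabel cases:

Binary (1-D) class labels: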

import numpy as np
from sklearn.metrics import roc_auc_score
np.random.seed(42)
n = 100
y_test = np.random.randint(0, 2, (n,))
r_probs = np.random.randint(0, 2, (n,))
r_auc = roc_auc_score(y_test, r_probs)
print(f'Shape of y_test: {y_test.shape}')
print(f'Shape of r_probs: {r_probs.shape}')
print(f'y_test: {y_test}')
print(f'r_probs: {r_probs}')
print(f'r_auc: {r_auc}')

Output:

Shape of y_test: (100,)
Shape of r_probs: (100,)
y_test: [0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0]
r_probs: [0 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 1 0 0 1 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 1 1 0 0]
r_auc: 0.5073051948051948

Multilabel (2-D) class labels:

y_test = np.random.randint(0, 2, (n, 4))
r_probs = np.random.randint(0, 2, (n, 4))
r_auc = roc_auc_score(y_test, r_probs, multi_class='ovr')
print(f'Shape of y_test: {y_test.shape}')
print(f'Shape of r_probs: {r_probs.shape}')
print(f'y_test: {y_test}')
print(f'r_probs: {r_probs}')
print(f'r_auc: {r_auc}')

Output:

Shape of y_test: (100, 4)
Shape of r_probs: (100, 4)
y_test: [[0 1 0 0]
 [1 0 1 1]
 ...
 [1 0 0 0]
 [0 0 1 1]]
r_probs: [[0 1 1 1]
 [0 0 0 1]
 ...
 [1 1 0 1]
 [1 1 1 0]]
r_auc: 0.5270015526313198
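In the multilabel case the scores must have exactly the same 2-D shape as y_true, and multi_class is ignored (it is only used for multiclass targets). A sketch with random data, assuming continuous scores in place of the 0/1 integers above, that also uses average=None to get one AUC per label column:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 100
# Multilabel ground truth: 0/1 indicator matrix, shape (n, 4)
y_test = rng.integers(0, 2, size=(n, 4))
# Scores with the SAME 2-D shape; random values in [0, 1) here
r_probs = rng.random(size=(n, 4))

# average=None returns one AUC per label column instead of a single mean
per_label_auc = roc_auc_score(y_test, r_probs, average=None)
macro_auc = roc_auc_score(y_test, r_probs)  # default average="macro"
print(f"Shape of per_label_auc: {per_label_auc.shape}")
print(f"Per-label AUC: {np.round(per_label_auc, 3)}")
print(f"Macro average: {macro_auc:.3f}")
```

The macro average is simply the mean of the per-label values, so inspecting them individually shows which labels the model separates well.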
