How do I compute FPR, TPR, AUC, and the ROC curve for multiclass text classification?
I used the following code:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
from sklearn.feature_extraction.text import CountVectorizer
vect=CountVectorizer()
vect.fit(X_train.values.astype('U'))
X_train_dtm=vect.transform(X_train.values.astype('U'))
X_test_dtm=vect.transform(X_test)
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
y_score=nb.fit(X_train_dtm, y_train)
y_pred_class = nb.predict(X_test_dtm)
Everything runs fine up to this point. But as soon as I use the following code, it raises an error.
from sklearn.metrics import roc_curve, auc, roc_auc_score
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(5):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
print ("ROC value is:",roc_auc["micro"])
The error is:
Traceback (most recent call last):
  File "C:/Users/saurabh/PycharmProjects/getting_started/own_code.py", line 32, in <module>
    print(metrics.roc_auc_score(y_test, y_pred_prob))
  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\ranking.py", line 260, in roc_auc_score
    sample_weight=sample_weight) Accuracy by this: 0.910536779324
  File "C:\Anaconda3\lib\site-packages\sklearn\metrics\base.py", line 81, in _average_binary_score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported
roc_curve does not support the multiclass format. You have to compute it for each class as a binary (one-vs-rest) problem.
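As a rough sketch of what "one binary problem per class" can look like: the snippet below binarizes the labels with label_binarize and takes the scores from predict_proba, since nb.fit(...) only returns the fitted estimator and not a score matrix. The toy corpus, the three classes, and the variable names are made up for illustration; they are not the asker's data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import auc, roc_curve
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import label_binarize

# Toy corpus with three classes (hypothetical data, for illustration only)
train_docs = ["cheap pills offer", "win cash offer", "meeting at noon",
              "project meeting notes", "football match score", "match score tonight"]
train_labels = [0, 0, 1, 1, 2, 2]
test_docs = ["cash offer", "meeting notes", "football score", "cheap offer"]
test_labels = [0, 1, 2, 0]
classes = [0, 1, 2]

vect = CountVectorizer()
nb = MultinomialNB().fit(vect.fit_transform(train_docs), train_labels)

# predict_proba returns an (n_samples, n_classes) score matrix
y_score = nb.predict_proba(vect.transform(test_docs))
# label_binarize turns the label vector into a one-hot indicator matrix
y_test_bin = label_binarize(test_labels, classes=classes)

fpr, tpr, roc_auc = {}, {}, {}
for i in range(len(classes)):
    # Column i is the binary problem "class i versus everything else"
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Micro-average: pool every (sample, class) decision into one binary problem
fpr["micro"], tpr["micro"], _ = roc_curve(y_test_bin.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
print("micro-average AUC:", roc_auc["micro"])
```

With real data you would pass your own y_test and nb.predict_proba(X_test_dtm) instead of the toy lists, and set classes to the labels that actually occur.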
However, to compute FPR and TPR you can use confusion_matrix:
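import numpy as np
from sklearn.metrics import confusion_matrix
# Assumes y_test is one-hot encoded and y_score is an (n_samples, n_classes) score matrix
y_test = np.argmax(y_test, axis=1)
y_score = np.argmax(y_score, axis=1)
c = confusion_matrix(y_test, y_score)
# Rows are true labels, columns are predictions; these are counts, with class 0 as negative and class 1 as positive
TN = float(c[0][0])
TP = float(c[1][1])
FN = float(c[1][0])
FP = float(c[0][1])
# The rates are ratios of those counts
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
TNR = TN / (TN + FP)
FNR = FN / (FN + TP)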
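With more than two classes, the same counts can be read off the confusion matrix one class at a time, treating that class as positive and all others as negative. The helper below is only a sketch of that idea; per_class_rates is a hypothetical name, and the 3-class matrix is invented for the example.

```python
import numpy as np

def per_class_rates(c):
    """Return (TPR, FPR) arrays, one entry per class, from a confusion matrix
    whose rows are true labels and whose columns are predicted labels."""
    c = np.asarray(c, dtype=float)
    tp = np.diag(c)               # correctly predicted samples of each class
    fn = c.sum(axis=1) - tp       # samples of the class predicted as something else
    fp = c.sum(axis=0) - tp       # other classes predicted as this class
    tn = c.sum() - tp - fn - fp   # everything that does not involve the class
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical 3-class confusion matrix (illustration only)
c = [[5, 1, 0],
     [2, 6, 2],
     [0, 1, 3]]
tpr, fpr = per_class_rates(c)
print("TPR per class:", tpr)
print("FPR per class:", fpr)
```

Note that this gives a single (FPR, TPR) point per class at the argmax decision, not a full ROC curve; a curve needs the continuous scores.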
Here is a simple example:
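from sklearn.metrics import roc_curve, auc
fpr, tpr, roc_auc = dict(), dict(), dict()
# Assumes y_test holds integer class labels 0..4 and y_score is the (n_samples, 5) probability matrix, e.g. nb.predict_proba(X_test_dtm)
for i in range(5):
    # One-vs-rest: class i is positive, every other class is negative
    yt_bin = [1 if x == i else 0 for x in y_test]
    fpr[i], tpr[i], _ = roc_curve(yt_bin, y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])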
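If you only need a single AUC number and not the curves, newer scikit-learn releases (0.22 and later, as far as I know) let roc_auc_score handle the one-vs-rest averaging through its multi_class argument. This is a sketch with invented labels and probabilities; each row of y_prob has to sum to 1, which predict_proba output already does.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and per-class probabilities (each row sums to 1)
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
])

# multi_class="ovr" scores each class against the rest, then averages the AUCs
macro_auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
print("macro one-vs-rest AUC:", macro_auc)
```

On an older version that lacks multi_class, the per-class loop with roc_curve and auc is the way to go.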