How do I get a confusion matrix from LogisticRegressionCV?



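I have implemented two versions of a logistic regression model. In the second version my goal is to find the best hyperparameter C; otherwise I would be happy with the first version. For the best C, I also need the mean and standard deviation of the confusion matrix and of the coefficient matrix, but I can't figure out how to get these from LogisticRegressionCV in the second version.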

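Version 1
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# X, y, K_fold, n_repeats and polynomial are defined elsewhere.
sss = RepeatedStratifiedKFold(n_splits=K_fold, n_repeats=n_repeats, random_state=36851234)
lambda_c = 1.0
cmn = []
coef = []
for train_index, test_index in sss.split(X, y):
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    log_reg_model = LogisticRegression(max_iter=50000, C=lambda_c, penalty='l1', multi_class='ovr', class_weight='balanced', solver='liblinear')
    pipe = Pipeline([('polynomial_features', polynomial), ('StandardScaler', StandardScaler()), ('logistic_regression', log_reg_model)])
    pipe.fit(x_train, y_train)
    y_pred = pipe.predict(x_test)
    y_prob = pipe.predict_proba(x_test)
    LR = pipe.named_steps['logistic_regression']
    # Collect one coefficient matrix and one row-normalized confusion matrix per fold.
    coef.append(LR.coef_)
    cmn.append(confusion_matrix(y_test, y_pred, normalize='true'))
# Aggregate across folds: compute the std first, because cmn and coef are overwritten by their means.
cmn_std = np.std(np.array(cmn), axis=0)
coef_std = np.std(np.array(coef), axis=0)
cmn = np.mean(np.array(cmn), axis=0)
coef = np.mean(np.array(coef), axis=0)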

Version 2
sss = RepeatedStratifiedKFold(n_splits=K_fold, n_repeats=n_repeats, random_state=36851234)
lambda_c = list(np.power(10.0, np.arange(-10, 3)))
scoring = 'precision_weighted'
log_reg_model = LogisticRegressionCV(max_iter=50000, fit_intercept=False, cv=sss, Cs=lambda_c,
                                     penalty='l1', multi_class='ovr', scoring=scoring,
                                     class_weight='balanced', solver='liblinear')
pipe = Pipeline([('polynomial_features', polynomial), ('StandardScaler', StandardScaler()), ('logistic_regression', log_reg_model)])
pipe.fit(X, y)
poly = pipe.named_steps['polynomial_features']
LR = pipe.named_steps['logistic_regression']
LR.coef_          # shape is [3, 6]: 3 classes, 6 features
LR.coefs_paths_   # shape is [500, 13, 6]: 500 CV folds, 13 values of C, 6 features

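How can I get the confusion matrix, and the mean and standard deviation of the coefficient matrix, for the second version of the model? This is not clear to me. I also see that LR.C_ = [100. 0.1 0.1] for my data, which has 3 output classes. Why is the hyperparameter value different for each class? I don't quite understand that part either. Thanks.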

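If the goal is a confusion matrix for the model tuned by LogisticRegressionCV, you can do something like this: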

from sklearn.datasets import make_classification
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import confusion_matrix

# Inner CV: used by LogisticRegressionCV to pick C.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=36851234)
x, y = make_classification(100, 6, n_classes=3, n_informative=3)
lambda_c = list(np.power(10.0, np.arange(-10, 3)))
scoring = 'precision_weighted'
log_reg_model = LogisticRegressionCV(max_iter=5000,
                                     fit_intercept=False,
                                     cv=cv,
                                     Cs=lambda_c, penalty='l1',
                                     multi_class='ovr',
                                     # scoring=scoring,
                                     class_weight='balanced',
                                     solver='liblinear')
polynomial = PolynomialFeatures(2)
pipe = Pipeline([('polynomial_features', polynomial), ('StandardScaler', StandardScaler()), ('logistic_regression', log_reg_model)])
pipe.fit(x, y)
LR = pipe.named_steps['logistic_regression']
# Outer CV: out-of-fold predictions for every sample. cross_val_predict needs a
# splitter that partitions the data, so a repeated splitter cannot be passed here.
y_pred = cross_val_predict(pipe, x, y, cv=10)
conf_mat = confusion_matrix(y, y_pred)
print(conf_mat)
# Example output (varies between runs, because make_classification is not seeded):
# [[17  8  9]
#  [ 5 28  0]
#  [ 3  3 27]]
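
The snippet above gives a single confusion matrix. The question also asks for the mean and standard deviation of the confusion matrix and of the coefficients, so here is one possible sketch that reuses the fold-by-fold loop from Version 1 but lets each fold pick its own C. Several things in it are assumptions rather than part of the original code: the data is synthetic, the C grid and the fold counts are shrunk to keep it quick, and the one-vs-rest scheme is spelled out with OneVsRestClassifier instead of multi_class='ovr', since that argument is deprecated in recent scikit-learn releases. The helper name make_pipe is made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the real data (assumption).
X, y = make_classification(n_samples=150, n_features=6, n_informative=3,
                           n_classes=3, random_state=0)
labels = np.unique(y)

# Outer CV estimates performance; inner CV (inside LogisticRegressionCV) picks C.
outer_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)


def make_pipe():
    """Hypothetical helper: a fresh, unfitted pipeline for each outer fold."""
    tuned = LogisticRegressionCV(Cs=np.logspace(-2, 2, 5), cv=inner_cv,
                                 penalty='l1', solver='liblinear',
                                 class_weight='balanced', fit_intercept=False,
                                 max_iter=5000)
    return Pipeline([
        ('polynomial_features', PolynomialFeatures(2)),
        ('StandardScaler', StandardScaler()),
        # One binary LogisticRegressionCV per class, each with its own best C.
        ('logistic_regression', OneVsRestClassifier(tuned)),
    ])


cms, coefs = [], []
for train_idx, test_idx in outer_cv.split(X, y):
    pipe = make_pipe()
    pipe.fit(X[train_idx], y[train_idx])
    y_pred = pipe.predict(X[test_idx])
    # Row-normalized confusion matrix for this fold.
    cms.append(confusion_matrix(y[test_idx], y_pred, labels=labels, normalize='true'))
    # Stack the per-class coefficient rows into one (n_classes, n_features) matrix.
    ovr = pipe.named_steps['logistic_regression']
    coefs.append(np.vstack([est.coef_ for est in ovr.estimators_]))

cms, coefs = np.array(cms), np.array(coefs)
cm_mean, cm_std = cms.mean(axis=0), cms.std(axis=0)
coef_mean, coef_std = coefs.mean(axis=0), coefs.std(axis=0)
print(cm_mean)
print(cm_std)
```

Because C is re-selected inside every outer fold, the spread in coef_std mixes variation from the data split with variation from the chosen C. If the statistics are wanted for one fixed C instead, fit LogisticRegressionCV once to choose it and then rerun the Version 1 loop with that value.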

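You get three values in LR.C_ because you are using the option multi_class='ovr' in the logistic regression. According to the scikit-learn documentation this is one-vs-rest, which means you actually have 3 classifiers, and each one gets its own best C. See the documentation for sklearn.linear_model.LogisticRegression:

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the 'multi_class' option is set to 'ovr'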
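
To make the "three classifiers" point concrete, here is a small sketch that builds the one-vs-rest scheme explicitly with OneVsRestClassifier and prints the C chosen for each binary problem. It runs on synthetic data with a reduced C grid (both assumptions), so the values it prints will not match the [100. 0.1 0.1] from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class data (assumption, stands in for the real data set).
X, y = make_classification(n_samples=150, n_features=6, n_informative=3,
                           n_classes=3, random_state=0)

grid = np.logspace(-2, 2, 5)
binary_cv_model = LogisticRegressionCV(Cs=grid, cv=3, penalty='l1',
                                       solver='liblinear',
                                       class_weight='balanced', max_iter=5000)

# OneVsRestClassifier clones the model once per class and fits each clone on a
# "this class vs. everything else" binary target, so C is tuned per class.
ovr = OneVsRestClassifier(binary_cv_model).fit(X, y)

for label, est in zip(ovr.classes_, ovr.estimators_):
    print(f"class {label} vs rest: best C = {est.C_[0]}")

best_Cs = np.array([est.C_[0] for est in ovr.estimators_])
```

Each binary problem has a different class balance and difficulty, so there is no reason for the three searches to agree on the same C.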
