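I am learning linear discriminant analysis with scikit-learn, and I am confused by the `coef_` attribute of the LinearDiscriminantAnalysis class. As far as I understand, these are the discriminant function coefficients (sklearn calls them weight vectors). Since there should be (n_classes - 1) discriminant functions, I would expect `coef_` to be an array of shape (n_components, n_features), but it prints an (n_classes, n_features) array. Below is an example using sklearn's Iris dataset. With 3 classes and 2 components, I would expect print(lda.coef_) to give me a 2x4 array rather than a 3x4 array.
Maybe I have misunderstood what the weight vectors are, and they are actually the coefficients of the classification functions?
How can I get the coefficient of each variable in each discriminant (canonical) function?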
Code here:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

lda = LinearDiscriminantAnalysis(n_components=2, store_covariance=True)
X_r = lda.fit(X, y).transform(X)

# One color per class (assumed values; the post did not define `colors`)
colors = ['navy', 'turquoise', 'darkorange']

plt.figure()
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], alpha=.8, color=color,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.xlabel('Function 1 (%.2f%%)' % (lda.explained_variance_ratio_[0] * 100))
plt.ylabel('Function 2 (%.2f%%)' % (lda.explained_variance_ratio_[1] * 100))
plt.title('LDA of IRIS dataset')

print(lda.coef_)
#output -> [[ 6.24621637 12.24610757 -16.83743427 -21.13723331]
# [ -1.51666857 -4.36791652 4.64982565 3.18640594]
# [ -4.72954779 -7.87819105 12.18760862 17.95082737]]
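On the guess that `coef_` holds classification function coefficients: a quick check along the following lines supports it. This is only a sketch on the Iris data with the default solver; the variable names `scores` and `pred` are mine. It rebuilds the per-class linear scores from `coef_` and `intercept_` and compares them with what `decision_function` and `predict` return.

```python
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = datasets.load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)

# One row of coefficients per class, not per discriminant function
print(lda.coef_.shape)       # (3, 4)
print(lda.intercept_.shape)  # (3,)

# Rebuild the per-class linear scores by hand
scores = X @ lda.coef_.T + lda.intercept_
print(np.allclose(scores, lda.decision_function(X)))

# The predicted class is the one with the highest score
pred = lda.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(pred, lda.predict(X)))
```

If both comparisons print True, the three rows of `coef_` are the weights of the three per-class scoring functions, which would explain the 3x4 shape.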
You can compute the coefficients with the following code:
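import numpy as np
import pandas as pd

def LDA_coefficients(X, lda):
    """Recover the constant and per-feature coefficients of each
    discriminant function by probing lda.transform.
    X must be a pandas DataFrame (its column names are reused)."""
    nb_col = X.shape[1]
    # Rows 0..nb_col-1 are unit vectors (one per feature); the last row is all zeros
    matrix = np.zeros((nb_col + 1, nb_col), dtype=int)
    Z = pd.DataFrame(data=matrix, columns=X.columns)
    for j in range(nb_col):
        Z.iloc[j, j] = 1

    LD = lda.transform(Z)

    results = pd.DataFrame()
    index = ['const'] + ['C' + str(j + 1) for j in range(nb_col)]
    for i in range(LD.shape[1]):
        # The zero row gives the constant term of function i
        coef = [LD[-1][i]]
        # Each unit row minus the constant gives that feature's coefficient
        for j in range(nb_col):
            coef.append(LD[j][i] - LD[-1][i])
        results['LD' + str(i + 1)] = pd.Series(coef, index=index)
    return results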
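Before calling this function, you need to fit the linear discriminant analysis (X must be a pandas DataFrame, since the function reads X.columns):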
lda = LinearDiscriminantAnalysis()
lda.fit(X,y)
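As a cross-check, the same numbers can probably be read off the fitted model directly. The sketch below assumes the default `solver='svd'`, where `transform` computes `(X - xbar_) @ scalings_` and keeps the first `n_components` columns; `xbar_` is only available with that solver. The names `W`, `const` and `table` are mine.

```python
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris(as_frame=True)
X = iris.data    # pandas DataFrame with named columns
y = iris.target

n_components = 2
lda = LinearDiscriminantAnalysis(n_components=n_components).fit(X, y)

# Assumption: svd solver, so transform(X) == (X - xbar_) @ scalings_
W = lda.scalings_[:, :n_components]  # per-feature coefficients, shape (4, 2)
const = -lda.xbar_ @ W               # constant term of each function, shape (2,)

table = pd.DataFrame(
    np.vstack([const, W]),
    index=['const'] + list(X.columns),
    columns=['LD' + str(i + 1) for i in range(n_components)],
)
print(table)

# Sanity check: constant + coefficients reproduce lda.transform
reconstructed = X.to_numpy() @ W + const
print(np.allclose(reconstructed, lda.transform(X)))
```

If the last line prints True, `table` holds the same constants and coefficients that LDA_coefficients recovers by probing `transform`, one column per discriminant function.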