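Factor analysis in sklearn: explained variance

PCA in scikit-learn has an attribute called "explained_variance_" that captures the variance explained by each component. I don't see anything similar for factor analysis (FactorAnalysis) in scikit-learn. How can I compute the variance explained by each component of a factor analysis?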
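For comparison, here is a minimal sketch of what PCA exposes and what a fitted FactorAnalysis offers instead. The iris data bundled with scikit-learn is only an example dataset (an assumption, not part of the question).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis

# Example data (assumption for illustration): the bundled iris measurements
X = load_iris().data

pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2).fit(X)

# PCA reports the variance explained by each component directly
print(pca.explained_variance_)        # absolute variance per component
print(pca.explained_variance_ratio_)  # fraction of the total variance

# FactorAnalysis has no such attribute; it exposes loadings and noise instead
print(hasattr(fa, "explained_variance_"))  # False
print(fa.components_.shape)                # (2, 4): one row per factor
print(fa.noise_variance_.shape)            # (4,): one value per feature
```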

Here's how you can do it:

After performing the factor analysis, first get the components matrix and the noise variances. Let fa be your fitted model:

import numpy as np

m = fa.components_       # shape (n_components, n_features)
n = fa.noise_variance_   # shape (n_features,)

Square this matrix element-wise:

m1 = m**2

Sum each row of m1 (each row of components_ corresponds to one factor):

m2 = np.sum(m1,axis=1)
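Because components_ has shape (n_components, n_features), axis=1 sums over the features belonging to one factor. A small sketch with a hypothetical 2-factor, 3-feature loading matrix (made-up numbers) makes the axis explicit:

```python
import numpy as np

# Hypothetical components_ matrix: 2 factors (rows) x 3 features (columns)
m = np.array([[0.9, 0.8, 0.1],
              [0.1, 0.2, 0.7]])

m1 = m**2                  # squared loadings
m2 = np.sum(m1, axis=1)    # one sum per row, i.e. per factor
print(m2)                  # [1.46 0.54]
```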

The percentage of the common (factor) variance explained by the first factor is then

pvar1 = (100*m2[0])/np.sum(m2)

Likewise, for the second factor:

pvar2 = (100*m2[1])/np.sum(m2)

However, the noise component also accounts for part of the variance. If you want to include it in the total, compute instead:

pvar1_with_noise = (100*m2[0])/(np.sum(m2)+np.sum(n))
pvar2_with_noise = (100*m2[1])/(np.sum(m2)+np.sum(n))

And so on. Hope this helps.
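Putting the steps of this answer together, a self-contained sketch. The standardized iris data stands in for your own data (an assumption). The last line checks that the "with noise" denominator equals the trace of the fitted model covariance returned by get_covariance().

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Example data (assumption): standardized iris measurements
X = StandardScaler().fit_transform(load_iris().data)
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)

m = fa.components_          # shape (n_components, n_features)
n = fa.noise_variance_      # shape (n_features,)
m2 = np.sum(m**2, axis=1)   # sum of squared loadings per factor

# Share of the common (factor) variance; these percentages sum to 100
pvar = 100 * m2 / np.sum(m2)
# Share of common + noise variance; these percentages sum to at most 100
pvar_with_noise = 100 * m2 / (np.sum(m2) + np.sum(n))

print(pvar.round(1))
print(pvar_with_noise.round(1))

# Denominator with noise = trace of the model covariance W^T W + diag(psi)
print(np.isclose(np.sum(m2) + np.sum(n), np.trace(fa.get_covariance())))  # True
```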

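In the usual FA/PCA nomenclature, what scikit-learn outputs as components_ would elsewhere be called the loadings. For example, the package FactorAnalyzer outputs equivalent loadings_ once you change its settings to match scikit-learn: set rotation=None and method='ml', and make sure the data are standardized before you pass them to scikit-learn's FactorAnalysis, because FactorAnalyzer standardizes the data internally.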
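A minimal sketch of the scikit-learn side of that standardization step, assuming the input is a NumPy array (iris again stands in for real data). After StandardScaler every column has variance 1, so the total variance equals the number of variables. The FactorAnalyzer side is not reproduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

X_raw = load_iris().data
# Standardize each column to mean 0 and (population, ddof=0) variance 1
X_in = StandardScaler().fit_transform(X_raw)

print(X_in.var(axis=0).round(6))   # [1. 1. 1. 1.]
print(X_in.var(axis=0).sum())      # total variance = number of variables

fa = FactorAnalysis(n_components=2).fit(X_in)
loadings = fa.components_.T        # features x factors, the usual "loadings" layout
print(loadings.shape)              # (4, 2)
```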

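Unlike the components_ of scikit-learn's PCA (unit-length eigenvectors), the FA output is already scaled, so the variance explained can be obtained by summing the squared loadings. Note that the proportion of variance explained here is expressed relative to the total variance of the original variables, not relative to the variance of the factors as in @Gaurav's answer.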
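Why summing the squares works: scikit-learn's FA models the data covariance as components_.T @ components_ + diag(noise_variance_), which is what get_covariance() returns. On the diagonal, each variable's variance therefore splits into a part carried by the factors (the sum of its squared loadings, or communality) and a noise part. A sketch checking this on standardized iris data (the dataset is an assumption):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

X_in = StandardScaler().fit_transform(load_iris().data)
fa = FactorAnalysis(n_components=2).fit(X_in)
W = fa.components_            # (n_factors, n_features)
psi = fa.noise_variance_      # (n_features,)

# Model covariance implied by the fitted factor model
model_cov = W.T @ W + np.diag(psi)
print(np.allclose(model_cov, fa.get_covariance()))  # True

# Diagonal: per-variable variance = communality (squared loadings) + noise
communality = np.sum(W**2, axis=0)
print(np.allclose(np.diag(model_cov), communality + psi))  # True

# Summing squared loadings per factor splits the same total across factors
print(np.isclose(np.sum(W**2, axis=1).sum(), communality.sum()))  # True
```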

import numpy as np
from sklearn.decomposition import FactorAnalysis

# X_in: your (n_samples, n_features) data, standardized as described above
k_fa = 2   # number of factors, e.g.
fa_k = FactorAnalysis(n_components=k_fa).fit(X_in)
fa_loadings = fa_k.components_.T    # loadings, shape (n_features, k_fa)
# variance explained
total_var = X_in.var(axis=0).sum()  # total variance of the original variables,
                                    # equal to the number of variables if they are standardized
var_exp = np.sum(fa_loadings**2, axis=0)   # sum of squared loadings per factor
prop_var_exp = var_exp / total_var
cum_prop_var_exp = np.cumsum(prop_var_exp)
print(f"variance explained: {var_exp.round(2)}")
print(f"proportion of variance explained: {prop_var_exp.round(3)}")
print(f"cumulative proportion of variance explained: {cum_prop_var_exp.round(3)}")
# e.g. output:
#   variance explained: [3.51 0.73]
#   proportion of variance explained: [0.351 0.073]
#   cumulative proportion of variance explained: [0.351 0.425]
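The same computation wrapped as a reusable function. The name fa_variance_explained is a hypothetical helper, not part of scikit-learn, and the standardized iris data only stands in for X_in.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

def fa_variance_explained(X, n_factors):
    """Hypothetical helper: (variance, proportion, cumulative proportion) per factor."""
    fa = FactorAnalysis(n_components=n_factors).fit(X)
    loadings = fa.components_.T                 # features x factors
    total_var = X.var(axis=0).sum()             # total variance of the original variables
    var_exp = np.sum(loadings**2, axis=0)       # sum of squared loadings per factor
    prop = var_exp / total_var
    return var_exp, prop, np.cumsum(prop)

X_in = StandardScaler().fit_transform(load_iris().data)
var_exp, prop, cum_prop = fa_variance_explained(X_in, 2)
print(f"variance explained: {var_exp.round(2)}")
print(f"proportion of variance explained: {prop.round(3)}")
print(f"cumulative proportion of variance explained: {cum_prop.round(3)}")
```

Returning the three arrays instead of printing them inside the function makes it easy to compare several values of n_factors.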
