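I have a question about scikit-learn's PCA transform method. The code is found here - scroll down to find the transform() method.
They show the process in this simple example - first fit, then transform: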
pca.fit(X) #step 1: fit()
X_transformed = fast_dot(X, self.components_.T) #step 2: transform()
I tried to do this manually as follows:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.utils.extmath import fast_dot
iris = load_iris()
X = iris.data
y = iris.target
pca = PCA(n_components=3)
pca.fit(X)
Xm = X.mean(axis=1)
print(pca.transform(X)[:5, :])  # Method 1 - expected
X = X - Xm[None].T  # or can use X = X - Xm[:, np.newaxis]
print(fast_dot(X, pca.components_.T)[:5, :])  # Method 2 - manual
Expected:
[[-2.68420713 -0.32660731 0.02151184]
[-2.71539062 0.16955685 0.20352143]
[-2.88981954 0.13734561 -0.02470924]
[-2.7464372 0.31112432 -0.03767198]
[-2.72859298 -0.33392456 -0.0962297 ]]
Manual:
[[-0.98444292 -2.74509617 2.28864171]
[-0.75404746 -2.44769323 2.35917528]
[-0.89110797 -2.50829893 2.11501947]
[-0.74772562 -2.33452022 2.10205674]
[-1.02882877 -2.75241342 2.17090017]]
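As you can see, the two results are different. Is there a step missing somewhere in the transform() method?
I'm no great expert on PCA, but looking at the sklearn source code, I found your problem - you are taking the mean along the wrong axis. With one sample per row, PCA centers the data by subtracting the mean of each feature (column), which is axis 0; axis 1 gives the mean of each sample instead.
Here is the solution: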
Xm = X.mean(axis=0)  # Axis 0 (per-feature mean) instead of 1
print(pca.transform(X)[:5, :])  # Method 1 - expected
X = X - Xm  # No need for transpose now; Xm broadcasts across rows
print(fast_dot(X, pca.components_.T)[:5, :])  # Method 2 - manual
Results:
[[-2.68420713 0.32660731 -0.02151184]
[-2.71539062 -0.16955685 -0.20352143]
[-2.88981954 -0.13734561 0.02470924]
[-2.7464372 -0.31112432 0.03767198]
[-2.72859298 0.33392456 0.0962297 ]]
[[-2.68420713 0.32660731 -0.02151184]
[-2.71539062 -0.16955685 -0.20352143]
[-2.88981954 -0.13734561 0.02470924]
[-2.7464372 -0.31112432 0.03767198]
[-2.72859298 0.33392456 0.0962297 ]]
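For anyone checking this on a newer setup, here is a minimal Python 3 sketch of the same comparison. It is only an illustration, under a few assumptions: np.dot stands in for fast_dot (which, as far as I know, recent scikit-learn releases no longer ship), the fitted per-feature mean is read from pca.mean_, and the names feature_mean, sample_mean, manual and wrong are my own, not from the post above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # shape (150, 4): one sample per row, one feature per column
pca = PCA(n_components=3).fit(X)

# Per-feature mean (axis=0): shape (4,), broadcasts across the rows of X.
feature_mean = X.mean(axis=0)
# Per-sample mean (axis=1): shape (150,), which is what the question subtracted.
sample_mean = X.mean(axis=1)

expected = pca.transform(X)
# Center by feature, then project onto the principal components.
manual = np.dot(X - feature_mean, pca.components_.T)
# Centering by sample projects differently shifted data.
wrong = np.dot(X - sample_mean[:, np.newaxis], pca.components_.T)

mean_matches = np.allclose(feature_mean, pca.mean_)
manual_matches = np.allclose(manual, expected)
wrong_matches = np.allclose(wrong, expected)
print(mean_matches, manual_matches, wrong_matches)
```

If the reasoning above holds, only the axis=0 version should agree with pca.transform(X). Note that the sign of an individual component can differ between scikit-learn versions, so it is safer to compare the manual result against transform() from the same fitted object than against numbers printed elsewhere.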