Am I misunderstanding something? Here is my code.
Using sklearn
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
pca = decomposition.PCA(n_components=3)
x = np.array([
[0.387,4878, 5.42],
[0.723,12104,5.25],
[1,12756,5.52],
[1.524,6787,3.94],
])
pca.fit_transform(x)
Output:
array([[ -4.25324997e+03, -8.41288672e-01, -8.37858943e-03],
[ 2.97275001e+03, -1.25977271e-01, 1.82476780e-01],
[ 3.62475003e+03, -1.56843494e-01, -1.65224286e-01],
[ -2.34425007e+03, 1.12410944e+00, -8.87390454e-03]])
Using numpy
x_std = StandardScaler().fit_transform(x)
cov = np.cov(x_std.T)
ev, eig = np.linalg.eig(cov)
a = eig.dot(x_std.T)
array([[ 1.38252552, -1.25240764, 0.2133338 ],
[-0.53279935, -0.44541231, -0.77988021],
[-0.45230635, 0.21983192, -1.23796328],
[-0.39741982, 1.47798804, 1.80450969]])
I kept all 3 components, but it doesn't seem to let me recover my original data. Could someone tell me why?
Don't use StandardScaler. sklearn's PCA only centers the data; it does not scale it, so standardizing first changes the covariance structure and hence the components. Instead, just subtract each column's mean from x:
In [92]: xm = x - x.mean(axis=0)
In [93]: cov = np.cov(xm.T)
In [94]: evals, evecs = np.linalg.eig(cov)
In [95]: xm.dot(evecs)
Out[95]:
array([[ -4.2532e+03, -8.3786e-03, -8.4129e-01],
[ 2.9728e+03, 1.8248e-01, -1.2598e-01],
[ 3.6248e+03, -1.6522e-01, -1.5684e-01],
[ -2.3443e+03, -8.8739e-03, 1.1241e+00]])
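For what it's worth, you can confirm that sklearn's PCA centers the data internally: it stores the fitted column means in pca.mean_ and the principal axes in pca.components_. A minimal sketch along those lines:

import numpy as np
from sklearn import decomposition

x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1.0, 12756, 5.52],
    [1.524, 6787, 3.94],
])

pca = decomposition.PCA(n_components=3)
scores = pca.fit_transform(x)

# The fitted mean is just the per-column mean of x ...
print(np.allclose(pca.mean_, x.mean(axis=0)))                        # True
# ... and the transform is centering followed by projection onto the axes.
print(np.allclose(scores, (x - pca.mean_).dot(pca.components_.T)))   # True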
The Out[95] result contains the same information as the sklearn result, but with the columns in a different order.
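If you want the columns in the same order as sklearn, sort the eigenpairs by decreasing eigenvalue first: np.linalg.eig returns them in no particular order, while sklearn orders its components by explained variance. A minimal sketch, reusing the same x as above; individual columns may still differ by a sign flip, since eigenvector signs are arbitrary:

import numpy as np

x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1.0, 12756, 5.52],
    [1.524, 6787, 3.94],
])

# Center, then eigendecompose the covariance matrix, as in the answer above.
xm = x - x.mean(axis=0)
evals, evecs = np.linalg.eig(np.cov(xm.T))

# Sort eigenvectors by decreasing eigenvalue so the columns line up with sklearn's.
order = np.argsort(evals)[::-1]
print(xm.dot(evecs[:, order]))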