Linear regression - images

I am studying machine learning in Python. I have been working through the following example (http://scikit-learn.org/stable/auto_examples/plot_multioutput_face_completion.html#example-plot-multioutput-face-completion-py), using the code sample below.

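I would like to test/verify my understanding of the inner workings of linear regression. The goal is to predict the missing lower half of a picture from its known upper half. Initially there are 300 images of 64*64 (4096 pixels). The independent variable X is a 300*2048 matrix (300 pictures, 2048 pixels (the upper half of each picture)). The dependent variable is also a 300*2048 matrix (the lower half of each picture). It seems the coefficient matrix is a 2048*2048 matrix. Is my understanding correct?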

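The prediction for a single pixel of y (e.g. picture 1, the top-left-most pixel of its lower half) is made by multiplying all 2048 pixels of the upper half of picture 1 by that pixel's set of regression coefficients, so each missing pixel of the lower half is estimated by taking all 2048 pixels of that particular image into account?
The regression coefficients are pixel-dependent (each y pixel has its own set of 2048 regression coefficients), and these coefficients are estimated by finding the OLS fit for that particular pixel position across the 300 available images?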

I may well be getting confused by the matrices, so please correct me if I am wrong. Many thanks. W

print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.utils.validation import check_random_state
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import RidgeCV
# Load the faces datasets
data = fetch_olivetti_faces()
targets = data.target
data = data.images.reshape((len(data.images), -1))
train = data[targets < 30]
test = data[targets >= 30]  # Test on independent people
# Test on a subset of people
n_faces = 5
rng = check_random_state(4)
face_ids = rng.randint(test.shape[0], size=(n_faces, ))
test = test[face_ids, :]
n_pixels = data.shape[1]
# Use integer arithmetic for the split: np.ceil/np.floor return floats,
# which recent NumPy versions reject as slice indices.
X_train = train[:, :(n_pixels + 1) // 2]  # Upper half of the faces
y_train = train[:, n_pixels // 2:]  # Lower half of the faces
X_test = test[:, :(n_pixels + 1) // 2]
y_test = test[:, n_pixels // 2:]
# Fit estimators
ESTIMATORS = {
    "Extra trees": ExtraTreesRegressor(n_estimators=10, max_features=32,
                                       random_state=0),
    "K-nn": KNeighborsRegressor(),
    "Linear regression": LinearRegression(),
    "Ridge": RidgeCV(),
}
y_test_predict = dict()
for name, estimator in ESTIMATORS.items():
    estimator.fit(X_train, y_train)
    y_test_predict[name] = estimator.predict(X_test)

You are right.

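Each image has 4096 pixels. Each output pixel in the test set is a linear combination of the 2048 input pixels of that test image, weighted by the coefficients trained for that output pixel.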
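To make that concrete, here is a minimal sketch on small random data (the sizes 50, 8 and 6 are stand-ins I chose so it runs instantly; they are not the 300 x 2048 face data). It recomputes one predicted output by hand from that output's row of `coef_`. Note that `LinearRegression` also fits an intercept by default, so the manual calculation adds `intercept_[j]` as well.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy stand-ins for the face data (assumed sizes, not the real dataset).
rng = np.random.RandomState(0)
n_samples, n_features, n_targets = 50, 8, 6
X = rng.rand(n_samples, n_features)   # "upper half" pixels
Y = rng.rand(n_samples, n_targets)    # "lower half" pixels

model = LinearRegression().fit(X, Y)

# One new "image" and one output "pixel" j.
x_new = rng.rand(n_features)
j = 0

# Row j of coef_ holds the weights for output j; add the fitted intercept.
manual = model.coef_[j] @ x_new + model.intercept_[j]
predicted = model.predict(x_new.reshape(1, -1))[0, j]

print(model.coef_.shape)               # (n_targets, n_features)
print(np.isclose(manual, predicted))
```

On the real example the same indexing applies, with `coef_[j]` being a vector of 2048 weights for lower-half pixel `j`.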

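If you look at the sklearn LinearRegression documentation, you will see that for multi-target regression the coefficients have shape (n_targets, n_features) (here 2048 targets, 2048 features):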

In [24]: ESTIMATORS['Linear regression'].coef_.shape
Out[24]: (2048, 2048)

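Under the hood, it calls the least-squares solver in scipy.linalg. So it is important to note that there is no "information sharing" between the coefficients, in the sense that each output is a separate linear combination of all 2048 input pixels.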
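One way to check the "no information sharing" point is to fit every output column on its own and compare with the joint fit. The sketch below again uses small random data of my own choosing (50 samples, 8 features, 6 targets), where the least-squares solution is unique; on the face data there are fewer samples than features, so the solver has to pick one of many exact fits, but each output column is still solved independently of the others.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (assumed sizes, chosen so the problem is overdetermined).
rng = np.random.RandomState(0)
X = rng.rand(50, 8)
Y = rng.rand(50, 6)

# Joint multi-output fit: coef_ has shape (n_targets, n_features).
joint = LinearRegression().fit(X, Y)

# Fit each output column separately and stack the coefficient vectors.
separate_coefs = np.vstack([
    LinearRegression().fit(X, Y[:, j]).coef_
    for j in range(Y.shape[1])
])

# The separately fitted rows match the rows of the joint fit.
print(np.allclose(joint.coef_, separate_coefs))
```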
