Gradient descent does not converge, even with large alpha values



I am trying to implement gradient descent, but it does not converge, and I can't figure out the problem. When I implemented gradient descent with a for loop, it converged. When I switched to a vectorized approach, it stopped converging, even with very large alpha values. Below are the functions I use to perform gradient descent.

import numpy as np
import pandas as pd

def GenerateData():
    # Load the first 400 rows of the housing data
    ds = pd.read_csv('./housing.csv')
    ds = ds.dropna()
    ds = ds.iloc[:400, :]
    X = ds['median_income'].copy()
    y = ds['median_house_value'].copy()
    X = X.to_numpy()
    y = y.to_numpy()
    return X, y
# Non-vectorized approach
def average(X, y, W, b):
    N = X.shape[0]
    t_err = 0.0
    for i in range(N):
        t_err += (y[i] - (W * X[i] + b)) ** 2
    return t_err / float(N)
def Grad_descent(X, y, W, b, alpha=0.01):
    N = X.shape[0]
    df_dw = 0
    df_db = 0
    for i in range(N):
        df_dw += -2 * X[i] * (y[i] - (W * X[i] + b))
        df_db += -2 * (y[i] - (W * X[i] + b))
    W = W - (1 / float(N)) * df_dw * alpha
    b = b - (1 / float(N)) * df_db * alpha
    return W, b
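For reference, this non-vectorized version performs a single update step, so it is driven by an outer epoch loop; a minimal sketch (the driver itself is hypothetical, not part of the original code):

# Hypothetical driver loop for the single-step, non-vectorized Grad_descent
X, y = GenerateData()
W, b = 0.0, 0.0
for e in range(10):
    W, b = Grad_descent(X, y, W, b, alpha=0.01)
    print('Epoch:', e, 'loss:', average(X, y, W, b))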
# Vectorized approach
def average(X, y, W, b):
    M = X.shape[0]
    t_err = np.sum(np.power(y - (X @ W + b), 2))
    return t_err / (2 * float(M))
def Grad_descent(X, y, alpha=0.01, epcs=10):
    N = X.shape[1]
    M = X.shape[0]
    W = np.zeros(N)
    b = np.ones(M)

    df_dW = 0
    df_db = 0
    for e in range(epcs):
        df_dW = np.transpose(X) @ ((X @ W + b) - y)
        df_db = (X @ W + b) - y
        W = W - (1 / (2 * float(M))) * df_dW * alpha
        b = b - (1 / (2 * float(M))) * df_db * alpha
        print('Epoch:', e, 'loss:', average(X, y, W, b))
    return W, b

I don't understand why gradient descent diverges when I use the vectorized approach.

X's shape is (400, 8) and y's shape is (400, 1). The data is taken from the housing-price prediction dataset (housing.csv).
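One thing worth noting with those shapes: in the vectorized Grad_descent above, W = np.zeros(N) is 1-D, so X @ W + b has shape (400,), and subtracting a (400, 1) y broadcasts to a (400, 400) matrix instead of a residual vector. A minimal sketch of that silent broadcasting, assuming the shapes stated above (random stand-in data, not the actual dataset):

# Sketch of the silent NumPy broadcasting issue, with stand-in data
import numpy as np

M, N = 400, 8
X = np.random.randn(M, N)
y = np.random.randn(M, 1)    # column vector, as stated above

W = np.zeros(N)              # shape (8,)
b = np.ones(M)               # shape (400,)

resid = (X @ W + b) - y      # (400,) - (400, 1) broadcasts to (400, 400)
print(resid.shape)           # (400, 400), not (400,)

df_dW = np.transpose(X) @ resid
print(df_dW.shape)           # (8, 400): W silently becomes a matrix after the update

Flattening y with y.ravel(), or keeping both W and y as (N, 1)/(M, 1) columns, keeps the residual a vector.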

I experimented with alpha (the learning rate) and found that 0.001 was too high. I changed it to 1e-10 and it started converging! I also refactored and changed the code.

def average(X, y, W):
    M = X.shape[0]
    t_err = np.sum(np.power((X @ W) - y, 2))
    return t_err / (2 * float(M))

def Grad_descent(X, y, alpha=0.1, epcs=10):
    N = X.shape[1]
    M = X.shape[0]
    W = np.random.randn(N, 1)
    for e in range(epcs):
        df_dW = (2 / M) * X.T @ (X @ W - y)
        W = W - alpha * df_dW
        print('Epoch:', e, ' loss:', average(X, y, W))
    return W

Previously I was computing the bias term separately, but now I have merged a bias column into the X matrix, so the first element of the weight vector serves as the bias term. I think the learning rate was why gradient descent wasn't converging; on top of that, the values of the cost function were really large, which led to bigger jumps away from the global minimum.
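As a rough illustration of both points, here is a minimal sketch of the bias-column construction plus a scale check on the initial gradient. The data is a random stand-in with magnitudes similar to the housing set (incomes around 1-10, house values around 1e5-5e5), not the actual dataset:

import numpy as np

M = 400
X_raw = np.random.uniform(1, 10, size=(M, 1))   # income-scale feature
y = np.random.uniform(1e5, 5e5, size=(M, 1))    # house-value-scale target

# Merge a bias column into X, so W[0] acts as the intercept.
X = np.hstack([np.ones((M, 1)), X_raw])         # shape (400, 2)

# With W = 0 the residuals are the raw targets (~1e5), so the gradient
# (2/M) * X.T @ (X @ W - y) is on the order of 1e5-1e7. A step with
# alpha = 0.001 then moves W by ~1e2-1e4, overshooting the minimum.
W = np.zeros((X.shape[1], 1))
grad = (2 / M) * X.T @ (X @ W - y)
print(np.abs(grad).max())

Standardizing the feature and/or target columns first would bring the gradient back to order 1 and allow a much larger alpha than 1e-10; with raw values, a tiny learning rate is indeed what keeps the updates stable.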

LATEST UPDATE