I'm trying to implement a gradient descent algorithm in Python, but I'm struggling to get it working. I have a sparse matrix (X_sparse), and I'm trying to find two matrices (p and q) whose product closely matches X_sparse on its nonzero entries. So, to find the vectors p[i] and q[j] that make up the matrices p and q, I applied gradient descent.
Here is the code:
import numpy as np

def Gradient_Descent(X_sparse):
    n_factors = 10
    n_steps = 10
    alpha = 0.01
    # initializing the vectors randomly:
    p = np.random.normal(0, .1, (X_sparse.shape[0], n_factors))
    q = np.random.normal(0, .1, (X_sparse.shape[1], n_factors))
    for k in range(0, n_steps):
        for (i, j) in zip(X_sparse.nonzero()[0], X_sparse.nonzero()[1]):
            err = X_sparse[i, j] - np.dot(p[i], q[j])
            p[i] = p[i] + alpha*q[j]*err
            q[j] = q[j] + alpha*p[i]*err
    return (p, q)
When called, this function returns the following for both p and q:
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]])
(At least p and q each have the right shape.) Can someone help me figure out why?
Edit: I also get these warnings:
C:\Users\Bernard\anaconda3\lib\site-packages\ipykernel_launcher.py:14: RuntimeWarning: overflow encountered in multiply
C:\Users\Bernard\anaconda3\lib\site-packages\ipykernel_launcher.py:13: RuntimeWarning: overflow encountered in multiply
  del sys.path[0]
C:\Users\Bernard\anaconda3\lib\site-packages\ipykernel_launcher.py:13: RuntimeWarning: invalid value encountered in add
  del sys.path[0]
You should multiply the whole row by the whole column in np.dot(), not just single elements. You should also train for more iterations.
Your error occurs because the numbers in X_sparse are large, which causes an exploding-gradient problem. You should therefore normalize the data.
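To see the blow-up mechanism in isolation, here is a toy single-entry version of the update rule (hypothetical scalar values, not the actual matrix code): each update multiplies the error back into p and q, so the iterates grow geometrically, overflow float64 to inf, and then inf - inf produces nan.

```python
import math

x = 230.0            # a large entry, like the largest value in X_sparse
p_i, q_j = 0.1, 0.1  # small random-ish initialization
alpha = 0.01
for _ in range(50):
    err = x - p_i * q_j
    p_i = p_i + alpha * q_j * err  # |p_i| grows roughly by a factor of alpha*|q_j*err|
    q_j = q_j + alpha * p_i * err  # ...which feeds back into q_j, and so on
print(p_i, q_j)  # both end up nan after the overflow
```

This is exactly the "overflow encountered in multiply" followed by "invalid value encountered in add" sequence from the warnings.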
Use this:
def Gradient_Descent(X):
    X_sparse = X/np.max(X)  # normalizing values
    print(X_sparse)
    n_factors = 10
    n_steps = 10000  # optimized
    alpha = 0.1  # optimized
    # initializing the vectors randomly:
    p = np.random.normal(0, .01, (X_sparse.shape[0], n_factors))
    q = np.random.normal(0, .01, (n_factors, X_sparse.shape[1]))  # changed so as to follow matrix multiplication rule
    for k in range(0, n_steps):
        for (i, j) in zip(X_sparse.nonzero()[0], X_sparse.nonzero()[1]):
            err = X_sparse[i, j] - np.dot(p[i, :], q[:, j])  # multiply whole row and column
            p[i, :] = p[i, :] + alpha*q[:, j]*err  # update whole row and column
            q[:, j] = q[:, j] + alpha*p[i, :]*err  # update whole row and column
    print(np.dot(p, q))
    p = p*np.sqrt(np.max(X))  # matrix multiplication rule for normalized values
    q = q*np.sqrt(np.max(X))  # matrix multiplication rule for normalized values
    return (p, q)
Since we divide by np.max(X) at the start, we have to multiply p and q by np.sqrt(np.max(X)) at the end.
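That rescaling identity is easy to check with hypothetical numbers: if p and q reproduce X/np.max(X), then scaling each factor by np.sqrt(np.max(X)) scales their product by np.max(X).

```python
import numpy as np

m = 230.0                        # stands in for np.max(X)
p_n = np.array([[0.5], [0.2]])   # hypothetical 2x1 factor of the normalized matrix
q_n = np.array([[0.4, 0.1]])     # hypothetical 1x2 factor of the normalized matrix
prod_norm = p_n @ q_n            # approximates X / m
p = p_n * np.sqrt(m)             # rescale each factor by sqrt(m)...
q = q_n * np.sqrt(m)
print(np.allclose(p @ q, prod_norm * m))  # True: the product is rescaled by m
```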
It works for me:
Input:
X = np.array([[12, 0],
              [16, 11],
              [0, 230]])
p, q = Gradient_Descent(X)
print(np.dot(p, q))
Output:
array([[ 12. , 16.12096025],
[ 16. , 11. ],
[ 26.17067276, 230. ]])
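If X really is a scipy.sparse matrix, a sketch along the same lines may be faster and slightly more robust (hypothetical function name and parameter defaults): it lists the nonzero coordinates once via COO format instead of calling nonzero() on every inner iteration, and it snapshots p[i] before updating so that q's update uses the old value rather than the freshly updated one.

```python
import numpy as np
from scipy.sparse import coo_matrix

def gradient_descent_sparse(X, n_factors=10, n_steps=5000, alpha=0.1, seed=0):
    """SGD factorization fitted only on the nonzero entries of X."""
    rng = np.random.default_rng(seed)
    X = coo_matrix(X)                     # COO exposes .row, .col, .data directly
    scale = X.data.max()
    targets = X.data / scale              # normalize to avoid overflow
    p = rng.normal(0, .01, (X.shape[0], n_factors))
    q = rng.normal(0, .01, (n_factors, X.shape[1]))
    for _ in range(n_steps):
        for i, j, x in zip(X.row, X.col, targets):  # nonzeros enumerated once
            err = x - p[i, :] @ q[:, j]
            p_old = p[i, :].copy()        # snapshot before updating p
            p[i, :] += alpha * err * q[:, j]
            q[:, j] += alpha * err * p_old
    # undo the normalization, splitting the factor np.sqrt(scale) between p and q
    return p * np.sqrt(scale), q * np.sqrt(scale)
```

Applied to the example input above, np.dot(p, q) should again match the nonzero entries of X closely, while the zero entries are filled in by the model.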