Gradient descent algorithm producing NaNs



I'm trying to implement a gradient descent algorithm in Python, but I'm having trouble getting it to work. I have a sparse matrix (X_sparse) and I'm trying to find two matrices (p and q) whose product is as close as possible to X_sparse on its non-zero entries. So, to find the vectors p[i] and q[j] that make up the matrices p and q, I apply gradient descent.
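Concretely, for each non-zero entry X[i,j] the loop below is meant to perform the standard SGD updates for this factorization (written out here just to make the intent explicit):

$$\text{err}_{ij} = X_{ij} - p_i \cdot q_j, \qquad p_i \leftarrow p_i + \alpha\,\text{err}_{ij}\,q_j, \qquad q_j \leftarrow q_j + \alpha\,\text{err}_{ij}\,p_i$$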

Here is the code:

import numpy as np

def Gradient_Descent(X_sparse):
    n_factors = 10
    n_steps = 10
    alpha = 0.01
    # initializing the vectors randomly:
    p = np.random.normal(0, .1, (X_sparse.shape[0], n_factors))
    q = np.random.normal(0, .1, (X_sparse.shape[1], n_factors))
    for k in range(0, n_steps):
        for (i, j) in zip(X_sparse.nonzero()[0], X_sparse.nonzero()[1]):
            err = X_sparse[i, j] - np.dot(p[i], q[j])
            p[i] = p[i] + alpha*q[j]*err
            q[j] = q[j] + alpha*p[i]*err
    return (p, q)

When I call it, the function returns this for both p and q:

array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])

(At least p and q each have the correct shape.) Can someone help me figure out why this happens?

Edit: I'm also getting these warnings

C:\Users\Bernard\anaconda3\lib\site-packages\ipykernel_launcher.py:14: RuntimeWarning: overflow encountered in multiply
C:\Users\Bernard\anaconda3\lib\site-packages\ipykernel_launcher.py:13: RuntimeWarning: overflow encountered in multiply
  del sys.path[0]
C:\Users\Bernard\anaconda3\lib\site-packages\ipykernel_launcher.py:13: RuntimeWarning: invalid value encountered in add
  del sys.path[0]

You should multiply the whole row by the whole column inside np.dot(), not just single elements, and you should also train for more iterations.

Your error happens because the numbers in X_sparse are large, which causes an exploding-gradient problem, so you should normalize the data.
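To see the blow-up in isolation, here is a minimal standalone sketch (not your function; it just fits a single made-up entry of 230, taken from the example input further down, with your alpha = 0.01): the error starts near 230, so every update changes p and q by roughly 2.3 times the other vector, the factors grow geometrically, and within a few dozen passes the products overflow, which is exactly what the RuntimeWarnings are reporting.

import numpy as np

x = 230.0                        # one large entry, like the 230 in the example below
alpha = 0.01
p = np.random.normal(0, .1, 10)
q = np.random.normal(0, .1, 10)
for step in range(200):
    err = x - np.dot(p, q)       # starts near 230, so the updates below are huge
    p = p + alpha * q * err
    q = q + alpha * p * err
print(np.dot(p, q))              # typically overflows to inf and then nan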

Use this:

import numpy as np

def Gradient_Descent(X):
    X_sparse = X/np.max(X)               # normalizing values
    print(X_sparse)
    n_factors = 10
    n_steps = 10000                      # optimized
    alpha = 0.1                          # optimized

    # initializing the vectors randomly:
    p = np.random.normal(0, .01, (X_sparse.shape[0], n_factors))
    q = np.random.normal(0, .01, (n_factors, X_sparse.shape[1]))   # changed so as to follow the matrix multiplication rule

    for k in range(0, n_steps):
        for (i, j) in zip(X_sparse.nonzero()[0], X_sparse.nonzero()[1]):
            err = X_sparse[i, j] - np.dot(p[i, :], q[:, j])        # multiply whole row and column
            p[i, :] = p[i, :] + alpha*q[:, j]*err                  # update whole row
            q[:, j] = q[:, j] + alpha*p[i, :]*err                  # update whole column
    print(np.dot(p, q))
    p = p*np.sqrt(np.max(X))             # undo the normalization: scaling both factors by
    q = q*np.sqrt(np.max(X))             # sqrt(max(X)) scales their product back by max(X)
    return (p, q)

Because we divide by np.max(X) at the start, we have to multiply p and q by np.sqrt(np.max(X)) at the end: if p·q approximates X/np.max(X), then (np.sqrt(np.max(X))*p)·(np.sqrt(np.max(X))*q) = np.max(X)*(p·q), which approximates X again.

It works for me:

Input:

X = np.array([[12, 0],
              [16, 11],
              [0, 230]])
p, q = Gradient_Descent(X)
print(np.dot(p, q))

Output:

array([[ 12.        ,  16.12096025],
       [ 16.        ,  11.        ],
       [ 26.17067276, 230.        ]])
