SciPy optimization warnings in a neural network



I am getting the warning below when optimizing my NeuralNetwork with SciPy's fmin_bfgs(). Everything should be clear and simple, following the backpropagation algorithm:

1. Feed forward the training examples.
2. Compute the error term for each unit.
3. Accumulate the gradients (for this first pass I skipped the regularization term).

Starting Loss: 7.26524579601
Check gradient: 2.02493576268
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 5.741300
Iterations: 3
Function evaluations: 104
Gradient evaluations: 92
Trained Loss: 5.74130012926

I have just done the same task in MATLAB, where the fminc optimization function runs successfully, but I can't figure out what I am missing in my Python implementation. As you can see, even the value returned by scipy.optimize.check_grad is far too large.
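
For reference, scipy.optimize.check_grad returns the 2-norm of the difference between the analytic gradient and a finite-difference approximation of the cost, so a value around 2.0 means the two disagree badly. Below is a minimal sketch of the same comparison done by hand with scipy.optimize.approx_fprime; gradient_error is a hypothetical helper, and cost and gradient refer to the functions defined further down.

import numpy as np
import scipy.optimize

# Hand-rolled equivalent of what check_grad reports: compare the analytic
# gradient against a finite-difference approximation of the cost.
def gradient_error(cost, gradient, thetas, *args, eps=1.5e-8):
    numeric = scipy.optimize.approx_fprime(thetas, cost, eps, *args)
    analytic = gradient(thetas, *args)
    return np.sqrt(np.sum((analytic - numeric) ** 2))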

import numpy as np
import scipy.optimize

def feed_forward(x, theta1, theta2):
    # forward pass through a single hidden layer
    hidden_dot = np.dot(add_bias(x), np.transpose(theta1))
    hidden_p = sigmoid(hidden_dot)
    p = sigmoid(np.dot(add_bias(hidden_p), np.transpose(theta2)))
    return hidden_dot, hidden_p, p

def cost(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    _, _, p = feed_forward(x, theta1, theta2)
    # regularization = (lam / (len(x) * 2)) * (
    #     np.sum(np.square(np.delete(theta1, 0, 1)))
    #     + np.sum(np.square(np.delete(theta2, 0, 1))))
    complete = -1 * np.dot(np.transpose(y), np.log(p)) \
        - np.dot(np.transpose(1 - y), np.log(1 - p))
    return np.sum(complete) / len(x)  # + regularization

def vector(z):
    # noinspection PyUnresolvedReferences
    return np.reshape(z, (np.shape(z)[0], 1))

def gradient(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    hidden_dot, hidden_p, p = feed_forward(x, theta1, theta2)
    error_o = p - y
    error_h = np.multiply(np.dot(
        error_o, np.delete(theta2, 0, 1)), sigmoid_gradient(hidden_dot))
    x = add_bias(x)
    hidden_p = add_bias(hidden_p)
    theta1_grad, theta2_grad = \
        np.zeros(theta1.shape[::-1]), np.zeros(theta2.shape[::-1])
    records = y.shape[0]
    # accumulate the gradient example by example
    for i in range(records):
        theta1_grad = theta1_grad + np.dot(
            vector(x[i]), np.transpose(vector(error_h[i])))
        theta2_grad = theta2_grad + np.dot(
            vector(hidden_p[i]), np.transpose(vector(error_o[i])))
    theta1_grad = np.transpose(
        theta1_grad / records)  # + (lam / records * theta1)
    theta2_grad = np.transpose(
        theta2_grad / records)  # + (lam / records * theta2)
    return np.append(theta1_grad, theta2_grad)

def get_theta_shapes(x, y, hidden):
    return (hidden, x.shape[1] + 1), \
        (y.shape[1], hidden + 1)

def get_theta_from(thetas, x, y, hidden):
    t1_s, t2_s = get_theta_shapes(x, y, hidden)
    split = t1_s[0] * t1_s[1]
    theta1 = np.reshape(thetas[:split], t1_s)
    theta2 = np.reshape(thetas[split:], t2_s)
    return theta1, theta2

def train(x, y, hidden_size, lam):
    y = get_binary_y(y)
    t1_s, t2_s = get_theta_shapes(x, y, hidden_size)
    thetas = np.append(
        rand_init(t1_s[0], t1_s[1]),
        rand_init(t2_s[0], t2_s[1]))
    initial_cost = cost(thetas, x, y, hidden_size, lam)
    print("Starting Loss: " + str(initial_cost))
    check_grad1 = scipy.optimize.check_grad(
        cost, gradient, thetas, x, y, hidden_size, lam)
    print("Check gradient: " + str(check_grad1))
    trained_theta = scipy.optimize.fmin_bfgs(
        cost, thetas, fprime=gradient, args=(x, y, hidden_size, lam))
    print("Trained Loss: " +
          str(cost(trained_theta, x, y, hidden_size, lam)))
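
The helper functions referenced above (sigmoid, sigmoid_gradient, add_bias, rand_init, get_binary_y) are not shown in the post. A minimal sketch of what they are assumed to look like, so the code above can be run end to end:

# Assumed implementations of the helpers used above (not part of the original post).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    # derivative of the sigmoid with respect to its pre-activation input
    s = sigmoid(z)
    return s * (1 - s)

def add_bias(a):
    # prepend a column of ones (the bias unit) to an activation matrix
    return np.hstack((np.ones((a.shape[0], 1)), a))

def rand_init(rows, cols, epsilon=0.12):
    # small random weights to break symmetry
    return np.random.rand(rows, cols) * 2 * epsilon - epsilon

def get_binary_y(y):
    # one-hot encode integer class labels
    classes = np.unique(y)
    return (np.reshape(y, (-1, 1)) == np.reshape(classes, (1, -1))).astype(float)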

Just wondering, why did you skip the regularization step? Have you tried running the program with regularization?

Also, there were several issues in the computation. Fixing them resolves all of the warnings, makes the SciPy optimization run successfully, and gives the same result as the MATLAB fminc optimization function. (A working Python example can be found on GitHub.)

1. Fix the cost computation: the cross-entropy terms in the cost function must be multiplied element-wise. The correct cost, with the regularization term, is:

def cost(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    _, _, p = feed_forward(x, theta1, theta2)
    # L2 penalty over both weight matrices, excluding the bias columns
    regularization = (lam / (len(x) * 2)) * (
        np.sum(np.square(np.delete(theta1, 0, 1)))
        + np.sum(np.square(np.delete(theta2, 0, 1))))
    # element-wise cross-entropy; nan_to_num guards against 0 * log(0)
    complete = np.nan_to_num(np.multiply((-y), np.log(
        p)) - np.multiply((1 - y), np.log(1 - p)))
    avg = np.sum(complete) / len(x)
    return avg + regularization
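
For reference, this is the standard regularized cross-entropy cost for m training examples and K output units, with the bias columns of both weight matrices excluded from the penalty (p_k^(i) is the k-th output of feed_forward for example i):

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log p_k^{(i)} - \big(1 - y_k^{(i)}\big)\log\big(1 - p_k^{(i)}\big)\Big] + \frac{\lambda}{2m}\Big(\sum_{j,l}\big(\Theta^{(1)}_{j,l}\big)^2 + \sum_{j,l}\big(\Theta^{(2)}_{j,l}\big)^2\Big)$$

where the j, l sums run over all non-bias entries of theta1 and theta2.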

2. After making this change, SciPy can return nan values in the optimized theta terms; np.nan_to_num above handles that case. Note that MATLAB's fminc copes with such unexpected numbers on its own.
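
The nan values come from the cross-entropy itself: when the sigmoid saturates, a prediction hits exactly 0.0 or 1.0 in floating point, the corresponding log() term becomes -inf, and multiplying it by a zero entry of y produces nan. A small illustration of the effect and of what np.nan_to_num does with it:

import numpy as np

# Why np.nan_to_num is needed: a saturated prediction makes log() blow up.
p = np.array([1.0, 0.5])    # first prediction is fully saturated
y = np.array([1.0, 1.0])
with np.errstate(divide='ignore', invalid='ignore'):
    term = -y * np.log(p) - (1 - y) * np.log(1 - p)
print(term)                 # [nan 0.693...] -- 0 * -inf gives nan
print(np.nan_to_num(term))  # [0.  0.693...] -- nan replaced by a finite 0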

3. Apply the regularization correctly, and do not forget to exclude the bias values from the regularization. The correct gradient function should look like this:

def gradient(thetas, x, y, hidden, lam):
    theta1, theta2 = get_theta_from(thetas, x, y, hidden)
    hidden_dot, hidden_p, p = feed_forward(x, theta1, theta2)
    error_o = p - y
    # back-propagate the output error through the full theta2 (including the
    # bias column) and drop the bias column of the error afterwards
    error_h = np.multiply(np.dot(
        error_o, theta2),
        sigmoid_gradient(add_bias(hidden_dot)))
    x = add_bias(x)
    hidden_p = add_bias(hidden_p)  # bias unit so hidden_p[i] matches theta2_grad
    error_h = np.delete(error_h, 0, 1)
    theta1_grad, theta2_grad = \
        np.zeros(theta1.shape[::-1]), np.zeros(theta2.shape[::-1])
    records = y.shape[0]
    for i in range(records):
        theta1_grad = theta1_grad + np.dot(
            vector(x[i]), np.transpose(vector(error_h[i])))
        theta2_grad = theta2_grad + np.dot(
            vector(hidden_p[i]), np.transpose(vector(error_o[i])))
    # regularize every weight except the bias column (first column zeroed out)
    reg_theta1 = theta1.copy()
    reg_theta1[:, 0] = 0
    theta1_grad = np.transpose(
        theta1_grad / records) + ((lam / records) * reg_theta1)
    reg_theta2 = theta2.copy()
    reg_theta2[:, 0] = 0
    theta2_grad = np.transpose(
        theta2_grad / records) + ((lam / records) * reg_theta2)
    return np.append(
        theta1_grad, theta2_grad)
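
With the corrected cost and gradient, the train() driver from the question should run fmin_bfgs without the precision-loss warning. Equivalently, the newer scipy.optimize.minimize interface can be used; a sketch, assuming the same thetas, x, y, hidden_size and lam as in train():

# Same optimization through scipy.optimize.minimize instead of fmin_bfgs.
result = scipy.optimize.minimize(
    cost, thetas, jac=gradient,
    args=(x, y, hidden_size, lam), method='BFGS')
trained_theta = result.x
print("Trained Loss: " + str(result.fun))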
