Correct backpropagation in a simple perceptron



Given the simple OR gate problem:

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

If we train a simple single-layer perceptron (without backpropagation), we could do something like this:

import numpy as np
np.random.seed(0)
def sigmoid(x): # Squashes values into the range (0, 1).
    return 1 / (1 + np.exp(-x))
def cost(predicted, truth):
    return (truth - predicted)**2
or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T
# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector. 
output_dim = len(or_output.T)
num_epochs = 50 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.
# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))
for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(X, W))
    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)
    # update weights
    W +=  - learning_rate * np.dot(layer0.T, cost_error)
# Expected output.
print(Y.tolist())
# On the training data
print([[int(prediction > 0.5)] for prediction in layer1])

[out]:

[[0], [1], [1], [1]]
[[0], [1], [1], [1]]

To compute d(cost)/d(X) using the backward flow of derivatives, would the following be correct:

  • compute the layer1 error by multiplying the cost error with the derivative of the cost

  • then compute the layer1 delta by multiplying the layer1 error with the derivative of the sigmoid

  • then take the dot product between the inputs and the layer1 delta to get the differential, i.e. d(cost)/d(X)

Then d(cost)/d(X) is multiplied by the negative of the learning rate to perform a gradient-descent step (see the numerical gradient check after the snippet below).

num_epochs = 0 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.
# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))
for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(X, W))
    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)
    # Back propagation.
    # multiply how much we missed from the gradient/slope of the cost for our prediction.
    layer1_error = cost_error * cost_derivative(cost_error)
    # multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1
    layer1_delta = layer1_error * sigmoid_derivative(layer1)
    # update weights
    W +=  - learning_rate * np.dot(layer0.T, layer1_delta)
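
As a sanity check on this kind of recipe, the candidate gradient can be compared against finite differences. The sketch below (added for illustration; total_cost and numerical_gradient are made-up helper names) checks the textbook chain-rule gradient of the squared-error cost with respect to W, which is what the weight update actually consumes:

import numpy as np

def total_cost(W, X, Y):
    # Total squared error of the single sigmoid layer.
    return np.sum((Y - 1 / (1 + np.exp(-np.dot(X, W)))) ** 2)

def numerical_gradient(W, X, Y, eps=1e-6):
    # Central finite differences, one weight at a time.
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            grad[i, j] = (total_cost(W_plus, X, Y) - total_cost(W_minus, X, Y)) / (2 * eps)
    return grad

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 1, 1, 1]], dtype=float).T
W = np.random.random((2, 1))

# Chain rule for cost = (Y - sigmoid(X.W))**2:
# d(cost)/d(W) = X.T . (2 * (layer1 - Y) * layer1 * (1 - layer1))
layer1 = 1 / (1 + np.exp(-np.dot(X, W)))
analytic = np.dot(X.T, 2 * (layer1 - Y) * layer1 * (1 - layer1))

print(np.allclose(analytic, numerical_gradient(W, X, Y)))  # True when the gradient is correct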

In that case, should the cost_derivative and sigmoid_derivative used there be defined as below?

The full implementation looks like this:
import numpy as np
np.random.seed(0)
def sigmoid(x): # Squashes values into the range (0, 1).
    return 1 / (1 + np.exp(-x))
def sigmoid_derivative(sx):
    # See https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)
def cost(predicted, truth):
    return (truth - predicted)**2
def cost_derivative(y):
    # For cost = y**2, the derivative is:
    # d(cost)/d(y) = 2*y
    return 2*y

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T
# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector. 
output_dim = len(or_output.T)
num_epochs = 5 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.
# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))
for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(X, W))
    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)
    # Back propagation.
    # multiply how much we missed from the gradient/slope of the cost for our prediction.
    layer1_error = cost_error * cost_derivative(cost_error)
    # multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1
    layer1_delta = layer1_error * sigmoid_derivative(layer1)
    # update weights
    W +=  - learning_rate * np.dot(layer0.T, layer1_delta)
# Expected output.
print(Y.tolist())
# On the training data
print([[int(prediction > 0.5)] for prediction in layer1])

[out]:

[[0], [1], [1], [1]]
[[0], [1], [1], [1]]
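
One detail that is easy to misread in the code above: sigmoid_derivative(sx) = sx * (1 - sx) expects the already-activated value sx = sigmoid(x), not the raw input x, which is why it is applied to layer1. A quick numerical check of that identity (added for illustration):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sx):
    return sx * (1 - sx)

x = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # finite-difference slope of sigmoid at x
analytic = sigmoid_derivative(sigmoid(x))              # note: pass sigmoid(x), not x
print(np.allclose(numeric, analytic))                  # True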

BTW, given the random seed, even without learning W through gradient descent (i.e. without the perceptron training at all), the predictions can still be correct:

import numpy as np
np.random.seed(0)
# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))
# On the training data
predictions = sigmoid(np.dot(X, W))
[[int(prediction > 0.5)] for prediction in predictions]
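
This works because with input [0, 0] the pre-activation is exactly 0, sigmoid(0) = 0.5, and the strict > 0.5 threshold maps it to 0, while any strictly positive weights push the other three inputs above 0.5. A small sketch of that observation (added for illustration, with arbitrary hand-picked positive weights):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
for W in (np.array([[0.2], [0.9]]), np.array([[3.0], [0.01]])):  # arbitrary positive weights
    print([[int(p > 0.5)] for p in sigmoid(np.dot(X, W))])       # [[0], [1], [1], [1]] both times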

You are almost correct. In your implementation, you define the cost as the square of the error, which has the unfortunate consequence of always being positive. As a result, if you plot mean(cost_error), you will see that it slowly rises at each iteration, and your weights slowly decrease.

In your particular case, any weights > 0 will make it work: if you run your implementation for enough epochs, your weights will become negative and the network will no longer work.
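
A small demonstration of that point (added for illustration, reusing the first training loop from the question): because cost = (truth - predicted)**2 is never negative and the inputs are never negative, the update term np.dot(layer0.T, cost_error) is always >= 0, so W += -learning_rate * ... can only push the weights down.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost(predicted, truth):
    return (truth - predicted) ** 2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0, 1, 1, 1]]).T
np.random.seed(0)
W = np.random.random((2, 1))
for _ in range(5000):  # same update as the question's first loop, just many more epochs
    layer1 = sigmoid(np.dot(X, W))
    W += -0.03 * np.dot(X.T, cost(layer1, Y))
print(W.ravel())                          # both weights have been driven negative
print([[int(p > 0.5)] for p in layer1])   # the three OR-true rows are now misclassified as 0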

You can simply remove the square in your cost function:

def cost(predicted, truth):
    return (truth - predicted)

Now, to update the weights, you need to evaluate the gradient at the "position" of the error. So what you need is:

d_predicted = output_errors * sigmoid_derivative(predicted_output)

Next, we update the weights (note that the sign is now +: since cost_error = truth - predicted already points in the direction the output should move, we add the gradient term instead of subtracting it):

W += np.dot(X.T, d_predicted) * learning_rate

The full code, with the error curve plotted:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
def sigmoid(x): # Squashes values into the range (0, 1).
    return 1 / (1 + np.exp(-x))
def sigmoid_derivative(sx):
    # See https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)
def cost(predicted, truth):
    return (truth - predicted)
or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T
# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector. 
output_dim = len(or_output.T)
num_epochs = 50 # No. of times to iterate.
learning_rate = 0.1 # How large a step to take per iteration.
# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))
# W = [[-1],[1]] # you can try to set bad weights to see the training process
error_list = []
for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    layer1 = sigmoid(np.dot(X, W))
    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)
    error_list.append(np.mean(cost_error)) # save the loss to plot later
    # Back propagation.
    # evaluate the gradient at the predicted output (layer1)
    d_predicted = cost_error * sigmoid_derivative(layer1)
    # update weights
    W = W + np.dot(X.T, d_predicted) * learning_rate

# Expected output.
print(Y.tolist())
# On the training data
print([[int(prediction > 0.5)] for prediction in layer1])
# plot error curve : 
plt.plot(range(num_epochs), error_list, '+b')
plt.xlabel('Epoch')
plt.ylabel('mean error')

I also added a (commented-out) line to set the initial weights manually, so you can watch how the network learns from a bad starting point.
