What's wrong with my matrix-based backpropagation algorithm?



I'm working through Nielsen's Neural Networks and Deep Learning. To deepen my understanding, Nielsen suggests rewriting his backpropagation algorithm with a matrix-based approach, which is supposedly much faster thanks to optimizations in linear algebra libraries.

Currently, my accuracy is very low and fluctuates between 9-10% on every run. Normally I would keep working on my own understanding, but I've spent the better part of three days on this algorithm and I feel I have a good handle on the math behind backprop. Either way, I'm still getting mediocre accuracy, so any insight would be greatly appreciated!

I'm using the MNIST handwritten digits database.
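For reference, here is a minimal sketch of the data layout I'm assuming (Nielsen's load_data_wrapper conventions, which the .ravel() calls and evaluate() below rely on):

import mnist_data

# Training pairs are column vectors: a (784, 1) pixel vector and a (10, 1) one-hot label.
# Test pairs keep the (784, 1) input but use a plain integer class label.
training_data, validation_data, test_data = mnist_data.load_data_wrapper()

x, y = list(training_data)[0]
print(x.shape, y.shape)   # expected: (784, 1) (10, 1)

tx, ty = list(test_data)[0]
print(tx.shape, ty)       # expected: (784, 1) and an integer in 0-9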


neural_net_batch.py

Neural network functions (the backward pass happens here)

"""
neural_net_batch.py
neural_net.py modified to use matrix operations
"""
# Libs
import random
import numpy as np
# Neural Network
class Network(object):
def __init__(self, sizes):
self.num_layers = len(sizes)                                                    # Number of layers in network
self.sizes = sizes                                                              # Number of neurons in each layer
self.biases = [np.random.randn(y, 1) for y in sizes[1:]]                        # Bias vector, 1 bias for each neuron in each layer, except input neurons
self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]   # Weight matrix
# Feed Forward Function
# Returns netowrk output for input a
def feedforward(self, a):
for b, w in zip(self.biases, self.weights):              # a’ = σ(wa + b)
a = sigmoid(np.dot(w, a)+b)
return a
# Stochastic Gradient Descent
def SGD(self, training_set, epochs, m, eta, test_data):
if test_data: n_test = len(test_data)
n = len(training_set)
# Epoch loop
for j in range(epochs):
# Shuffle training data & parcel out mini batches
random.shuffle(training_set)
mini_batches = [training_set[k:k+m] for k in range(0, n, m)]
# Pass mini batches one by one to be updated
for mini_batch in mini_batches:
self.update_mini_batch(mini_batch, eta)
# End of Epoch (optional epoch testing)
if test_data:
evaluation = self.evaluate(test_data)
print("Epoch %6i: %5i / %5i" % (j, evaluation, n_test))
else:
print("Epoch %5i complete" % (j))

# Update Mini Batch (Matrix approach)
def update_mini_batch(self, mini_batch, eta):
m = len(mini_batch)
nabla_b = []
nabla_w = []
# Build activation & answer matrices
x = np.asarray([_x.ravel() for _x,_y in mini_batch])    # 10x784 where each row is an input vector
y = np.asarray([_y.ravel() for _x,_y in mini_batch])    # 10x10 where each row is an desired output vector
nabla_b, nabla_w = self.backprop(x, y)      # Feed matrices into backpropagation
# Train Biases & weights
self.biases = [b-(eta/m)*nb for b, nb in zip(self.biases, nabla_b)]
self.weights = [w-(eta/m)*nw for w, nw in zip(self.weights, nabla_w)]

def backprop(self, x, y):
# Gradient arrays
nabla_b = [0 for i in self.biases]
nabla_w = [0 for i in self.weights]
w = self.weights
# Vars
m = len(x)      # Mini batch size
a = x           # Activation matrix temp variable
a_s = [x]       # Activation matrix record
z_s = []        # Weighted Activation matrix record
special_b = []  # Special bias matrix to facilitate matrix operations
# Build special bias matrix (repeating biases for each example)
for j in range(len(self.biases)):
special_b.append([])
for k in range(m):
special_b[j].append(self.biases[j].flatten())
special_b[j] = np.asarray(special_b[j])
# Forward pass
# Starting at the input layer move through each layer
for l in range(len(self.sizes)-1):
z = a @ w[l].transpose() + special_b[l]
z_s.append(z)
a = sigmoid(z)
a_s.append(a)
# Backward pass
delta = cost_derivative(a_s[-1], y) * sigmoid_prime(z_s[-1])
nabla_b[-1] = delta
nabla_w[-1] = delta @ a_s[-2]
for n in range(2, self.num_layers):
z = z_s[-n]
sp = sigmoid_prime(z)
delta = self.weights[-n+1].transpose() @ delta * sp.transpose()
nabla_b[-n] = delta
nabla_w[-n] = delta @ a_s[-n-1]
# Create bias vectors by summing bias columns elementwise
for i in range(len(nabla_b)):
temp = []
for j in nabla_b[i]:
temp.append(sum(j))
nabla_b[i] = np.asarray(temp).reshape(-1,1)
return [nabla_b, nabla_w]
def evaluate(self, test_data):
test_results = [(np.argmax(self.feedforward(t[0])), t[1]) for t in test_data]
return sum(int(x==y) for (x, y) in test_results)
# Cost Derivative Function
# Returns the vector of partial derivatives C_x, a for the output activations y
def cost_derivative(output_activations, y):
return(output_activations-y)
# Sigmoid Function
def sigmoid(z):
return 1.0/(1.0+np.exp(-z))
# Sigmoid Prime (Derivative) Function
def sigmoid_prime(z):
return sigmoid(z)*(1-sigmoid(z))

MNIST_TEST.py

Test script

import mnist_data
import neural_net_batch as nn
# Data Sets
training_data, validation_data, test_data = mnist_data.load_data_wrapper()
training_data = list(training_data)
validation_data = list(validation_data)
test_data = list(test_data)
# Network
net = nn.Network([784, 30, 10])
# Perform Stochastic Gradient Descent using MNIST training & test data,
# 30 epochs, mini_batch size of 10, and learning rate of 3.0
net.SGD(list(training_data), 30, 10, 3.0, test_data=test_data)

A very helpful Redditor (u/xdaimon) helped me (over on Reddit) arrive at the following answer:

Your backward pass should be

# Backward pass
delta = cost_derivative(a_s[-1], y) * sigmoid_prime(z_s[-1])
nabla_b[-1] = delta.T
nabla_w[-1] = delta.T @ a_s[-2]
for n in range(2, self.num_layers):
    z = z_s[-n]
    sp = sigmoid_prime(z)
    delta = delta @ self.weights[-n+1] * sp
    nabla_b[-n] = delta.T
    nabla_w[-n] = delta.T @ a_s[-n-1]

One way to find this bug is to remember that a transpose has to show up somewhere in the product that computes nabla_w.

If you're interested, the transpose shows up because the matrix product AB can be written as a sum of outer products of the columns of A with the rows of B. In this case A = delta.T and B = a_s[-n-1], so the product is a sum of outer products of the rows of delta with the rows of a_s[-n-1]. Each term in that sum is the nabla_w of a single element of the mini-batch, which is exactly what we want. If the mini-batch size is 1, you can easily see that delta.T @ a_s[-n-1] is just the outer product of the delta vector and the activation vector.
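To convince myself of that identity, I ran a small NumPy check (a sketch only; the 10/30/10 shapes are just chosen to match the last layer of the [784, 30, 10] network):

import numpy as np

m, n_prev, n_out = 10, 30, 10              # mini-batch size, previous layer size, output layer size
delta = np.random.randn(m, n_out)          # one row of output deltas per example
activations = np.random.randn(m, n_prev)   # one row of previous-layer activations per example

# Matrix form: a single product accumulates the gradient over the whole mini-batch.
nabla_w = delta.T @ activations            # shape (n_out, n_prev), same as the weight matrix

# Equivalent form: sum of per-example outer products of rows of delta and rows of activations.
nabla_w_sum = sum(np.outer(delta[i], activations[i]) for i in range(m))

print(np.allclose(nabla_w, nabla_w_sum))   # True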

Testing confirms that not only is the network accurate again, but the expected speedup is there as well.
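If you want to reproduce the speed comparison, something like the sketch below works (the neural_net module name refers to the original per-example implementation mentioned in the docstring above, and its SGD signature is assumed to match Nielsen's network.py):

import time
import mnist_data
import neural_net as nn_single        # original per-example version (assumed module name)
import neural_net_batch as nn_batch   # matrix-based version from this post

training_data, _, test_data = mnist_data.load_data_wrapper()
training_data, test_data = list(training_data), list(test_data)

for label, module in (("per-example", nn_single), ("matrix-based", nn_batch)):
    net = module.Network([784, 30, 10])
    start = time.perf_counter()
    net.SGD(list(training_data), 1, 10, 3.0, test_data=test_data)   # time a single epoch
    print("%-12s %.1f s/epoch" % (label, time.perf_counter() - start))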
