I am trying to write my own neural network as a learning exercise. Specifically, I am trying to create a neural network that recognizes handwritten digits. I am using sklearn's digits dataset, but I wrote the neural network myself.
Simple tests were successful (i.e. AND/OR gates), so I am fairly confident I have implemented backpropagation correctly, but I have found that even after training, the network is still very poor at working with the 8x8 pixel images of handwritten digits. I currently have 64 inputs (the 8x8 image) and 10 outputs (one per digit), with 2 hidden layers of size 4 each. The problem is that the network usually collapses to activations of ~[0.1, 0.1, 0.1, ...] for every input (i.e. it hedges towards the mean of targets that are 0.0 nine times and 1.0 once).
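To illustrate the collapse numerically (a quick sketch, not part of my network code): with one-hot targets, outputting 0.1 for every digit already keeps the summed absolute error low, so the network can reach a "good" error without learning anything about the digits:

```python
import numpy as np

# One-hot target for the digit 3; the other nine entries are 0.0.
target = np.zeros(10)
target[3] = 1.0

# A "collapsed" network that outputs 0.1 for every digit.
flat_output = np.full(10, 0.1)

# Total absolute error is already small: 9 * 0.1 + 0.9 = 1.8.
total_error = np.sum(np.abs(flat_output - target))
print(total_error)  # 1.8
```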
Possible ideas:

1) Are the multiple outputs causing a problem?
2) Do I need a better error function?
3) Do I just need to train the system for longer, with a smaller learning rate?
[Image: error over iterations]
[Image: a digit 1 (i.e. predicted output ~[0, 1, 1, 0, 0, 0, 0, 0, 0, 0])]
Has anyone faced a problem similar to this, or can anyone advise where I might be going wrong? Thanks for your patience if it turns out to be something obvious I have missed! Code below:
EDIT: CharlesReid1 and jdehesa were both correct: my network architecture was too simple to handle this task. More specifically, I had 2 hidden layers of 4 neurons each trying to process 64 inputs. Changing my hidden layers to 3 layers of 100 neurons each achieves a 90% accuracy score (counting an output > 0.7 as a positive result).
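For comparison, the same 3 x 100 hidden-layer shape with sklearn's off-the-shelf MLPClassifier (a sketch using sklearn's API, not my class below) reaches a similar score on the same dataset:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

data = datasets.load_digits()
X = data.images.reshape(len(data.images), -1)  # flatten each 8x8 image to 64 inputs

X_train, X_test, y_train, y_test = train_test_split(
    X, data.target, test_size=0.2, random_state=1)

# Three hidden layers of 100 neurons, mirroring the fixed architecture.
clf = MLPClassifier(hidden_layer_sizes=(100, 100, 100),
                    max_iter=500, random_state=1)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(score)  # typically well above 0.9
```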
# Import our dependencies
import numpy as np
from sklearn import datasets
class Neural_Network():
    # Initialising function
    def __init__(self, input_size, output_size, niteration=100000):
        np.random.seed(1)
        self.niteration = niteration
        self.layer_sizes = np.array([input_size, output_size])
        self.weights = list()
        self.error = np.array([])
        # Initialise random weights
        self._recreate_weights()

    def _recreate_weights(self):
        # Recreate the weights after adding a hidden layer
        self.weights = list()
        for i in np.arange(len(self.layer_sizes) - 1):
            weights = np.random.rand(self.layer_sizes[i], self.layer_sizes[i+1]) * 2 - 1
            self.weights.append(weights)
        self.momentum = [i * 0 for i in self.weights]

    def add_hidden_layer(self, size):
        # Add a new hidden layer to our neural network
        self.layer_sizes = np.insert(self.layer_sizes, -1, size)
        self._recreate_weights()

    def _sigmoid(self, x, deriv=False):
        if deriv:
            return self._sigmoid(x, deriv=False) * (1 - self._sigmoid(x, deriv=False))
        else:
            return 1.0 / (1 + np.exp(-x))

    def predict(self, input_single, deriv=False, layer_output=False):
        data_current_layer = input_single
        output_list = list()
        output_list.append(np.array([data_current_layer]))
        for i in np.arange(len(self.layer_sizes) - 1):
            data_current_layer = self._sigmoid(np.dot(data_current_layer, self.weights[i]), deriv)
            output_list.append(np.array([data_current_layer]))
        return output_list

    def train2(self, input_training_data, input_training_labels):
        for iterations in np.arange(self.niteration):
            # Loop over all training sets niteration times
            updates = [i * 0 for i in self.weights]  # Used for storing the update to the weights
            mean_error = np.array([])  # Used for calculating the mean error
            for i in np.arange(len(input_training_data)):  # For each training example
                activations = list()  # Store all my activations in a list
                activations.append(np.array([input_training_data[i]]))
                for j in np.arange(len(self.layer_sizes) - 1):
                    # Calculate all the activations for every layer
                    z = np.dot(activations[-1], self.weights[j])
                    a = self._sigmoid(z, deriv=False)
                    activations.append(a)
                error = list()
                error.append(activations[-1] - np.array([input_training_labels[i]]))
                for j in np.arange(len(self.layer_sizes) - 2):
                    # Calculate the error term for each layer
                    j2 = (-1 * j) - 1
                    j3 = j2 - 1
                    d = np.dot(error[j], self.weights[j2].T) * activations[j3] * (1 - activations[j3])
                    error.append(d)
                for j in np.arange(len(self.layer_sizes) - 1):
                    # Calculate the gradient of the error with respect to the weights
                    j2 = (-1 * j) - 1
                    updates[j] += np.dot(activations[j].T, error[j2])
                mean_error = np.append(mean_error, np.sum(np.abs(error[0])))
            updates = [0.001 * i / len(input_training_data) for i in updates]  # Add in a learning rate
            self.error = np.append(self.error, np.mean(mean_error))
            for i in np.arange(len(self.weights)):
                # Update using a momentum term
                self.momentum[i] -= updates[i]
                self.weights[i] += self.momentum[i]
                self.momentum[i] *= 0.9
            if np.mod(iterations, 1000) == 0:
                # Visually keep track of the error
                print(iterations, self.error[-1])
# Main Loop
# Read in the dataset and divide into a training and test set
data = datasets.load_digits()
images = data.images
labels = data.target
targets = data.target_names

training_images = images[:int(len(labels) * 0.8)]
training_labels = labels[:int(len(labels) * 0.8)]
training_images = images[:10]   # NOTE: overrides the 80% split with just 10 examples
training_labels = labels[:10]
test_images = images[int(len(labels) * 0.8):]
test_labels = labels[int(len(labels) * 0.8):]

# Flatten the training and test images using ravel. CAN PROBABLY DO THIS BEFORE DIVIDING THEM UP.
training_images_list = list()
for i in training_images:
    training_images_list.append(np.ravel(i))
test_images_list = list()
for i in test_images:
    test_images_list.append(np.ravel(i))

# Change the training and test labels into a more usable (one-hot) format.
training_labels_temp = np.zeros([np.size(training_labels), 10])
for i in np.arange(np.size(training_labels)):
    training_labels_temp[i, training_labels[i]] = 1
training_labels = training_labels_temp

test_labels_temp = np.zeros([np.size(test_labels), 10])
for i in np.arange(np.size(test_labels)):
    test_labels_temp[i, test_labels[i]] = 1
test_labels = test_labels_temp

# Build a neural network: input - hidden - hidden - hidden - output
if True:
    network = Neural_Network(input_size=64, output_size=10)
    network.add_hidden_layer(size=4)
    network.add_hidden_layer(size=4)
    network.add_hidden_layer(size=4)

    # Train the network on our training set
    #print(network.weights)
    network.train2(input_training_data=training_images_list, input_training_labels=training_labels)
    #print(network.weights)

    # Calculate the error on our test set
    #network.calculate_error(test_set=test_images, test_labels=test_labels)
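For completeness, here is a sketch of the kind of scoring helper I use for the test set (the helper name and the argmax rule are illustrative, since calculate_error isn't shown above):

```python
import numpy as np

def score(outputs, onehot_labels):
    """Fraction of examples whose most active output unit matches the
    one-hot label (illustrative helper, not the class method above)."""
    predictions = np.argmax(outputs, axis=1)
    truth = np.argmax(onehot_labels, axis=1)
    return np.mean(predictions == truth)

# Toy check: 2 of the 3 "network outputs" pick the right unit.
outputs = np.array([[0.9, 0.1, 0.0],
                    [0.2, 0.7, 0.1],
                    [0.6, 0.1, 0.3]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
print(score(outputs, labels))  # 0.666...
```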
The problem is definitely in your network architecture, specifically the first hidden layer. You are feeding an 8x8 input into a hidden layer with 4 neurons. First, that isn't enough neurons: you are washing out the information contained in 64 pixels by squeezing it through just four neurons. The other problem (probably interacting with the lack of neurons) is that, because of your predict() function's use of dot products, every neuron is fully connected to the input.
The task of recognizing handwritten digits is inherently tied to the spatial configuration of the pixels, so your network should take advantage of that knowledge. You should feed different parts of the input image into different neurons of the first layer. This gives those neurons the opportunity to send stronger or weaker signals depending on how the pixels in their part of the image are arranged (pixels concentrated in the centre? then it's unlikely to be a 0, etc.).
Generalizing this idea is what convolutional neural networks are all about, and why they are so good at image recognition tasks. There is also a nice article from O'Reilly called "Not Another MNIST Tutorial", which really isn't just another tutorial: it shows some very useful visualizations for understanding what is going on.
The long and short of it is this: AND/OR is a very simple task, but you have jumped straight to a very complex one, and your neural network architecture needs a corresponding jump in complexity. Convolutional neural networks usually follow the architectural pattern:
- Split the image into pieces and assign different pieces to different neurons (convolutional layer)
- Recombine information from the different parts of the image (pooling layer)
- Filter out weak signals (dropout layer)
- Turn the spatial information into a vector signal (flatten layer)
- Create another layer of neurons fully connected to the neurons of the previous layer (dense layer)
Larger CNNs for more complex tasks combine these layers into bigger nested architectures and sub-networks. Knowing which combination of layers to use is an art, and takes a lot of experimentation (hence the popularity of GPUs, which make it faster to iterate and experiment). But for grayscale handwritten digits you should see a big improvement simply by exploiting what you already know about the task at hand, namely that it should make use of the spatial structure.
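The convolve / pool / flatten steps above can be sketched in plain numpy (toy helper names, a single channel, no learning; just to show the data flow):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: each output pixel only sees a
    kernel-sized patch of the input (the convolutional layer idea)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: summarize neighbouring responses
    (the pooling layer idea)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for an 8x8 digit
edge = np.array([[1.0, -1.0], [1.0, -1.0]])        # vertical-edge kernel
fmap = conv2d(image, edge)    # feature map, shape (7, 7)
pooled = max_pool(fmap)       # pooled map, shape (3, 3)
flat = pooled.ravel()         # the "flatten" step feeding a dense layer
```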