Could someone enlighten me as to why the loss function in this code does not behave correctly?

This code builds neural networks whose goal is to correctly classify the digits of the MNIST dataset.

The networks do not learn through backpropagation; instead they use (or at least try to use) a technique called neuroevolution, based on the Darwinian principles of evolution: build a population of neural networks, evaluate them, and use the best of them to produce a new generation of networks, and so on.

In this code I create a population of 10 neural networks, which are evaluated with a cross-entropy loss function. I keep the best 5 of them for the next generation and replace the other 5 with "child" networks created from the 5 that were kept.
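To make the behaviour I expect concrete, here is a tiny self-contained toy version of that loop (the names make_random_network, evaluate and breed are placeholders I made up for this sketch, not the functions from my modules below): with this kind of elitism, the loss of the best retained individual should never get worse from one generation to the next.

import random
import numpy as np

# Toy stand-in: a "network" is just a weight vector and its "loss" is its squared
# distance to a fixed target. Placeholder names only, not the code from my modules.
rng = np.random.default_rng(0)
target = rng.normal(size=20)

def make_random_network():
    return rng.normal(size=20)

def evaluate(net):
    return float(np.mean((net - target) ** 2))

def breed(mother, father):
    mask = rng.random(20) < 0.5           # uniform crossover
    return np.where(mask, mother, father)

population = [make_random_network() for _ in range(10)]
for generation in range(5):
    ranked = sorted(population, key=evaluate)        # lower loss is better
    parents, children = ranked[:5], []               # keep the 5 best
    while len(children) < 5:
        children.append(breed(random.choice(parents), random.choice(parents)))
    population = parents + children                  # 5 kept + 5 new children
    print(generation, evaluate(parents[0]))          # best retained loss: never increases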

My problem is that I do not see the same networks (the retained ones) keeping the same loss value from one generation to the next.

For example, the 5 best networks are evaluated and each shows a certain loss value; they are then kept for the next generation, and the whole population is re-evaluated. But at that point I cannot find the same loss values as before. The population is held in a list object and kept in the same order, so if the 5 newly created "child" networks were better they would take the places of the 5 previously retained networks; but it turns out that the "child" networks tend to have the worst values, and yet the loss values computed for the retained networks still clearly change from one generation to the next.

In short: for the same network objects, the loss values are not the same from one generation to the next, even though exactly the same data and exactly the same parameters are used.

If anyone has time to look at it and dig into where the problem is in the code, I would appreciate it. It is probably a code issue somewhere around the loss computation, but I cannot figure it out.

PS: I have also noticed (and have some code that shows it) that the computed loss value for a given network varies very slightly, in the decimals, from one computation to the next, which I already do not understand; but that still does not explain the huge differences in loss values from one generation to another.
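To give an idea of the scale of that "very small variation": it is about the size of the difference you get when the same numbers are simply summed in a different order, since floating-point addition is not associative. Here is a standalone snippet with made-up data (not my modules) that shows it; I am not sure this is what actually happens in my code.

import numpy as np

rng = np.random.default_rng(0)
terms = rng.random(60000)     # stand-in for 60000 per-example loss terms
perm = rng.permutation(60000)

# Same values, different summation order: the two results typically agree
# only up to the last few decimal places.
print(np.sum(terms))
print(np.sum(terms[perm]))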

So here are the three modules of my code:

First module:

"""
Utility used by the Network class to actually train.
Based on:
https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py
"""
#from keras.datasets import cifar10
from sklearn.datasets import fetch_mldata
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np


def get_mnist():
    """Get the MNIST dataset through scikit-learn and pre-process the data to make it usable by our classifiers."""
    mnist = fetch_mldata('MNIST original')
    # X as images - array (70000, 784): number of examples by number of pixels per image
    # y as labels - array (70000,): one entry per example, each value a digit from 0 to 9
    X, y = mnist["data"], mnist["target"]
    # Normalize the pixel values
    X = X / 255

    digits = 10
    examples = y.shape[0]
    # Reshape y as an array of shape (1, 70000)
    y = y.reshape(1, examples)
    # Build a one-hot label array of shape (10, 70000): a 1 in the row of the true digit, zeros elsewhere.
    # This matches the shape of the networks' output array, where the maximum value indicates the predicted digit.
    Y_new = np.eye(digits)[y.astype('int32')]
    Y_new = Y_new.T.reshape(digits, examples)
    m = 60000
    m_test = X.shape[0] - m
    # Images for the train and test sets, transposed to shapes (784, 60000) and (784, 10000)
    X_train, X_test = X[:m].T, X[m:].T
    # Labels for the train and test sets
    Y_train, Y_test = Y_new[:, :m], Y_new[:, m:]
    # Shuffle the train set, as it is ordered from digit 0 to 9
    shuffle_index = np.random.permutation(m)
    X_train, Y_train = X_train[:, shuffle_index], Y_train[:, shuffle_index]

    return (X_train, X_test, Y_train, Y_test)


def sigmoid(z):
    """Sigmoid activation function."""
    s = 1 / (1 + np.exp(-z))
    return s


def compute_multiclass_loss(Y, Y_hat):
    """Fitness function: categorical cross-entropy cost, used for multi-class outputs."""
    L_sum = np.sum(np.multiply(Y, np.log(10 ** (-15) + Y_hat)))
    m = Y.shape[1]
    L = -(1 / m) * L_sum
    return L


def compute_accuracy(Y, Y_hat):
    """Fitness function: a different way to score a network, using accuracy.
    Tested, seems to work, but the formula should be verified as it may contain approximations."""
    correct = 0
    uncorrect = 0
    argmax_Y = np.argmax(Y, axis=0)
    argmax_Y_hat = np.argmax(Y_hat, axis=0)
    for i in range(60000):
        if argmax_Y[i] == argmax_Y_hat[i]:
            correct += 1
        else:
            uncorrect += 1
    accuracy = correct / (correct + uncorrect)
    return accuracy


def neural_network_evaluator(input_layer_to_hidden_layer, hidden_layer_to_output_layer, b1, b2):
    """Used to: 1/ forward-propagate the input through a particular neural network
                2/ generate its outputs
                3/ compute the cost (fitness function) of this network"""
    X_train, X_test, Y_train, Y_test = get_mnist()
    # Feedforward of the training set through the network
    Z1 = np.matmul(input_layer_to_hidden_layer, X_train) + b1
    A1 = sigmoid(Z1)
    Z2 = np.matmul(hidden_layer_to_output_layer, A1) + b2
    A2 = np.exp(Z2) / np.sum(np.exp(Z2), axis=0)  # softmax output
    cost = compute_multiclass_loss(Y_train, A2)

    return cost

Second module:

"""
Class that holds a genetic algorithm for evolving a network.
Credit:
Much of this code was originally inspired by:
http://lethain.com/genetic-algorithms-cool-name-damn-simple/
"""
from functools import reduce
from operator import add
import numpy as np
import random
import logging
from train_Neuroevolution_ameliored import neural_network_evaluator
from train_Neuroevolution_ameliored import get_mnist
class Optimizer():
    """Class that implements a genetic algorithm for MLP optimization.
    Evolving process.
    Cross-over and mutation processes.
    Also used as the neural network creator class triggered AFTER the evolution process, and used for multiple purposes.
    Neural network object creator.
    Population creator for both pre and post evolution.
    Average fitness of populations.
    Cost value compiler.
    ..."""

    def __init__(self, retain=0.5, random_select=0.0, mutation_rate=0.5):
        """Create an optimizer.
        Args:
            retain (float): Percentage of population to retain after
                each generation
            random_select (float): Probability of a rejected network
                remaining in the population
            mutation_rate (float): Probability a network will be
                randomly mutated
            ...
        Initialize our network parameters, for the network population after the first evolution.
        """
        self.mutation_rate = mutation_rate
        self.random_select = random_select
        self.retain = retain
        self.accuracy = 0.
        self.network = []
        self.b1 = 0
        self.b1_lines = 64
        self.b2 = 0
        self.b2_lines = 10
        self.input_layer_to_hidden_layer = 0
        self.input_layer_to_hidden_layer_shape_lines = 64
        self.input_layer_to_hidden_layer_shape_columns = 784
        self.hidden_layer_to_output_layer = 0
        self.hidden_layer_to_output_layer_shape_lines = 10
        self.hidden_layer_to_output_layer_shape_columns = 64


    def create_neural_network(self, dataset):
        """Randomly set parameters for a neural network object with a fixed structure.
        Structure is: - one input layer with 784 inputs corresponding to each pixel of an MNIST image
                      - one hidden layer with 64 neurons (arbitrary value)
                      - one output layer with 10 neurons giving the probability of each of the 10 digits"""
        # Cifar10 is ignored for the moment, as the code is not built for it
        """if dataset == 'cifar10':
            nb_classes, batch_size, input_shape, x_train,
            x_test, y_train, y_test = get_cifar10()"""

        if dataset == 'mnist':
            X_train = get_mnist()[0]
            n_x = X_train.shape[0]
            n_h = 64
            self.input_layer_to_hidden_layer = np.random.randn(n_h, n_x)  # Weights from input to hidden layer
            self.input_layer_to_hidden_layer_shape_lines = self.input_layer_to_hidden_layer.shape[0]  # Used for weights mutation
            self.input_layer_to_hidden_layer_shape_columns = self.input_layer_to_hidden_layer.shape[1]  # Used for weights mutation
            self.b1 = np.zeros((n_h, 1))  # Biases for hidden layer
            self.b1_lines = self.b1.shape[0]  # Used for biases mutation
            self.hidden_layer_to_output_layer = np.random.randn(10, n_h)  # Weights from hidden to output layer
            self.hidden_layer_to_output_layer_shape_lines = self.hidden_layer_to_output_layer.shape[0]  # Used for weights mutation
            self.hidden_layer_to_output_layer_shape_columns = self.hidden_layer_to_output_layer.shape[1]  # Used for weights mutation
            self.b2 = np.zeros((10, 1))  # Biases for output layer
            self.b2_lines = self.b2.shape[0]  # Used for biases mutation
            self.network = [[self.input_layer_to_hidden_layer], [self.hidden_layer_to_output_layer], [self.b1], [self.b2]]  # Network structure (weights and biases)

    def fitness(self, network):
        """Return accuracy, which holds our fitness function value after the first evolution."""
        return network.accuracy

    def create_set(self, network):
        """Set network properties.
        Args:
            network (list): The network parameters
        Used in the mutation process after the first evolution
        """
        self.network = network

    def grade(self, pop):
        """Find the average fitness for a population.
        Args:
            pop (list): The population of networks
        Returns:
            (float): The average accuracy of the population
        """
        summed = reduce(add, (self.fitness(network) for network in pop))
        return summed / float(len(pop))

    def breed(self, mother, father):
        """Make one child from two parents.
        Args:
            mother (list): Optimizer() object parameters
            father (list): Optimizer() object parameters
        Returns:
            (list): One network object as an Optimizer() object
        """
        child = [0, 0, 0, 0]

        # Loop through the parameters and pick params for the kid.
        child[0] = random.choice([mother.input_layer_to_hidden_layer, father.input_layer_to_hidden_layer])
        child[1] = random.choice([mother.hidden_layer_to_output_layer, father.hidden_layer_to_output_layer])
        child[2] = random.choice([mother.b1, father.b1])
        child[3] = random.choice([mother.b2, father.b2])

        # Create a network object and assign the child[list] values to it
        network = Optimizer()
        network.create_set(child)
        network.input_layer_to_hidden_layer = child[0]
        network.hidden_layer_to_output_layer = child[1]
        network.b1 = child[2]
        network.b2 = child[3]

        # Mutate
        if self.mutation_rate > random.random():
            network.input_layer_to_hidden_layer = self.mutate(self.input_layer_to_hidden_layer_shape_lines, self.input_layer_to_hidden_layer_shape_columns, network.input_layer_to_hidden_layer)
            network.hidden_layer_to_output_layer = self.mutate(self.hidden_layer_to_output_layer_shape_lines, self.hidden_layer_to_output_layer_shape_columns, network.hidden_layer_to_output_layer)
            network.b1 = self.mutate_biases(self.b1_lines, network.b1)
            network.b2 = self.mutate_biases(self.b2_lines, network.b2)

        return network

    def mutate(self, shape_array_lines, shape_array_columns, weights_array):
        """Two ways of operating a mutation on weights.
        Mutate every single weight by multiplying each weight by a random number.
        Or mutate an arbitrary random number of weights (e.g. from 1 to 100) by multiplying each mutated weight by a random number.
        The second technique does not seem to work, for an undetermined reason."""

        # First technique
        mutation_weights = np.random.random((shape_array_lines, shape_array_columns))
        return mutation_weights * weights_array


    def mutate_biases(self, shape_array_columns, biases_array):
        """Mutate biases using the same approach as the second technique used for weights."""

        random_number_mutated_biases = np.random.randint(low=1, high=100)
        list_random_indices_lines = np.random.randint(low=0, high=shape_array_columns, size=random_number_mutated_biases)
        d = 0
        for _ in range(random_number_mutated_biases):
            # Random value (arbitrary range) added to the mutated bias
            i = np.random.uniform(low=-1, high=+1.1)
            # Select a particular bias by its index and modify it by adding i
            biases_array[list_random_indices_lines[d]][0] = biases_array[list_random_indices_lines[d]][0] + i
            d += 1
        return biases_array


    def evolve(self, pop):
        """Evolve a population of networks.
        Args:
            pop (list): A list of network parameters
        Returns:
            (list): The evolved population of networks
        """
        # Get scores for each network.
        graded = [(network.fitness(network), network) for network in pop]

        for network in pop:
            print("accuracy before =", network.fitness(network))

        # Sort on the scores (ascending: lower cost is better).
        graded = [x[1] for x in sorted(graded, key=lambda x: x[0], reverse=False)]

        # Get the number we want to keep for the next gen.
        retain_length = int(len(graded) * self.retain)
        # The parents are every network we want to keep.
        parents = graded[:retain_length]

        # For those we aren't keeping, randomly keep some anyway.
        for individual in graded[retain_length:]:
            if self.random_select > random.random():
                parents.append(individual)

        # Now find out how many spots we have left to fill.
        parents_length = len(parents)
        desired_length = len(pop) - parents_length
        children = []
        # Add children, which are bred from two remaining networks.
        while len(children) < desired_length:
            # Get a random mom and dad.
            male = random.randint(0, parents_length - 1)
            female = random.randint(0, parents_length - 1)
            # Assuming they aren't the same network...
            if male != female:
                male = parents[male]
                female = parents[female]
                # Breed them.
                baby = self.breed(male, female)
                # Add the children one at a time.
                if len(children) < desired_length:
                    children.append(baby)

        parents.extend(children)

        total_nbr_values = 0
        for i in pop:
            for j in parents:
                if i == j:
                    total_nbr_values += 1
                    print("same value")
        print("total =", total_nbr_values)

        for network in parents:
            print("accuracy after =", network.fitness(network))

        return parents


    def create_population(self, count, dataset):
        """Create a population of random networks.
        Args:
            count (int): Number of networks to generate, aka the
                size of the population
            dataset (string): dataset used for the experiment
        Returns:
            (list): Population of network objects
        """
        pop = []
        for _ in range(0, count):
            # Create a random network.
            network = Optimizer()
            network.create_neural_network(dataset)
            # Add the network to our population.
            pop.append(network)

        return pop

    def evaluate_neural_network(self):
        """Store the result of the chosen fitness function on this Optimizer() object.
        'accuracy' is just a name and does not necessarily hold an actual accuracy."""
        self.accuracy = neural_network_evaluator(self.input_layer_to_hidden_layer,
                                                 self.hidden_layer_to_output_layer, self.b1, self.b2)
        print(self.accuracy)  # Display the network's cost value.

    def print_network(self):
        """Print out a network and its cost value to the 'log.txt' file."""
        logging.info(self.network)
        logging.info("Network accuracy: %.2f%%" % (self.accuracy))

Third module:

"""Entry point to evolving the neural network. Start here."""
import logging
from optimizer_Neuroevolution_ameliored import Optimizer
from tqdm import tqdm
# Setup logging.
logging.basicConfig(
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%m/%d/%Y %I:%M:%S %p',
level=logging.DEBUG,
filename='log.txt'
)

def train_networks(networks):
    """Train each network.
    Args:
        networks (list): Current population of networks
    """
    pbar = tqdm(total=len(networks))
    for network in networks:
        network.evaluate_neural_network()
        pbar.update(1)
    pbar.close()


def get_average_accuracy(networks):
    """Get the average cost value for a group of networks.
    Args:
        networks (list): List of networks
    Returns:
        float: The average cost value of a population of networks.
    """
    total_accuracy = 0
    for network in networks:
        total_accuracy += network.accuracy
    return total_accuracy / len(networks)

def generate(generations, population, dataset):
    """Generate a network with the genetic algorithm.
    Args:
        generations (int): Number of times to evolve the population
        population (int): Number of networks in each generation
        dataset (str): Dataset to use for training/evaluating
    """
    # Create an initial population of random networks
    optimizer = Optimizer()
    networks = optimizer.create_population(population, dataset)
    # Train them
    train_networks(networks)
    # Print out the generation number
    logging.info("***Doing generation %d of %d***" %
                 (1, generations))
    print("generation", 1)
    # Get the average cost value for this generation.
    average_accuracy = get_average_accuracy(networks)
    # Print out the average cost value for this generation.
    logging.info("Generation average: %.2f%%" % (average_accuracy))
    logging.info('-' * 80)
    # Evolve the first generation.
    networks = optimizer.evolve(networks)
    train_networks(networks)
    # Print out the generation number
    logging.info("***Doing generation %d of %d***" %
                 (2, generations))

    # Get the average cost value for this generation.
    average_accuracy = get_average_accuracy(networks)
    # Print out the average cost value for this generation.
    logging.info("Generation average: %.2f%%" % (average_accuracy))
    logging.info('-' * 80)

    # Evolve, except on the last iteration.
    for i in range(generations - 2):
        print("generation", i + 2)
        print("Before evolving process")
        print("values of weights from input layer to hidden layer are:")
        print(networks[0].input_layer_to_hidden_layer[0][0])
        print(networks[0].input_layer_to_hidden_layer[0][2])
        print(networks[0].input_layer_to_hidden_layer[5][2])
        print(networks[0].input_layer_to_hidden_layer[5][8])
        print("values of weights from hidden layer to output layer are:")
        print(networks[0].hidden_layer_to_output_layer[0][0])
        print(networks[0].hidden_layer_to_output_layer[3][6])
        print(networks[0].hidden_layer_to_output_layer[1][56])
        print(networks[0].hidden_layer_to_output_layer[7][23])
        networks = optimizer.evolve(networks)
        #networks = optimizer.create_population(population, networks, x)
        print("After evolving process")
        print("values of weights from input layer to hidden layer are:")
        print(networks[0].input_layer_to_hidden_layer[0][0])
        print(networks[0].input_layer_to_hidden_layer[0][2])
        print(networks[0].input_layer_to_hidden_layer[5][2])
        print(networks[0].input_layer_to_hidden_layer[5][8])
        print("values of weights from hidden layer to output layer are:")
        print(networks[0].hidden_layer_to_output_layer[0][0])
        print(networks[0].hidden_layer_to_output_layer[3][6])
        print(networks[0].hidden_layer_to_output_layer[1][56])
        print(networks[0].hidden_layer_to_output_layer[7][23])

        train_networks(networks)
        print("Novelty accuracy is:")
        for network in networks:
            network.evaluate_neural_network()
        print("Novelty Novelty accuracy is:")
        for network in networks:
            network.evaluate_neural_network()
        # Print out the generation number
        logging.info("***Doing generation %d of %d***" %
                     (i + 3, generations))
        # Get the average cost value for generations starting from the third generation
        average_accuracy_pop = get_average_accuracy(networks)
        # Print out the average cost value for generations starting from the third generation
        logging.info("Generation average: %.2f%%" % (average_accuracy))
        logging.info('-' * 80)

    # Sort our final population of networks, aka the last generation.
    networks = sorted(networks, key=lambda x: x.accuracy, reverse=False)
    # Print out the top 5 networks of the last generation.
    print_networks(networks[:5])

def print_networks(networks):
    """Print a list of networks.
    Args:
        networks (list): The population of networks
    """
    logging.info('-' * 80)
    for network in networks:
        network.print_network()


def main():
    """Evolve a network."""
    generations = 30  # Number of times to evolve the population.
    population = 10  # Number of networks in each generation.
    dataset = 'mnist'  # Dataset
    # Print out the number of generations and the population size chosen
    logging.info("***Evolving %d generations with population %d***" %
                 (generations, population))
    generate(generations, population, dataset)


if __name__ == '__main__':
    main()

So I expected the loss values of the same network objects to be identical from one generation to the next, but when the loss is recomputed through the neural_network_evaluator() function it shows completely different values.

In fact, from one generation to the next the loss values of the ten networks of the population should only decrease or stay the same, yet they increase, which is what I do not understand.

Thanks for your help.

Each time neural_network_evaluator() is called from the first module, it uses a new training set generated by get_mnist(), which is drawn at random:

# Shuffle the train set, as it is ordered from digit 0 to 9
shuffle_index = np.random.permutation(m)
X_train, Y_train = X_train[:, shuffle_index], Y_train[:, shuffle_index]

If you want each call to the evaluation to return the same result, you need to evaluate against the same training set, rather than drawing a new one each time.
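A minimal way to do that, assuming your first module is importable as train_Neuroevolution_ameliored (as your second module's imports suggest), is to load the data once and pass the same arrays into the evaluation every time. This is only a sketch: the names DATA and evaluate_on below are mine, not from your code.

import numpy as np
from train_Neuroevolution_ameliored import get_mnist, sigmoid, compute_multiclass_loss

DATA = get_mnist()   # drawn (and shuffled) exactly once

def evaluate_on(data, input_layer_to_hidden_layer, hidden_layer_to_output_layer, b1, b2):
    """Same forward pass as neural_network_evaluator, but on a fixed, pre-loaded dataset."""
    X_train, X_test, Y_train, Y_test = data
    Z1 = np.matmul(input_layer_to_hidden_layer, X_train) + b1
    A1 = sigmoid(Z1)
    Z2 = np.matmul(hidden_layer_to_output_layer, A1) + b2
    A2 = np.exp(Z2) / np.sum(np.exp(Z2), axis=0)   # softmax
    return compute_multiclass_loss(Y_train, A2)

# Calling evaluate_on(DATA, ...) twice with the same weights and biases now
# returns exactly the same loss value.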

(Disclaimer: I had not come across evolutionary algorithms being applied to neural networks before, so I cannot comment on how well the approach works, or on how the training set should be chosen for it.)
