Keras: reset the weights to those of the last epoch if the loss value increased



I'm working on an ANN in Keras for an imbalanced binary classification dataset, and I've just set up a custom learning-rate schedule that, at the start of each epoch, compares the loss value to that of the previous epoch. If it is smaller I increase the learning rate; if not, I decrease the learning rate and I also want to reset the weights to those of the previous epoch. How do I do that?

I found something like
model.layers[0].get_weights() 

Does this give me the weights? How can I save them in my callback and set them again if this condition is met?
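
For context, a minimal sketch of what I understand the get_weights()/set_weights() round trip to look like on a toy model (the model, layer sizes, and variable names are only illustrative):

import numpy as np
from tensorflow import keras

# toy model, only to illustrate saving and restoring weight values
model = keras.Sequential([keras.layers.Dense(4, input_shape=(8,))])

snapshot = model.get_weights()            # list of numpy arrays, one per weight tensor
# ... training would happen here and change the weights ...
model.layers[0].set_weights(              # simulate an update by overwriting layer 0
    [np.zeros_like(w) for w in model.layers[0].get_weights()])
model.set_weights(snapshot)               # restore the saved values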

import tensorflow as tf
from tensorflow import keras

class CustomLearningRateScheduler(keras.callbacks.Callback):
    def __init__(self):
        super(CustomLearningRateScheduler, self).__init__()
        self.lastVal = 0
        self.learning_rate = 10
        self.last_iteration_weights = []

    def on_train_begin(self, logs={}):
        self.errors = []

    def on_epoch_begin(self, epoch, logs={}):
        # save the weights of the first layer at the start of the epoch
        self.weights = self.model.layers[0].get_weights()

    def on_epoch_end(self, epoch, logs={}):
        if not hasattr(self.model.optimizer, "lr"):
            raise ValueError('Optimizer must have a "lr" attribute.')
        # Get the current learning rate from the model's optimizer.
        lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))

        val = logs.get('loss')
        if float(val) > float(self.lastVal):
            self.learning_rate = lr * 0.95
            tf.keras.backend.set_value(self.model.optimizer.lr, self.learning_rate)
        else:
            self.learning_rate = lr * 1.01
            tf.keras.backend.set_value(self.model.optimizer.lr, self.learning_rate)
        self.lastVal = val
        self.errors.append(self.lastVal)
        print("\nEpoch %05d: Learning rate is %f ." % (epoch, self.learning_rate))

The class is invoked with:

model_p.fit(X, y, epochs=EPOCH_SIZE, batch_size=BATCH_SIZE, verbose=1, shuffle=True, callbacks=[CustomLearningRateScheduler()])

I have written a custom callback, DWELL, that does what you want to accomplish, and I have used it on a large number of image classification tasks.

This callback gives you the option to continue or halt training. After training for ask_epoch epochs, the callback asks the user to enter H to stop training or to enter an integer N. If an integer is entered, training continues for N more epochs, then the user is queried again. It also lets you set a parameter called dwell. If dwell is set to True, the callback monitors the validation loss. If, at the end of an epoch, the validation loss is greater than the validation loss of the previous epoch, the model's weights are reset to the weights of the previous epoch and the learning rate is reduced via next_lr = current_lr * factor, where factor is a user-specified float less than 1.0. The idea is that if the validation loss increased, the model has moved to a location in N-space (N being the number of trainable weights) that is less favorable than its location at the end of the previous epoch. So why go there? Instead, restore the weights of the previous epoch and then reduce the learning rate. The callback has the form DWELL(model, factor, dwell, verbose, ask_epoch), where:

model is the name of your compiled model.
factor is a float between 0.0 and 1.0. If the validation loss increased, the learning rate for the next epoch is set to next_lr = current_lr * factor.
dwell is a boolean. If set to True, the validation loss is monitored. If it increases, the model weights are set to the weights of the previous epoch and the learning rate is reduced.
verbose is a boolean. If True, the callback prints the new lr at the end of any epoch in which the validation loss increased.
ask_epoch is an integer. At the start of training, the model trains for ask_epoch epochs. At that point the user is asked to enter H to halt training, or an integer N, where N specifies how many more epochs to run before being asked again.

import time
import numpy as np
import tensorflow as tf
from tensorflow import keras

class DWELL(keras.callbacks.Callback):
    def __init__(self, model, factor, dwell, verbose, ask_epoch):
        super(DWELL, self).__init__()
        self.model = model
        self.factor = factor
        self.initial_lr = float(tf.keras.backend.get_value(model.optimizer.lr))  # get the initial learning rate and save it
        self.lowest_vloss = np.inf  # set lowest validation loss to infinity initially
        self.best_weights = self.model.get_weights()  # set best weights to model's initial weights
        self.verbose = verbose
        self.best_epoch = 0
        self.ask_epoch = ask_epoch
        self.ask = True
        self.dwell = dwell

    def on_train_begin(self, logs=None):  # this runs at the beginning of training
        print('Training will proceed until epoch', self.ask_epoch, ' then you will be asked to')
        print('enter H to halt training or enter an integer for how many more epochs to run then be asked again')
        self.start_time = time.time()  # set the time at which training started

    def on_epoch_end(self, epoch, logs=None):  # method runs at the end of each epoch
        if self.ask:  # are the conditions right to query the user?
            if epoch + 1 == self.ask_epoch:  # is this epoch the one for querying the user?
                print('\n Enter H to end training or an integer for the number of additional epochs to run then ask again')
                ans = input()

                if ans == 'H' or ans == 'h' or ans == '0':  # quit training for these conditions
                    print('you entered ', ans, ' Training halted on epoch ', epoch + 1, ' due to user input\n', flush=True)
                    self.model.stop_training = True  # halt training
                else:  # user wants to continue training
                    self.ask_epoch += int(ans)
                    print('you entered ', ans, ' Training will continue to epoch ', self.ask_epoch, flush=True)
        if self.dwell:
            lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))  # get the current learning rate
            vloss = logs.get('val_loss')  # get the validation loss for this epoch
            if vloss > self.lowest_vloss:
                self.model.set_weights(self.best_weights)  # reset weights to those of the best epoch so far
                new_lr = lr * self.factor
                tf.keras.backend.set_value(self.model.optimizer.lr, new_lr)
                if self.verbose:
                    print('\n model weights reset to best weights from epoch ', self.best_epoch + 1, ' and reduced lr to ', new_lr, flush=True)
            else:
                self.lowest_vloss = vloss
                self.best_weights = self.model.get_weights()
                self.best_epoch = epoch

Here is an example of its use:
# model is the variable name of your compiled model
ask_epoch = 5   # query user at end of epoch 5 to halt or continue training
factor = .5     # if validation loss increased, next_lr = current_lr * factor
dwell = True
verbose = True  # print out new lr if validation loss increased
callbacks = [DWELL(model, factor, dwell, verbose, ask_epoch)]
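
Then pass the callback list to model.fit. A minimal sketch of such a call, where X_train, y_train, X_val, and y_val are placeholder names for your data (validation data is needed because the callback reads val_loss):

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=40, batch_size=32,
                    callbacks=callbacks, verbose=1)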

In model.fit, set callbacks=callbacks. Below is an example of training output where I deliberately set a large initial learning rate (.02) to make the DWELL callback reduce the learning rate early in training.

Training will proceed until epoch 5  then you will be asked to
enter H to halt training or enter an integer for how many more epochs to run then be asked again
Epoch 1/40
313/313 [==============================] - 62s 153ms/step - loss: 6.2284 - accuracy: 0.1456 - val_loss: 2.9476 - val_accuracy: 0.2458
Epoch 2/40
313/313 [==============================] - 44s 141ms/step - loss: 2.1466 - accuracy: 0.2686 - val_loss: 8.4516 - val_accuracy: 0.3502
model weights reset to best weights from epoch  1  and reduced lr to  0.009999999776482582
Epoch 3/40
313/313 [==============================] - 46s 146ms/step - loss: 2.0746 - accuracy: 0.2628 - val_loss: 1.7664 - val_accuracy: 0.4072
Epoch 4/40
313/313 [==============================] - 45s 144ms/step - loss: 1.8257 - accuracy: 0.3944 - val_loss: 1.3599 - val_accuracy: 0.6120
Epoch 5/40
313/313 [==============================] - 45s 144ms/step - loss: 1.5230 - accuracy: 0.5530 - val_loss: 1.0913 - val_accuracy: 0.6901
Enter H to end training or  an integer for the number of additional epochs to run then ask again
2
you entered  2  Training will continue to epoch  7
Epoch 6/40
313/313 [==============================] - 44s 141ms/step - loss: 1.2793 - accuracy: 0.6745 - val_loss: 0.8224 - val_accuracy: 0.8284
Epoch 7/40
313/313 [==============================] - 45s 142ms/step - loss: 1.0747 - accuracy: 0.7442 - val_loss: 0.7990 - val_accuracy: 0.8271
Enter H to end training or  an integer for the number of additional epochs to run then ask again
4
you entered  4  Training will continue to epoch  11
Epoch 8/40
313/313 [==============================] - 45s 144ms/step - loss: 0.9850 - accuracy: 0.7770 - val_loss: 1.5557 - val_accuracy: 0.8688
model weights reset to best weights from epoch  7  and reduced lr to  0.004999999888241291
Epoch 9/40
313/313 [==============================] - 45s 143ms/step - loss: 0.8708 - accuracy: 0.7911 - val_loss: 0.5515 - val_accuracy: 0.8643
Epoch 10/40
313/313 [==============================] - 45s 144ms/step - loss: 0.8346 - accuracy: 0.8047 - val_loss: 0.4961 - val_accuracy: 0.9129
Epoch 11/40
313/313 [==============================] - 45s 144ms/step - loss: 0.7811 - accuracy: 0.8364 - val_loss: 0.5186 - val_accuracy: 0.9526
Enter H to end training or  an integer for the number of additional epochs to run then ask again
h
you entered  h  Training halted on epoch  11  due to user input

I have run many tests on the same dataset with dwell set to True and with dwell set to False. Because of the inherent randomness of TensorFlow it is hard to tell, but the model does appear to converge a bit faster when dwell=True. So far I have not run into problems of converging to a local minimum with dwell=True; I achieved validation losses the same as or better than with dwell=False.
