Forward pass computation on the current batch in the get_updates method of the Keras SGD optimizer



I am trying to implement a stochastic Armijo rule in the get_updates method of the Keras SGD optimizer. To do that I need to compute an additional forward pass to check whether the chosen learning_rate is good. I don't want another computation of the gradients, but I do want to use the updated weights.

Using Keras version 2.3.1 and Tensorflow version 1.14.0.

def get_updates(self, loss, params):
    grads = self.get_gradients(loss, params)
    self.updates = [K.update_add(self.iterations, 1)]

    lr = self.learning_rate
    if self.initial_decay > 0:
        lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                                  K.dtype(self.decay))))
    # momentum
    shapes = [K.int_shape(p) for p in params]
    moments = [K.zeros(shape, name='moment_' + str(i))
               for (i, shape) in enumerate(shapes)]
    self.weights = [self.iterations] + moments
    for p, g, m in zip(params, grads, moments):
        v = self.momentum * m - lr * g  # velocity
        self.updates.append(K.update(m, v))

        if self.nesterov:
            new_p = p + self.momentum * v - lr * g
        else:
            new_p = p + v

        # Apply constraints.
        if getattr(p, 'constraint', None) is not None:
            new_p = p.constraint(new_p)

        self.updates.append(K.update(p, new_p))

    ### own changes ###
    if self.armijo:
        inputs = (model._feed_inputs +
                  model._feed_targets +
                  model._feed_sample_weights)
        input_layer = model.layers[0].input
        armijo_function = K.function(inputs=input_layer, outputs=[loss],
                                     updates=self.updates, name='armijo')
        loss_next = armijo_function(inputs)
        [....change updates if learning rate was not good enough...]
    return self.updates

Unfortunately, I don't understand the error message I get when trying to compute "loss_next":

tensorflow.python.framework.errors_impl.InvalidArgumentError: Requested Tensor connection between nodes "conv2d_1_input" and "conv2d_1_input" would create a cycle.

There are two questions here:

  • How can I access the current batch I am working on? The forward computation should only consider the actual batch, and the gradients also belong only to that batch.

  • Any better idea how to update and evaluate a forward pass for computing the loss function on that batch, without using K.function?

Can anybody help? Thanks in advance.

How can I access the current batch I am working on? The forward computation should only consider the actual batch, and the gradients also belong only to that batch.

For that you can use batch_size = total number of training records in model.fit(), so that there is only one forward pass and one backpropagation per epoch. That way you can analyse the gradients of epoch 1 and modify the learning rate for epoch 2, or, if you are using a custom training loop, modify the code accordingly (see the sketch below).
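If you go down the custom-training-loop route, the Armijo check becomes straightforward: you can re-evaluate the loss on the same batch with the tentatively updated weights before accepting the step. Below is a minimal eager-mode sketch of that pattern (not the exact rule from the question), assuming an already-built model and loss_fn, and hypothetical backtracking parameters sigma and beta.

import tensorflow as tf

sigma, beta = 0.1, 0.5      # hypothetical Armijo / backtracking parameters
lr = tf.Variable(0.1)       # current step size

def train_step(x_batch, y_batch):
    # One gradient computation on the current batch.
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    grad_norm_sq = tf.add_n([tf.reduce_sum(tf.square(g)) for g in grads])

    # Backtracking: shrink lr until the Armijo condition holds on this batch.
    while True:
        # Tentatively apply the step p <- p - lr * g.
        for p, g in zip(model.trainable_variables, grads):
            p.assign_sub(lr * g)
        # Forward pass on the *same* batch with the updated weights,
        # without recomputing gradients.
        loss_next = loss_fn(y_batch, model(x_batch, training=True))
        if loss_next <= loss - sigma * lr * grad_norm_sq:
            break               # step accepted
        # Undo the step and retry with a smaller learning rate.
        for p, g in zip(model.trainable_variables, grads):
            p.assign_add(lr * g)
        lr.assign(lr * beta)
    return loss_next

Here x_batch and y_batch are whatever batch your loop is currently iterating over, so both the gradients and the extra forward pass belong to exactly that batch.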

Any better idea how to update and evaluate a forward pass for computing the loss function on that batch, without using K.function?

I don't remember any option for evaluating gradients other than using from tensorflow.keras import backend as K in tensorflow version 1.x. The best option is to update tensorflow to the latest version 2.2.0 and use tf.GradientTape.

Would recommend going through this answer for capturing gradients using from tensorflow.keras import backend as K in tensorflow 1.x; the usual pattern is sketched below.
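For reference, that pattern builds a K.function over the model's feed tensors (inputs, targets and sample weights) rather than over an input layer tensor. A minimal sketch for Keras 2.3.1 / tensorflow 1.x, assuming a compiled model and a single numpy batch x_batch, y_batch; note that _feed_inputs, _feed_targets and _feed_sample_weights are private Keras attributes (the same ones the question already relies on), so this is only illustrative.

import numpy as np
from tensorflow.keras import backend as K

# Symbolic gradients of the model's total loss w.r.t. its trainable weights.
weights = model.trainable_weights
grads = model.optimizer.get_gradients(model.total_loss, weights)

# Feed the same placeholders Keras itself feeds during training.
input_tensors = (model._feed_inputs +
                 model._feed_targets +
                 model._feed_sample_weights)

get_loss_and_grads = K.function(inputs=input_tensors,
                                outputs=[model.total_loss] + grads)

# Evaluate loss and gradients on one specific batch (sample weights of ones).
sample_weights = np.ones(len(x_batch))
loss_value, *grad_values = get_loss_and_grads([x_batch, y_batch, sample_weights])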

Below is a sample program that comes close to your requirement. I am using tensorflow version 2.2.0. You can build your requirement on top of this program.

We are doing the following in the program -

  1. We change the learning rate after every epoch. You can do this with the callbacks argument of model.fit. Here I use tf.keras.callbacks.LearningRateScheduler to increase the learning rate by 0.01 every epoch, and a custom tf.keras.callbacks.Callback to display it every epoch.
  2. We compute the gradients after every epoch using tf.GradientTape(), collecting each epoch's grads into a list with append.
  3. We also set batch_size=len(train_images) as per your requirement.

Note: I am training on only 500 records of the Cifar dataset due to memory constraints.

Code -

%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
import os
import numpy as np
import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

train_images = train_images[:500]
train_labels = train_labels[:500]
test_images = test_images[:50]
test_labels = test_labels[:50]

model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10)
])

lr = 0.01
adam = Adam(lr)

# Define the Gradient Function
epoch_gradient = []
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        with tf.GradientTape() as tape:
            logits = model(train_images, training=True)
            loss = loss_fn(train_labels, logits)
        grad = tape.gradient(loss, model.trainable_weights)
        model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
        epoch_gradient.append(grad)

gradcalc = GradientCalcCallback()

# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs={}):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr)
        Epoch_count = epoch + 1
        print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))

printlr = printlearningrate()

def scheduler(epoch):
    optimizer = model.optimizer
    return K.eval(optimizer.lr + 0.01)

updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)

model.compile(optimizer=adam,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

epochs = 10

history = model.fit(train_images, train_labels, epochs=epochs, batch_size=len(train_images),
                    validation_data=(test_images, test_labels),
                    callbacks=[printlr, updatelr, gradcalc])

# Convert to a 2 dimensional array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epochs)
print("Gradient Array has the shape:", gradient.shape)

Output -

Epoch: 1 , LR: 0.01
Epoch 1/10
1/1 [==============================] - 0s 427ms/step - loss: 30.1399 - accuracy: 0.0820 - val_loss: 2114.8201 - val_accuracy: 0.1800 - lr: 0.0200
Epoch: 2 , LR: 0.02
Epoch 2/10
1/1 [==============================] - 0s 329ms/step - loss: 141.6176 - accuracy: 0.0920 - val_loss: 41.7008 - val_accuracy: 0.0400 - lr: 0.0300
Epoch: 3 , LR: 0.03
Epoch 3/10
1/1 [==============================] - 0s 328ms/step - loss: 4.1428 - accuracy: 0.1160 - val_loss: 2.3883 - val_accuracy: 0.1800 - lr: 0.0400
Epoch: 4 , LR: 0.04
Epoch 4/10
1/1 [==============================] - 0s 329ms/step - loss: 2.3545 - accuracy: 0.1060 - val_loss: 2.3471 - val_accuracy: 0.1800 - lr: 0.0500
Epoch: 5 , LR: 0.05
Epoch 5/10
1/1 [==============================] - 0s 340ms/step - loss: 2.3208 - accuracy: 0.1060 - val_loss: 2.3047 - val_accuracy: 0.1800 - lr: 0.0600
Epoch: 6 , LR: 0.06
Epoch 6/10
1/1 [==============================] - 0s 331ms/step - loss: 2.3048 - accuracy: 0.1300 - val_loss: 2.3069 - val_accuracy: 0.0600 - lr: 0.0700
Epoch: 7 , LR: 0.07
Epoch 7/10
1/1 [==============================] - 0s 337ms/step - loss: 2.3041 - accuracy: 0.1340 - val_loss: 2.3432 - val_accuracy: 0.0600 - lr: 0.0800
Epoch: 8 , LR: 0.08
Epoch 8/10
1/1 [==============================] - 0s 341ms/step - loss: 2.2871 - accuracy: 0.1400 - val_loss: 2.6009 - val_accuracy: 0.0800 - lr: 0.0900
Epoch: 9 , LR: 0.09
Epoch 9/10
1/1 [==============================] - 1s 515ms/step - loss: 2.2810 - accuracy: 0.1440 - val_loss: 2.8530 - val_accuracy: 0.0600 - lr: 0.1000
Epoch: 10 , LR: 0.10
Epoch 10/10
1/1 [==============================] - 0s 343ms/step - loss: 2.2954 - accuracy: 0.1300 - val_loss: 2.3049 - val_accuracy: 0.0600 - lr: 0.1100
Total number of epochs run: 10
Gradient Array has the shape: (10, 10)

Hope this answers your question. Happy Learning.
