I am trying to train, save, and load a TensorFlow model. An outline of my code is below:
```python
devices = ["/device:GPU:{}".format(i) for i in range(num_gpus)]
strategy = tf.distribute.MirroredStrategy(devices)
with strategy.scope():
    # Create model
    model = my_model(some parameters)
    model.build((parameters))
    model.summary()

    # Adam optimizer with EMA smoothing
    opt = tf.keras.optimizers.Adam(learning_rate)
    opt = tfa.optimizers.MovingAverage(opt, average_decay=ema_decay)
    model.compile(
        optimizer=opt,
        loss=loss_dict,
        loss_weights=loss_weights,
        metrics=metrics_dict)

    # Adding softmax layers on top of the class outputs
    output_list = list()
    for i in range(num_class):
        output_list.append(
            tf.keras.layers.Softmax(name=f"name_{str(i)}")(model.output[i]))
    output_list.append(model.output[num_class])
    model_b = tf.keras.Model(inputs=model.input, outputs=output_list)
    model_b.build((None, None, feats_dim, 1))
    model_b.compile(optimizer=opt)

model.fit(parameters... callbacks=[cp_callback, logger_callback, tb_callback])
model_b.load_weights(checkpoint_path)
model_b.save(os.path.join(model_path, "model.h5"))
```
The checkpoint is saved with:
```python
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, verbose=1, save_weights_only=True, period=1
)
```
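For reference, recent tf.keras releases deprecate the `period` argument of `ModelCheckpoint` in favour of `save_freq`; the per-epoch equivalent would be:

```python
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, verbose=1, save_weights_only=True,
    save_freq="epoch",  # replaces the deprecated period=1
)
```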
When execution reaches the line that loads the weights, I get the following error:
```
ValueError: Trying to create optimizer slot variable under the scope for
tf.distribute.Strategy
(<tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy
object at 0x7fb6047c3c50>), which is different from the scope used for the
original variable (MirroredVariable:{
```
Any help would be appreciated.
Per this, the model-saving API creates variables (which should be distributed variables here). Try creating/calling the callbacks within the strategy scope.
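Below is a minimal, self-contained sketch of that suggestion. The toy two-layer model and the `/tmp` paths are placeholders standing in for the question's `my_model`, `checkpoint_path`, and `model_path`; the essential change is that the callback creation and the `load_weights` call both happen under the same strategy scope that built the model:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Toy stand-in for the question's model; the real architecture is elided.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

    # Create the checkpoint callback inside the scope, as suggested above.
    checkpoint_path = "/tmp/ckpt/weights"  # placeholder path
    cp_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path, verbose=1, save_weights_only=True)

    x = tf.random.normal((32, 4))
    y = tf.random.normal((32, 1))
    model.fit(x, y, epochs=1, callbacks=[cp_callback])

    # load_weights is what creates the optimizer's slot variables here;
    # inside the scope they are created as MirroredVariables, matching
    # the original variables instead of the default strategy.
    model.load_weights(checkpoint_path)
    model.save("/tmp/model.h5")  # placeholder output path
```

The save itself is not the problem; what matters is that any call that may create variables (building, compiling, restoring weights) runs under the same strategy scope that created the model's variables, rather than under the default strategy shown in the traceback.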