Keras stops training after saving a checkpoint with ModelCheckpoint



I am training a CNN with tf.keras. After saving a checkpoint, Keras does not start the next epoch.

Note:
1) tf.keras.callbacks.ModelCheckpoint is used as the saver
2) fit_generator() is used for training

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    for start_idx in np.arange(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = indices[start_idx:start_idx + batchsize]
        yield load_images(inputs[excerpt], targets[excerpt])

# Model path
model_path = "C:/Users/Paperspace/Desktop/checkpoints/cp.ckpt"
# saver = tf.train.Saver(max_to_keep=3)
cp_callback = tf.keras.callbacks.ModelCheckpoint(model_path,
                                                 verbose=1,
                                                 save_weights_only=True,
                                                 period=2)
tb_callback = TensorBoard(log_dir="./Graph/{}".format(time()))
batch_size = 750
history = model.fit_generator(generator=iterate_minibatches(X_train, Y_train, batch_size),
                              validation_data=iterate_minibatches(X_test, Y_test, batch_size),
                              # validation_data=None,
                              steps_per_epoch=len(X_train) // batch_size,
                              validation_steps=len(X_test) // batch_size,
                              verbose=1,
                              epochs=30,
                              callbacks=[cp_callback, tb_callback])

Actual result: training stops without raising any error. Expected result: training should continue into the next epoch.

**Log**
Epoch 1/30
53/53 [==============================] - 919s 17s/step - loss: 1.2445 - acc: 0.0718
426/426 [==============================] - 7058s 17s/step - loss: 1.7877 - acc: 0.0687 - val_loss: 1.2445 - val_acc: 0.0718
Epoch 2/30
WARNING:tensorflow:Your dataset iterator ran out of data.
Epoch 00002: saving model to C:/Users/Paperspace/Desktop/checkpoints/cp.ckpt
WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x0000023A913DE470>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.
Consider using a TensorFlow optimizer from `tf.train`.
WARNING:tensorflow:From C:\Users\Paperspace\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\network.py:1436: update_checkpoint_state (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.train.CheckpointManager to manage checkpoints rather than manually editing the Checkpoint proto.
  0/426 [..............................] - ETA: 0s - loss: 0.0000e+00 - acc: 0.0687 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00

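At first glance, your generator looks incorrect. A generator passed to fit_generator needs a while True: loop inside it, because Keras keeps pulling batches from the same generator across all epochs. Yours yields one pass over the data and then ends, which matches the "Your dataset iterator ran out of data" warning in your log. Maybe this will work for you: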

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    while True:
        start = 0
        end = batchsize
        while start < len(inputs):
            excerpt = indices[start:end]
            yield load_images(inputs[excerpt], targets[excerpt])
            start += batchsize
            end += batchsize
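
To see why the generator from the question stalls after the first epoch, here is a minimal sketch that uses plain index lists in place of the image data. The names finite_batches, infinite_batches and count_available are hypothetical and exist only for this illustration, and the sizes are toy numbers rather than the ones from the question. count_available stands in for a training loop that asks for steps_per_epoch batches in every epoch.

```python
import itertools


def finite_batches(n_samples, batchsize):
    """One pass over the data, like the generator in the question."""
    for start in range(0, n_samples - batchsize + 1, batchsize):
        yield list(range(start, start + batchsize))


def infinite_batches(n_samples, batchsize):
    """Restart the pass forever, which is what fit_generator expects."""
    while True:
        yield from finite_batches(n_samples, batchsize)


def count_available(gen, wanted):
    """Try to pull `wanted` batches and report how many were delivered."""
    return sum(1 for _ in itertools.islice(gen, wanted))


n_samples, batchsize, epochs = 10, 3, 4   # toy numbers, not from the question
steps_per_epoch = n_samples // batchsize  # 3 batches per epoch
wanted = steps_per_epoch * epochs         # 12 batches over the whole run

finite_count = count_available(finite_batches(n_samples, batchsize), wanted)
infinite_count = count_available(infinite_batches(n_samples, batchsize), wanted)
print(finite_count, infinite_count)
```

The finite generator delivers only the first epoch's worth of batches and is then exhausted, while the looping version can serve every epoch.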

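Keras generators must yield batches in an infinite loop. The change below should work: it keeps your original batching logic, wraps it in while True:, and reshuffles the indices at the start of every pass. Otherwise, you can follow a tutorial on writing Keras data generators.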

def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    while True:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
        for start_idx in np.arange(0, len(inputs) - batchsize + 1, batchsize):
            excerpt = indices[start_idx:start_idx + batchsize]
            yield load_images(inputs[excerpt], targets[excerpt])
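
As a quick check of the reshuffling behaviour, the following sketch runs the same loop structure on plain indices instead of images. shuffled_passes and its seed parameter are hypothetical names used only here, and the sizes are toy values. It collects two consecutive passes and confirms that each one covers every sample exactly once.

```python
import random


def shuffled_passes(n_samples, batchsize, seed=0):
    """Yield index batches forever, reshuffling at the start of each pass."""
    rng = random.Random(seed)  # fixed seed only to make the demo repeatable
    while True:
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for start in range(0, n_samples - batchsize + 1, batchsize):
            yield indices[start:start + batchsize]


n_samples, batchsize = 9, 3               # toy numbers, not from the question
steps_per_epoch = n_samples // batchsize  # 3 batches make up one pass

gen = shuffled_passes(n_samples, batchsize)
first_pass = [next(gen) for _ in range(steps_per_epoch)]
second_pass = [next(gen) for _ in range(steps_per_epoch)]

# Flatten each pass to check which samples it touched.
first_seen = sorted(i for batch in first_pass for i in batch)
second_seen = sorted(i for batch in second_pass for i in batch)
print(first_seen == list(range(n_samples)), second_seen == list(range(n_samples)))
```

When len(inputs) is not a multiple of batchsize, the trailing samples of each shuffled order are skipped. Because the shuffle is redone on every pass, the skipped samples change from pass to pass.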
