Is my CNN model still overfitting? If so, how can I deal with it? Is there a problem with my architecture?



My CNN model keeps showing high accuracy/low loss during training and low accuracy/high loss during validation, so I started suspecting that it is overfitting.

So I introduced some dropout layers as well as some image augmentation. I also tried using ReduceLROnPlateau and EarlyStopping to monitor val_loss after each epoch.

Although these measures helped improve validation accuracy, I am still far from the results I expected, and I am really running out of ideas. This is what I am getting now:

Epoch 9/30
999/1000 [============================>.] - ETA: 0s - loss: 0.0072 - accuracy: 0.9980
Epoch 9: ReduceLROnPlateau reducing learning rate to 1.500000071246177e-05.
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0072 - accuracy: 0.9980 - val_loss: 2.2994 - val_accuracy: 0.6570 - lr: 1.5000e-04
Epoch 10/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0045 - accuracy: 0.9985 - val_loss: 2.2451 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 11/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0026 - accuracy: 0.9995 - val_loss: 2.6080 - val_accuracy: 0.6540 - lr: 1.5000e-05
Epoch 12/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0018 - accuracy: 1.0000 - val_loss: 2.8192 - val_accuracy: 0.6560 - lr: 1.5000e-05
Epoch 13/30
1000/1000 [==============================] - 19s 19ms/step - loss: 0.0013 - accuracy: 1.0000 - val_loss: 2.8216 - val_accuracy: 0.6570 - lr: 1.5000e-05
32/32 [==============================] - 1s 23ms/step - loss: 2.8216 - accuracy: 0.6570

Am I wrong in assuming that overfitting is still the problem that prevents my model from scoring high on the validation and test data?

Or is there something fundamentally wrong with my architecture?

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

#prevent overfitting, generalize better
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2)
])
model = tf.keras.models.Sequential()
model.add(data_augmentation)
#same padding, since edges of the pictures often contain valuable information
model.add(layers.Conv2D(64, (3,3), strides=(1,1), padding='same', activation='relu', input_shape=(64,64,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dropout(0.25))
model.add(layers.Conv2D(32, (3,3), strides=(1,1), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dropout(0.25))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
#prevent overfitting
model.add(layers.Dropout(0.25))
#4 output classes, softmax since we want to end up with probabilities for each class (they have to sum to 1)
model.add(layers.Dense(4, activation='softmax'))
#labels are not one-hot encoded, therefore sparse categorical crossentropy
model.compile(loss='sparse_categorical_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=0.00015), metrics=['accuracy'])

Try the code below. I would add a BatchNormalization layer after the Flatten layer:

model.add(layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001))
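For context, a minimal sketch of where that layer would sit in the architecture from the question (the placement follows the suggestion above; the surrounding layers are copied from the original model):

#Sketch: BatchNormalization between Flatten and the Dense head
model.add(layers.Flatten())
model.add(layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001))
model.add(layers.Dense(128, activation='relu'))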

For the Dense layer, add regularizers:

from tensorflow.keras import regularizers

model.add(layers.Dense(128, kernel_regularizer=regularizers.l2(0.016),
                       activity_regularizer=regularizers.l1(0.006),
                       bias_regularizer=regularizers.l1(0.006), activation='relu'))

In addition, I recommend using an adjustable learning rate via the Keras callback ReduceLROnPlateau. The documentation is here. My recommended code is shown below:

rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.4,
                                              patience=2, verbose=1, mode="auto")

I also recommend using the Keras callback EarlyStopping. The documentation is here. My recommended code is below:

estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                         verbose=1, mode="auto",
                                         restore_best_weights=True)

Before you fit the model, include the code below:

callbacks=[rlronp, estop]

In model.fit, include callbacks=callbacks.
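A minimal sketch of the resulting fit call, assuming hypothetical train_ds and val_ds datasets (these names are not from the original post):

#Sketch: wiring both callbacks into training; train_ds and val_ds are
#hypothetical tf.data datasets yielding (image, label) batches
callbacks = [rlronp, estop]
history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=30,
                    callbacks=callbacks)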

You can try adding regularizers to all or some of the layers, for example:

model.add(layers.Conv2D(32, (3,3), strides=(1,1), kernel_regularizer='l1_l2', padding='same', activation='relu'))

You could also try replacing the Dropout between the conv layers with SpatialDropout2D. You can also try more image augmentation, perhaps GaussianNoise, RandomContrast, or RandomBrightness.
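A minimal sketch of both ideas, with layer parameters carried over from the question; the dropout rate and augmentation factors here are illustrative placeholders, not tuned values (RandomBrightness requires TF >= 2.9):

#Sketch: SpatialDropout2D drops entire feature maps instead of
#individual activations, which often regularizes conv blocks better
model.add(layers.Conv2D(64, (3,3), padding='same', activation='relu', input_shape=(64,64,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.SpatialDropout2D(0.25))

#Sketch: a richer augmentation pipeline with photometric jitter and noise;
#the GaussianNoise stddev assumes inputs scaled to [0, 1]
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
    layers.RandomContrast(0.2),
    layers.RandomBrightness(0.2),
    layers.GaussianNoise(0.1),
])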

Since your training accuracy is very high, you could also try simplifying your model (e.g., fewer units), as in the sketch below.
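For example, a slimmed-down variant of the question's architecture; the filter and unit counts here are illustrative starting points, not tuned values:

#Sketch: fewer filters and a smaller Dense head to reduce capacity
small_model = tf.keras.models.Sequential([
    data_augmentation,
    layers.Conv2D(32, (3,3), padding='same', activation='relu', input_shape=(64,64,3)),
    layers.MaxPooling2D((2,2)),
    layers.Dropout(0.25),
    layers.Conv2D(16, (3,3), padding='same', activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(4, activation='softmax'),
])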
