Keras L2 regularization makes the network unable to learn

I am trying to train a simple model on the MNIST dataset: a single hidden layer of 36 neurons.

from tensorflow.keras import (activations, layers, losses, models,
                              optimizers, regularizers)

NUM_CLASSES = 10
BATCH_SIZE = 128
EPOCHS = 100

model = models.Sequential([
    layers.Input(shape=x_train.shape[1:]),
    # single hidden layer: 36 sigmoid units with an L2 weight penalty
    layers.Dense(units=36, activation=activations.sigmoid,
                 kernel_regularizer=regularizers.l2(0.0001)),
    layers.Dropout(0.5),
    layers.Dense(units=NUM_CLASSES, activation=activations.softmax),
])
model.summary()

model.compile(loss=losses.CategoricalCrossentropy(),
              optimizer=optimizers.RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=EPOCHS,
                    verbose=2,
                    validation_data=(x_val, y_val))
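
The snippet assumes x_train, y_train, x_val and y_val already exist. For completeness, a minimal preprocessing sketch, assuming the standard keras.datasets.mnist loader with flattened inputs and one-hot labels (the 10,000-sample validation split is an assumption, though it is consistent with the 391 steps per epoch at batch size 128 in the log below):

from tensorflow.keras import datasets, utils

# Load MNIST, flatten to 784-dim vectors, scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

# One-hot labels to match CategoricalCrossentropy
y_train = utils.to_categorical(y_train, 10)
y_test = utils.to_categorical(y_test, 10)

# Hold out the last 10,000 training examples for validation
x_train, x_val = x_train[:-10000], x_train[-10000:]
y_train, y_val = y_train[:-10000], y_train[-10000:]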

Without the l2 part everything works fine, but as soon as I try to use regularization everything changes and the accuracy stays stuck at 10%:

Epoch 1/300
391/391 - 1s - loss: 2.4411 - accuracy: 0.0990 - val_loss: 2.3027 - val_accuracy: 0.1064
Epoch 2/300
391/391 - 0s - loss: 2.3374 - accuracy: 0.1007 - val_loss: 2.3031 - val_accuracy: 0.1064
Epoch 3/300
391/391 - 0s - loss: 2.3178 - accuracy: 0.1016 - val_loss: 2.3041 - val_accuracy: 0.1064
Epoch 4/300
391/391 - 0s - loss: 2.3089 - accuracy: 0.1045 - val_loss: 2.3026 - val_accuracy: 0.1064
Epoch 5/300
391/391 - 0s - loss: 2.3051 - accuracy: 0.1060 - val_loss: 2.3030 - val_accuracy: 0.1064
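
(A loss of about 2.3026 is ln(10), the categorical cross-entropy of a uniform guess over 10 classes, and 10% accuracy is the chance rate, so the network seems to be predicting essentially at random rather than learning.)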

This happens both when I pass regularizers.l2 manually and when I pass the string "l2" as the argument.
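
For reference, these are the two forms I mean; as far as I can tell from the Keras docs, the string shortcut resolves to an L2 regularizer with its default strength of 0.01 rather than my 0.0001:

from tensorflow.keras import layers, regularizers

# Variant 1: explicit regularizer object with a chosen strength
dense_explicit = layers.Dense(36, activation='sigmoid',
                              kernel_regularizer=regularizers.l2(0.0001))

# Variant 2: string shortcut; resolves to an L2 regularizer with
# its default strength (l2=0.01 in Keras)
dense_string = layers.Dense(36, activation='sigmoid',
                            kernel_regularizer='l2')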

Why is this happening? What am I doing wrong?

I suspect that with a dropout rate as high as .5, adding regularization on top prevents the network from learning. Dropout and regularization are both means of preventing overfitting. Try a lower dropout rate together with the regularizer and see whether the network trains properly, as in the sketch below. My experience so far has been that dropout is more effective than regularizers at controlling overfitting.
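
A minimal sketch of that suggestion, reusing the model from the question with only the dropout rate lowered (the 0.2 here is an arbitrary starting point to tune, not a recommended value):

model = models.Sequential([
    layers.Input(shape=x_train.shape[1:]),
    layers.Dense(units=36, activation=activations.sigmoid,
                 kernel_regularizer=regularizers.l2(0.0001)),
    layers.Dropout(0.2),  # lowered from 0.5; L2 penalty kept as before
    layers.Dense(units=NUM_CLASSES, activation=activations.softmax),
])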
