Keras model learns to give wrong answers: training accuracy rises to 0.8, then drops sharply to 0.1



I'm reading François Chollet's book Deep Learning with Python.

In section 7.9 there is an example neural network that uses Conv1D layers on the IMDB dataset. To my surprise, it starts to learn, with both training and validation accuracy improving, but after a few epochs both training and validation accuracy fall.

Seeing the validation accuracy drop doesn't surprise me; that's textbook overfitting. What I don't understand is how the training accuracy can fall as low as 12%. It's almost as if the network is learning the opposite of what it should.

The code is:

import keras
from keras import layers
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.optimizers import Adam, RMSprop

max_features = 2000  # number of words to consider as features
max_len = 500        # cut off reviews after this many words

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

model = keras.models.Sequential()
model.add(layers.Embedding(max_features, 128,
                           input_length=max_len,
                           name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
model.summary()

model.compile(optimizer=RMSprop(),
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=128,
                    validation_split=0.2,
                    # callbacks=callbacks,
                    )

The training results look like this:

Train on 20000 samples, validate on 5000 samples
Epoch 1/20
20000/20000 [==============================] - 4s 212us/step - loss: 0.7043 - acc: 0.6076 - val_loss: 0.4488 - val_acc: 0.8166
Epoch 2/20
20000/20000 [==============================] - 3s 151us/step - loss: 0.4509 - acc: 0.8179 - val_loss: 0.6575 - val_acc: 0.7594
Epoch 3/20
20000/20000 [==============================] - 3s 151us/step - loss: 0.4082 - acc: 0.7923 - val_loss: 0.4759 - val_acc: 0.7874
Epoch 4/20
20000/20000 [==============================] - 3s 152us/step - loss: 0.3633 - acc: 0.7526 - val_loss: 0.5385 - val_acc: 0.7356
Epoch 5/20
20000/20000 [==============================] - 3s 154us/step - loss: 0.3333 - acc: 0.7235 - val_loss: 0.5658 - val_acc: 0.7056
Epoch 6/20
20000/20000 [==============================] - 3s 152us/step - loss: 0.2793 - acc: 0.6868 - val_loss: 0.5790 - val_acc: 0.6494
Epoch 7/20
20000/20000 [==============================] - 3s 151us/step - loss: 0.2433 - acc: 0.6408 - val_loss: 0.6710 - val_acc: 0.5726
Epoch 8/20
20000/20000 [==============================] - 3s 149us/step - loss: 0.2061 - acc: 0.5789 - val_loss: 1.7192 - val_acc: 0.3538
Epoch 9/20
20000/20000 [==============================] - 3s 151us/step - loss: 0.1769 - acc: 0.5144 - val_loss: 0.8144 - val_acc: 0.4416
Epoch 10/20
20000/20000 [==============================] - 3s 151us/step - loss: 0.1507 - acc: 0.4365 - val_loss: 1.1555 - val_acc: 0.3682
Epoch 11/20
20000/20000 [==============================] - 3s 152us/step - loss: 0.1395 - acc: 0.3675 - val_loss: 1.1440 - val_acc: 0.3412
Epoch 12/20
20000/20000 [==============================] - 3s 156us/step - loss: 0.1241 - acc: 0.3159 - val_loss: 1.8202 - val_acc: 0.2686
Epoch 13/20
20000/20000 [==============================] - 3s 155us/step - loss: 0.1225 - acc: 0.2756 - val_loss: 1.0667 - val_acc: 0.2944
Epoch 14/20
20000/20000 [==============================] - 3s 152us/step - loss: 0.1183 - acc: 0.2422 - val_loss: 1.1143 - val_acc: 0.2794
Epoch 15/20
20000/20000 [==============================] - 3s 151us/step - loss: 0.1153 - acc: 0.2142 - val_loss: 1.1599 - val_acc: 0.2686
Epoch 16/20
20000/20000 [==============================] - 3s 153us/step - loss: 0.1150 - acc: 0.1930 - val_loss: 1.2467 - val_acc: 0.2544
Epoch 17/20
20000/20000 [==============================] - 3s 151us/step - loss: 0.1145 - acc: 0.1766 - val_loss: 1.1953 - val_acc: 0.2492
Epoch 18/20
20000/20000 [==============================] - 3s 153us/step - loss: 0.1115 - acc: 0.1508 - val_loss: 1.4812 - val_acc: 0.2226
Epoch 19/20
20000/20000 [==============================] - 3s 156us/step - loss: 0.1119 - acc: 0.1355 - val_loss: 1.2690 - val_acc: 0.2288
Epoch 20/20
20000/20000 [==============================] - 3s 155us/step - loss: 0.1127 - acc: 0.1248 - val_loss: 1.2903 - val_acc: 0.2148

Of course I could stop it early when the validation accuracy peaks, but I want to understand how the training accuracy can drop so dramatically. Even more surprising is how it falls below 0.5: with a softmax 0/1-type output layer, I'd expect accuracy around 0.5. It really looks like it has learned to give the wrong answers.
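For reference, a minimal sketch of how the commented-out callbacks line above could be filled in for early stopping (the monitor and patience values are illustrative, and restore_best_weights needs a reasonably recent Keras):

from keras.callbacks import EarlyStopping

# Stop once validation accuracy stops improving and keep the best weights.
# patience=2 is an illustrative choice, not tuned.
callbacks = [EarlyStopping(monitor='val_acc', patience=2,
                           restore_best_weights=True)]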

This sounds like exploding gradients or high variance. Try batch normalization before the activation (or some other kind of regularization).
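If exploding gradients really were the cause, one standard countermeasure in Keras is gradient clipping on the optimizer. A minimal sketch (the clipnorm value is illustrative, not tuned):

from keras.optimizers import RMSprop

# Clip the gradient norm so a single bad batch can't blow up the weights.
model.compile(optimizer=RMSprop(clipnorm=1.0),
              loss='binary_crossentropy',
              metrics=['acc'])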

Start with a simpler model, like the one below, then work your way up:

model = keras.models.Sequential()
model.add(layers.Embedding(max_features, 128,
                           input_length=max_len,
                           name='embed'))
model.add(layers.Flatten())
model.add(layers.Dense(512))
model.add(layers.BatchNormalization())
model.add(layers.ReLU())
model.add(layers.Dense(1))
model.add(layers.ReLU())

With that one I got to 20% after 70 epochs.

Another problem could be too few examples in the data; try to get more samples (perhaps through data augmentation) or reduce the number of features.
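For the feature-reduction part, one knob that is already in the question's code is the vocabulary size. A minimal sketch (500 is an illustrative value):

# Keep fewer distinct words to shrink the feature space.
max_features = 500
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)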

Michael, thanks for the pointer about exploding gradients; I had never heard of that problem before.

What is really unusual about this network is that I get 12.5% accuracy on a classification problem with two answers and balanced samples. If the network weren't learning at all, I'd expect something around 50%.

Coming back to it after a break, I realized that the problem was the missing activation function on the last Dense layer. That's fine for a regression problem, but not for a classification problem like this one. After adding activation='sigmoid', the network trains properly and training accuracy keeps improving. It does show overfitting, but that's to be expected.
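Concretely, only the last layer of the model changes:

# Sigmoid squashes the raw score into (0, 1), which is what both
# binary_crossentropy and the 0.5-threshold accuracy metric expect.
model.add(layers.Dense(1, activation='sigmoid'))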

In short, the problem was using the binary_crossentropy loss with no activation function on the output layer.
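To see why this can push accuracy below chance instead of just stalling it: without a sigmoid the last layer emits an unbounded linear score, while the 'acc' metric for this loss effectively thresholds the raw output at 0.5. A minimal sketch with made-up numbers:

import numpy as np

y_true = np.array([0, 0, 1, 1])
raw = np.array([0.9, 1.3, -0.2, 0.1])  # unbounded linear outputs, no sigmoid
pred = (raw > 0.5).astype(int)         # roughly what the 'acc' metric does
print((pred == y_true).mean())         # prints 0.0: systematically "wrong" answers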
