I am currently struggling with my CNN. I am using categorical_crossentropy, and I will include my model below. The accuracy does not increase and the loss does not decrease. The amount of labeled data is 600 right now, which is quite small, but it still seems strange to me that nothing changes at all.
### Define architecture.
model.add(Conv2D(32, 4, strides=(11,11),padding="same",input_shape=(200,200,3), activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Conv2D(64, 4, strides=(9,9),padding="same", activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Conv2D(128, 4, strides=(5,5),padding="same", activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(GlobalMaxPooling2D())
model.add(Dense(128, activation="relu"))
model.add(Dense(y_test.shape[1], activation="sigmoid"))
model.summary()
sgd = optimizers.SGD(lr=0.1,) #0.1
model.compile(loss='categorical_crossentropy', optimizer='sgd',
metrics=['accuracy'])
model1 = model.fit(x_train, y_train,batch_size=32, epochs=10, verbose=1)
Epoch 1/10
420/420 [==============================] - 5s 11ms/step - loss: 1.4598 - acc: 0.2381
Epoch 2/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4679 - acc: 0.2333
Epoch 3/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4335 - acc: 0.2667
Epoch 4/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4198 - acc: 0.2310
Epoch 5/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4038 - acc: 0.2524
Epoch 6/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4343 - acc: 0.2643
Epoch 7/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4281 - acc: 0.2786
Epoch 8/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4097 - acc: 0.2333
Epoch 9/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4071 - acc: 0.2714
Epoch 10/10
420/420 [==============================] - 1s 3ms/step - loss: 1.4135 - acc: 0.2476
Is there something wrong with my model? I have tried changing the lr, the size of the images, simplifying the model, changing the kernel size, letting it run for more epochs (up to 60), and printing the predictions for x_test. The predictions also seem wrong:
error = model.predict(x_test)
print(error)
[[0.49998534 0.49998534 0.4999715 0.50000155]
[0.49998188 0.49998283 0.49997032 0.5000029 ]
[0.49998188 0.4999858 0.49998164 0.5000036 ]
[0.4999795 0.49998736 0.4999841 0.5000008 ]
[0.49998784 0.49997187 0.49996948 0.5000013 ]
[0.49997532 0.49997967 0.49997616 0.50000024]
Any kind of help is greatly appreciated! Thanks!
There are a few things I can recommend from my own experience for you to try:
- Since you are using categorical cross-entropy, try "softmax" instead of "sigmoid" as the activation of the last layer.
- You should lower your learning rate (together with the new settings suggested here).
- You can try a different optimizer, for example "adam" instead of "sgd".
- You can remove the Dropout and BatchNormalization layers and add them back only where necessary.
- Use a 2D kernel size instead of a single number; maybe change the kernel size from 4 to (3, 3). Also reduce the strides; you could start with (1, 1). Using a kernel of size 4 with strides of (11, 11) on a [200x200] image amounts to learning almost nothing.

Please try the final suggestion first, since that appears to be the main problem. I hope one of these helps you.
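Putting these suggestions together, a minimal sketch of the revised model might look like the following (my assumptions: tensorflow.keras, 4 classes as in the question's output, and a learning rate of 1e-3 for Adam):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, GlobalMaxPooling2D, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # (3, 3) kernels with small strides preserve spatial information
    Conv2D(32, (3, 3), strides=(1, 1), padding="same",
           input_shape=(200, 200, 3), activation="relu"),
    Conv2D(64, (3, 3), strides=(2, 2), padding="same", activation="relu"),
    Conv2D(128, (3, 3), strides=(2, 2), padding="same", activation="relu"),
    GlobalMaxPooling2D(),
    Dense(128, activation="relu"),
    # softmax, not sigmoid, to match categorical_crossentropy
    Dense(4, activation="softmax"),
])

model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=1e-3),
              metrics=["accuracy"])
model.summary()
```

Note also that your original code builds an SGD object with lr=0.1 but then passes the string 'sgd' to model.compile, so the custom learning rate is silently ignored; passing the optimizer object itself, as above, avoids that.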
Please try the following settings:
- Reduce the strides to 1x1 or 2x2, or at most 3x3.
- Remove the Dropout between the convolutional layers; if necessary, use Dropout only before the Dense layers.
- Try adding pooling layers, preferably MaxPooling with 2x2 strides and a 2x2 kernel, after the convolutional layers.
- Change the optimizer to Adam/Nadam.
- Use softmax instead of sigmoid.
- Increase the number of epochs; 10 is too low.

All of the points above can vary depending on the problem; try them out and modify your model accordingly.
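These points can be sketched as follows (my assumptions: tensorflow.keras, 4 classes as in the question, and MaxPooling2D handling the downsampling instead of large convolution strides):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     GlobalMaxPooling2D, Dropout, Dense)

model = Sequential([
    # stride-1 convolutions; the pooling layers do the downsampling
    Conv2D(32, (3, 3), padding="same", activation="relu",
           input_shape=(200, 200, 3)),
    MaxPooling2D(pool_size=(2, 2)),   # 200 -> 100
    Conv2D(64, (3, 3), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),   # 100 -> 50
    Conv2D(128, (3, 3), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),   # 50 -> 25
    GlobalMaxPooling2D(),
    Dropout(0.5),                     # dropout only before the Dense layers
    Dense(128, activation="relu"),
    Dense(4, activation="softmax"),   # softmax for categorical_crossentropy
])

model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```

MaxPooling2D uses its pool_size as the default stride, so each pooling layer halves the spatial dimensions here.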
Because of the strides you are using, you are throwing away almost all of the spatial information in the image within the first two layers. Your model.summary() reveals the problem:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 19, 19, 32) 1568
_________________________________________________________________
dropout_1 (Dropout) (None, 19, 19, 32) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 19, 19, 32) 128
_________________________________________________________________
conv2d_2 (Conv2D) (None, 3, 3, 64) 32832
_________________________________________________________________
dropout_2 (Dropout) (None, 3, 3, 64) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 3, 3, 64) 256
_________________________________________________________________
conv2d_3 (Conv2D) (None, 1, 1, 128) 131200
_________________________________________________________________
dropout_3 (Dropout) (None, 1, 1, 128) 0
_________________________________________________________________
batch_normalization_3 (Batch (None, 1, 1, 128) 512
_________________________________________________________________
global_max_pooling2d_1 (Glob (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 16512
_________________________________________________________________
dense_2 (Dense) (None, 4) 516
=================================================================
Total params: 183,524
Trainable params: 183,076
Non-trainable params: 448
You can see that the tensor size drops from 200 in the original image straight down to 19 after the first convolution, and to 3 after the second. We would expect the size to decrease much more gradually, so that the convolutional layers can actually be put to use.

If you keep your code as it is and just change all the strides to (2, 2), you get a much more reasonable structure:
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 100, 100, 32) 1568
_________________________________________________________________
dropout_1 (Dropout) (None, 100, 100, 32) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 100, 100, 32) 128
_________________________________________________________________
conv2d_2 (Conv2D) (None, 50, 50, 64) 32832
_________________________________________________________________
dropout_2 (Dropout) (None, 50, 50, 64) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 50, 50, 64) 256
_________________________________________________________________
conv2d_3 (Conv2D) (None, 25, 25, 128) 131200
_________________________________________________________________
dropout_3 (Dropout) (None, 25, 25, 128) 0
_________________________________________________________________
batch_normalization_3 (Batch (None, 25, 25, 128) 512
_________________________________________________________________
global_max_pooling2d_1 (Glob (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 16512
_________________________________________________________________
dense_2 (Dense) (None, 4) 516
=================================================================
Total params: 183,524
Trainable params: 183,076
Non-trainable params: 448
_________________________________________________________________
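The spatial sizes in both summaries follow directly from how Keras computes output sizes for padding="same", namely ceil(input_size / stride). A quick sanity check in plain Python (no TensorFlow needed) reproduces both progressions:

```python
import math

def same_padding_out(size, stride):
    """Spatial output size of a Conv2D with padding='same' in Keras."""
    return math.ceil(size / stride)

# Original strides (11, 9, 5): the spatial size collapses almost immediately.
sizes = [200]
for stride in (11, 9, 5):
    sizes.append(same_padding_out(sizes[-1], stride))
print(sizes)  # [200, 19, 3, 1]

# Strides of 2 everywhere: a much more gradual reduction.
sizes = [200]
for stride in (2, 2, 2):
    sizes.append(same_padding_out(sizes[-1], stride))
print(sizes)  # [200, 100, 50, 25]
```

This is why the first summary shows 19x19, 3x3, and 1x1 feature maps, while the (2, 2) version retains 100x100, 50x50, and 25x25 maps.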