Comparing model training results in Keras with the TensorFlow backend



I trained two CNN networks on 6,000 images, and they seem to give me similar results. The models were trained for binary classification; the difference between the two models lies in the dense layers. Model A has the following dense layer configuration:

fc1 (Dense)                  (None, 512)               12845568  
_________________________________________________________________
fc2 (Dense)                  (None, 256)               131328    
_________________________________________________________________
output (Dense)               (None, 2)                 514 

and these results:

Train on 4800 samples, validate on 1200 samples
Epoch 1/30
4800/4800 [===] - 98s - loss: 0.7923 - acc: 0.6865 - val_loss: 0.4599 - val_acc: 0.7858
Epoch 2/30
4800/4800 [===] - 80s - loss: 0.4263 - acc: 0.7996 - val_loss: 0.5913 - val_acc: 0.6350
Epoch 3/30
4800/4800 [===] - 80s - loss: 0.3912 - acc: 0.8133 - val_loss: 0.3199 - val_acc: 0.8625
Epoch 4/30
4800/4800 [===] - 80s - loss: 0.3562 - acc: 0.8402 - val_loss: 0.3086 - val_acc: 0.8708
Epoch 5/30
4800/4800 [===] - 80s - loss: 0.3251 - acc: 0.8558 - val_loss: 0.2784 - val_acc: 0.8817
Epoch 6/30
4800/4800 [===] - 80s - loss: 0.3150 - acc: 0.8631 - val_loss: 0.2792 - val_acc: 0.8817
Epoch 7/30
4800/4800 [===] - 80s - loss: 0.2997 - acc: 0.8692 - val_loss: 0.3615 - val_acc: 0.8342
Epoch 8/30
4800/4800 [===] - 80s - loss: 0.2990 - acc: 0.8662 - val_loss: 0.2630 - val_acc: 0.8908
Epoch 9/30
4800/4800 [===] - 80s - loss: 0.2594 - acc: 0.8867 - val_loss: 0.3102 - val_acc: 0.8700
Epoch 10/30
4800/4800 [===] - 80s - loss: 0.2846 - acc: 0.8785 - val_loss: 0.4234 - val_acc: 0.7842
Epoch 11/30
4800/4800 [===] - 80s - loss: 0.2510 - acc: 0.8969 - val_loss: 0.2952 - val_acc: 0.8742
Epoch 12/30
4800/4800 [===] - 80s - loss: 0.2288 - acc: 0.9090 - val_loss: 0.2680 - val_acc: 0.8858
Epoch 13/30
4800/4800 [===] - 80s - loss: 0.2277 - acc: 0.9044 - val_loss: 0.3745 - val_acc: 0.8600
Epoch 14/30
4800/4800 [===] - 80s - loss: 0.2659 - acc: 0.8873 - val_loss: 0.2438 - val_acc: 0.9025
Epoch 15/30
4800/4800 [===] - 80s - loss: 0.2101 - acc: 0.9133 - val_loss: 0.3176 - val_acc: 0.8667
Epoch 16/30
4800/4800 [===] - 80s - loss: 0.2094 - acc: 0.9146 - val_loss: 0.2763 - val_acc: 0.8875
Epoch 17/30
4800/4800 [===] - 80s - loss: 0.2058 - acc: 0.9125 - val_loss: 0.2677 - val_acc: 0.8925
Epoch 18/30
4800/4800 [===] - 80s - loss: 0.1839 - acc: 0.9296 - val_loss: 0.2449 - val_acc: 0.9117
Epoch 19/30
4800/4800 [===] - 80s - loss: 0.1918 - acc: 0.9221 - val_loss: 0.2471 - val_acc: 0.8992
Epoch 20/30
4800/4800 [===] - 80s - loss: 0.2014 - acc: 0.9225 - val_loss: 0.2709 - val_acc: 0.8808
Epoch 21/30
4800/4800 [===] - 80s - loss: 0.1540 - acc: 0.9425 - val_loss: 0.2541 - val_acc: 0.8933
Epoch 22/30
4800/4800 [===] - 80s - loss: 0.1803 - acc: 0.9294 - val_loss: 0.2289 - val_acc: 0.9058
Epoch 23/30
4800/4800 [===] - 80s - loss: 0.1548 - acc: 0.9425 - val_loss: 0.2417 - val_acc: 0.9175
Epoch 24/30
4800/4800 [===] - 80s - loss: 0.1754 - acc: 0.9294 - val_loss: 0.4914 - val_acc: 0.8183
Epoch 25/30
4800/4800 [===] - 80s - loss: 0.1449 - acc: 0.9419 - val_loss: 0.2281 - val_acc: 0.9125
Epoch 26/30
4800/4800 [===] - 80s - loss: 0.1529 - acc: 0.9385 - val_loss: 0.2328 - val_acc: 0.9217
Epoch 27/30
4800/4800 [===] - 80s - loss: 0.1237 - acc: 0.9533 - val_loss: 0.2646 - val_acc: 0.9167
Epoch 28/30
4800/4800 [===] - 80s - loss: 0.1236 - acc: 0.9531 - val_loss: 0.2485 - val_acc: 0.9100
Epoch 29/30
4800/4800 [===] - 80s - loss: 0.1301 - acc: 0.9500 - val_loss: 0.2726 - val_acc: 0.9042
Epoch 30/30
4800/4800 [===] - 80s - loss: 0.1335 - acc: 0.9500 - val_loss: 0.2803 - val_acc: 0.9183
Training time: 2440.315860271454
1200/1200 [===] - 27s    
[INFO] loss=0.2803, accuracy: 91.8333%
=================================================================
Model B has the following final dense layer configuration:
fc1 (Dense)                  (None, 1024)              25691136  
_________________________________________________________________
fc2 (Dense)                  (None, 512)               524800    
_________________________________________________________________
output (Dense)               (None, 2)                 1026      

Result:
Train on 4800 samples, validate on 1200 samples
Epoch 1/30
4800/4800 [===] - 87s - loss: 0.4743 - acc: 0.7708 - val_loss: 0.4073 - val_acc: 0.8233
Epoch 2/30
4800/4800 [===] - 87s - loss: 0.3732 - acc: 0.8263 - val_loss: 0.3359 - val_acc: 0.8525
Epoch 3/30
4800/4800 [===] - 87s - loss: 0.3383 - acc: 0.8500 - val_loss: 0.3017 - val_acc: 0.8658
Epoch 4/30
4800/4800 [===] - 87s - loss: 0.3094 - acc: 0.8637 - val_loss: 0.3024 - val_acc: 0.8683
Epoch 5/30
4800/4800 [===] - 87s - loss: 0.3036 - acc: 0.8669 - val_loss: 0.3848 - val_acc: 0.8058
Epoch 6/30
4800/4800 [===] - 87s - loss: 0.2848 - acc: 0.8802 - val_loss: 0.2730 - val_acc: 0.8883
Epoch 7/30
4800/4800 [===] - 87s - loss: 0.2630 - acc: 0.8877 - val_loss: 0.3234 - val_acc: 0.8667
Epoch 8/30
4800/4800 [===] - 87s - loss: 0.2491 - acc: 0.8952 - val_loss: 0.2758 - val_acc: 0.8933
Epoch 9/30
4800/4800 [===] - 87s - loss: 0.2484 - acc: 0.8992 - val_loss: 0.3271 - val_acc: 0.8467
Epoch 10/30
4800/4800 [===] - 87s - loss: 0.2427 - acc: 0.8992 - val_loss: 0.2743 - val_acc: 0.8808
Epoch 11/30
4800/4800 [===] - 87s - loss: 0.2346 - acc: 0.9017 - val_loss: 0.2379 - val_acc: 0.9008
Epoch 12/30
4800/4800 [===] - 87s - loss: 0.2250 - acc: 0.9108 - val_loss: 0.2432 - val_acc: 0.9017
Epoch 13/30
4800/4800 [===] - 87s - loss: 0.1993 - acc: 0.9221 - val_loss: 0.2892 - val_acc: 0.8858
Epoch 14/30
4800/4800 [===] - 87s - loss: 0.2148 - acc: 0.9125 - val_loss: 0.3201 - val_acc: 0.8842
Epoch 15/30
4800/4800 [===] - 87s - loss: 0.1823 - acc: 0.9287 - val_loss: 0.5481 - val_acc: 0.8133
Epoch 16/30
4800/4800 [===] - 87s - loss: 0.1873 - acc: 0.9281 - val_loss: 0.2449 - val_acc: 0.9092
Epoch 17/30
4800/4800 [===] - 87s - loss: 0.1622 - acc: 0.9392 - val_loss: 0.2373 - val_acc: 0.9092
Epoch 18/30
4800/4800 [===] - 87s - loss: 0.1782 - acc: 0.9304 - val_loss: 0.2856 - val_acc: 0.8725
Epoch 19/30
4800/4800 [===] - 87s - loss: 0.1632 - acc: 0.9369 - val_loss: 0.2518 - val_acc: 0.9067
Epoch 20/30
4800/4800 [===] - 87s - loss: 0.1577 - acc: 0.9381 - val_loss: 0.2629 - val_acc: 0.9050
Epoch 21/30
4800/4800 [===] - 87s - loss: 0.1395 - acc: 0.9481 - val_loss: 0.2278 - val_acc: 0.9133
Epoch 22/30
4800/4800 [===] - 87s - loss: 0.1422 - acc: 0.9444 - val_loss: 0.2232 - val_acc: 0.9158
Epoch 23/30
4800/4800 [===] - 87s - loss: 0.1436 - acc: 0.9448 - val_loss: 0.2862 - val_acc: 0.9042
Epoch 24/30
4800/4800 [===] - 87s - loss: 0.1402 - acc: 0.9448 - val_loss: 0.3186 - val_acc: 0.8842
Epoch 25/30
4800/4800 [===] - 86s - loss: 0.1261 - acc: 0.9542 - val_loss: 0.2762 - val_acc: 0.9092
Epoch 26/30
4800/4800 [===] - 86s - loss: 0.1143 - acc: 0.9529 - val_loss: 0.2442 - val_acc: 0.9125
Epoch 27/30
4800/4800 [===] - 86s - loss: 0.1141 - acc: 0.9565 - val_loss: 0.3128 - val_acc: 0.8658
Epoch 28/30
4800/4800 [===] - 86s - loss: 0.1092 - acc: 0.9606 - val_loss: 0.2669 - val_acc: 0.9092
Epoch 29/30
4800/4800 [===] - 86s - loss: 0.0939 - acc: 0.9642 - val_loss: 0.2535 - val_acc: 0.8975
Epoch 30/30
4800/4800 [===] - 86s - loss: 0.1098 - acc: 0.9615 - val_loss: 0.2594 - val_acc: 0.9008
Training time: 2615.465226173401
1200/1200 [==============================] - 30s    
[INFO] loss=0.2594, accuracy: 90.0833%

Both models seem to give similar results. Is this a good result, or is there some anomaly that I am failing to detect? Is the model itself any good?

Additional information: batch size 128, loss = categorical cross-entropy, optimizer = AdaDelta. Any suggestions for improvement are also appreciated.
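
For reference, a minimal sketch of the training setup described above (Keras 2 style; `model` stands in for either CNN, and `X_train`/`y_train` are assumed to be the image array and one-hot labels):

from keras.optimizers import Adadelta

# Configuration stated above: categorical cross-entropy, AdaDelta, batch size 128.
model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(),
              metrics=['accuracy'])

# A 4800/1200 train/validation split of the 6000 images is validation_split=0.2.
history = model.fit(X_train, y_train,
                    batch_size=128,
                    epochs=30,
                    validation_split=0.2)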

The next steps I would consider are:

  • What happens to your training loss; this can be a useful indicator. If you see your training accuracy reach 100% while validation lags behind, try adding more regularization, e.g., L2 regularization (a sketch follows this list).

  • Does your convolutional network use residual layers and batch normalization? Residual networks with batch norm seem to be state of the art for most tasks these days (a minimal residual block is sketched below).

  • Test different numbers of filters in each convolutional layer. With too many filters you overfit; with too few you underfit. There is a sweet spot, and it varies from problem to problem.

  • Your batch size also affects your results, so play with it. I have run extensive hyperparameter searches on small versus large datasets and arrived at very different ideal batch sizes.

  • You could test the Adam optimizer, although AdaDelta is solid too (the sketch below combines Adam with learning-rate reduction).

  • Train multiple models; random initialization produces slightly different final results, and an ensemble of models will do better (a simple prediction-averaging helper is sketched below).

  • Random initialization with the Xavier initializer may give you a small boost in accuracy (different variants exist for conv and fc layers; see the initializer sketch below).
  • Lower the learning rate after the validation error plateaus; this usually buys a little more performance with each reduction (the Adam sketch below automates this with Keras's ReduceLROnPlateau callback).
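
To make the regularization point concrete, here is a minimal sketch of Model A's dense head with L2 weight decay added. This is Keras 2 style; the input shape (7, 7, 512) is inferred from fc1's parameter count in the summary above, and the 1e-4 penalty is a hypothetical starting value:

from keras.layers import Input, Flatten, Dense
from keras.models import Model
from keras.regularizers import l2

# (7, 7, 512) flattens to 25088 features, matching fc1's 12,845,568 parameters.
inp = Input(shape=(7, 7, 512))
x = Flatten()(inp)
# kernel_regularizer adds an L2 weight-decay term to the training loss.
x = Dense(512, activation='relu', kernel_regularizer=l2(1e-4), name='fc1')(x)
x = Dense(256, activation='relu', kernel_regularizer=l2(1e-4), name='fc2')(x)
out = Dense(2, activation='softmax', name='output')(x)
model = Model(inputs=inp, outputs=out)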
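For the residual/batch-norm suggestion, a minimal residual block in the Keras functional API (a sketch of the general pattern, not the architecture used in the question):

from keras.layers import Conv2D, BatchNormalization, Activation, add

def residual_block(x, filters):
    # Two 3x3 convolutions with batch norm, plus an identity shortcut.
    shortcut = x
    y = Conv2D(filters, (3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    # Identity shortcut: assumes x already has `filters` channels.
    y = add([y, shortcut])
    return Activation('relu')(y)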
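For the initializer suggestion: Glorot (Xavier) initializers are built into Keras and can be set per layer; which variant works best for conv versus fully connected layers is something to experiment with (Keras 2 argument names):

from keras.layers import Conv2D, Dense

# glorot_uniform is already the Keras default for many layers; setting it
# explicitly makes the choice visible and easy to swap out.
conv = Conv2D(64, (3, 3), padding='same', kernel_initializer='glorot_uniform')
fc = Dense(512, kernel_initializer='glorot_normal')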
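The optimizer and learning-rate suggestions can be combined in one run. A sketch assuming the same `model`, `X_train`, and `y_train` as above, using Keras's built-in ReduceLROnPlateau callback:

from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-3),
              metrics=['accuracy'])

# Halve the learning rate whenever val_loss has not improved for 3 epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=3, min_lr=1e-6)

model.fit(X_train, y_train, batch_size=128, epochs=30,
          validation_split=0.2, callbacks=[reduce_lr])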
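And for the ensemble suggestion, the simplest version averages the predicted probabilities of several independently trained models (a sketch; `models` is assumed to be a list of trained Keras models):

import numpy as np

def ensemble_predict(models, X):
    # Average the softmax outputs across models, then take the argmax class.
    probs = np.mean([m.predict(X) for m in models], axis=0)
    return np.argmax(probs, axis=1)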

There are many things you can try. Start running experiments: train some models with different values for some of these suggestions and see where you can improve.
