TensorFlow binary cross-entropy loss quickly goes to NaN



TL;DR: my ML model's loss rapidly goes to NaN when retraining on new data. None of the "standard" fixes have worked.

Hi,

I recently (and successfully) trained a CNN/dense-layered model to classify spectrograms (image representations of audio).

However, for some reason, the binary cross-entropy loss decreases steadily until around 1.000 and then abruptly becomes NaN within the first epoch. I have tried lowering the learning rate to 1e-8, using ReLU throughout with a sigmoid on the final layer, but nothing seems to work. The problem still occurs even when I simplify the network to dense layers only. I did normalize my data manually, but I am fairly confident I did it correctly, so all of my values should lie in [0, 1]. There could be a hole in the data here, but I think it is unlikely.
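
For anyone debugging something similar, here is the kind of sanity check that would catch bad inputs before training; a minimal sketch (the helper and the array name X_train are placeholders of mine, not from my actual pipeline):

import numpy as np

def check_inputs(x, name="X_train"):
    """Report NaN/Inf counts and the value range of an input array."""
    x = np.asarray(x, dtype=np.float64)
    print(f"{name}: {np.isnan(x).sum()} NaN, {np.isinf(x).sum()} Inf")
    print(f"{name}: min={np.nanmin(x):.6f}, max={np.nanmax(x):.6f}")
    return not (np.isnan(x).any() or np.isinf(x).any())

# Example usage on the normalized training array:
# assert check_inputs(X_train), "found NaN/Inf in the training data"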

I have attached my model architecture code here:

from tensorflow.keras import layers, models, regularizers

input_shape = (125, 128, 1)
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=input_shape),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),

    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),

    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),

    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),

    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),

    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),

    layers.Dropout(0.3),

    layers.Flatten(),
    layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])

Interestingly, I tried fine-tuning a VGG16 model on this new data, and it worked! (No NaN loss problem.) I have attached that code here too, but I honestly can't tell where, or whether, any difference between the two is causing the problem:

from tensorflow import keras
from tensorflow.keras import regularizers

base_model = keras.applications.VGG16(
    weights="imagenet",
    input_shape=(125, 128, 3),
    include_top=False,
)
# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=(125, 128, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, outputs)
model.summary()
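
For completeness, a compile/fit sketch for the transfer model above (the learning rate, epoch count, and the train_ds/val_ds dataset names are placeholders of mine, not from my actual code):

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)
# train_ds / val_ds: placeholder tf.data pipelines yielding (125, 128, 3) images and binary labels
model.fit(train_ds, validation_data=val_ds, epochs=20)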

I think I have been through all of the "textbook" solutions but still can't find the source of the problem. Any help would be greatly appreciated.

It turned out that some of my input data was bad (a divide-by-zero error during normalization). Sorry for the trouble, and thanks for the help.
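
For anyone who hits the same thing: with min-max scaling the usual culprit is a constant input, where max - min == 0 and the division produces NaN. A minimal sketch of a guarded version (the helper name and epsilon value are illustrative):

import numpy as np

def safe_minmax(x, eps=1e-8):
    """Min-max scale to [0, 1]; constant inputs map to zeros instead of NaN."""
    x = np.asarray(x, dtype=np.float32)
    rng = x.max() - x.min()
    if rng < eps:  # constant (e.g. silent) spectrogram: avoid 0/0 -> NaN
        return np.zeros_like(x)
    return (x - x.min()) / rng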

Remove all of the unneeded kernel_regularizers, BatchNormalization, and dropout from the convolution layers.
Keep kernel_regularizers and Dropout only in the Dense layers of the model definition, and change the number of kernels in the Conv2D layers.

Then try training your model again with the following code:

import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Rescaling, Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Rescaling(1./255, input_shape=(img_h, img_w, 3)),
    Conv2D(16, (3, 3), activation='relu'),  #, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    #BatchNormalization(),
    Conv2D(16, (3, 3), activation='relu'),  #, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    #BatchNormalization(),

    Conv2D(32, (3, 3), activation='relu'),  #, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    #BatchNormalization(),

    Conv2D(32, (2, 2), activation='relu'),  #, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    #BatchNormalization(),

    Conv2D(64, (2, 2), activation='relu'),  #, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    #BatchNormalization(),

    Conv2D(64, (2, 2), activation='relu'),  #, kernel_regularizer=regularizers.l2(0.001)),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    #BatchNormalization(),

    #Dropout(0.3),

    Flatten(),
    Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

model.fit(...

Output:

Epoch 1/20
63/63 [==============================] - 9s 97ms/step - loss: 1.0032 - accuracy: 0.5035 - val_loss: 0.8219 - val_accuracy: 0.6160
Epoch 2/20
63/63 [==============================] - 6s 88ms/step - loss: 0.7575 - accuracy: 0.5755 - val_loss: 0.7256 - val_accuracy: 0.6120
Epoch 3/20
63/63 [==============================] - 6s 88ms/step - loss: 0.7181 - accuracy: 0.5805 - val_loss: 0.6917 - val_accuracy: 0.6360
Epoch 4/20
63/63 [==============================] - 6s 88ms/step - loss: 0.6749 - accuracy: 0.6190 - val_loss: 0.6671 - val_accuracy: 0.6300
Epoch 5/20
63/63 [==============================] - 6s 95ms/step - loss: 0.6571 - accuracy: 0.6500 - val_loss: 0.6850 - val_accuracy: 0.5980
Epoch 6/20
63/63 [==============================] - 5s 80ms/step - loss: 0.6319 - accuracy: 0.6720 - val_loss: 0.6243 - val_accuracy: 0.6730
Epoch 7/20
63/63 [==============================] - 6s 90ms/step - loss: 0.5923 - accuracy: 0.6935 - val_loss: 0.6144 - val_accuracy: 0.7120
Epoch 8/20
63/63 [==============================] - 6s 89ms/step - loss: 0.5643 - accuracy: 0.7205 - val_loss: 0.6136 - val_accuracy: 0.6700
Epoch 9/20
63/63 [==============================] - 6s 93ms/step - loss: 0.5552 - accuracy: 0.7380 - val_loss: 0.5669 - val_accuracy: 0.7080
Epoch 10/20
63/63 [==============================] - 4s 58ms/step - loss: 0.5423 - accuracy: 0.7400 - val_loss: 0.5819 - val_accuracy: 0.7120
Epoch 11/20
63/63 [==============================] - 4s 57ms/step - loss: 0.4905 - accuracy: 0.7745 - val_loss: 0.6146 - val_accuracy: 0.7020
Epoch 12/20
63/63 [==============================] - 4s 57ms/step - loss: 0.4808 - accuracy: 0.7900 - val_loss: 0.6318 - val_accuracy: 0.7070
Epoch 13/20
63/63 [==============================] - 4s 60ms/step - loss: 0.4602 - accuracy: 0.7990 - val_loss: 0.5707 - val_accuracy: 0.7160
Epoch 14/20
63/63 [==============================] - 4s 61ms/step - loss: 0.4291 - accuracy: 0.8190 - val_loss: 0.6392 - val_accuracy: 0.6910
Epoch 15/20
63/63 [==============================] - 5s 69ms/step - loss: 0.4003 - accuracy: 0.8355 - val_loss: 0.7048 - val_accuracy: 0.7110
Epoch 16/20
63/63 [==============================] - 4s 58ms/step - loss: 0.3658 - accuracy: 0.8430 - val_loss: 0.8027 - val_accuracy: 0.7180
Epoch 17/20
63/63 [==============================] - 4s 58ms/step - loss: 0.3069 - accuracy: 0.8750 - val_loss: 0.9428 - val_accuracy: 0.6970
Epoch 18/20
63/63 [==============================] - 4s 59ms/step - loss: 0.2601 - accuracy: 0.9005 - val_loss: 0.9420 - val_accuracy: 0.7170
Epoch 19/20
63/63 [==============================] - 4s 60ms/step - loss: 0.2061 - accuracy: 0.9230 - val_loss: 0.9134 - val_accuracy: 0.7290
Epoch 20/20
63/63 [==============================] - 4s 62ms/step - loss: 0.1770 - accuracy: 0.9330 - val_loss: 1.0805 - val_accuracy: 0.6930
