Why is my val_accuracy stuck at 0.0000e+00 while my val_loss increases from the very first epoch?



I am training a model to classify cells, based on this paper: https://www.nature.com/articles/s41598-019-50010-9. Since my dataset contains only 10 images, I used image augmentation to artificially grow it to 3,000 images, which I then split into 2,400 training images and 600 validation images.
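The augmentation step itself is not shown in the question; below is a minimal sketch of how a small labelled image set could be inflated this way with Keras' ImageDataGenerator. The array names (raw_imgs, raw_labels) and the transform parameters are assumptions for illustration, not the author's actual pipeline.

# Minimal augmentation sketch (assumed, not the author's actual pipeline):
# draw random transformed copies of a small labelled image set until target_size is reached.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def augment_dataset(raw_imgs, raw_labels, target_size = 3000):
    # raw_imgs: (N, 512, 512, 1) float array, raw_labels: (N,) integer class ids
    datagen = ImageDataGenerator(rotation_range = 20, width_shift_range = 0.1,
                                 height_shift_range = 0.1, horizontal_flip = True)
    aug_imgs, aug_labels = [], []
    n_generated = 0
    # flow() yields batches endlessly; stop once enough samples have been collected
    for x_batch, y_batch in datagen.flow(raw_imgs, raw_labels, batch_size = len(raw_imgs)):
        aug_imgs.append(x_batch)
        aug_labels.append(y_batch)
        n_generated += len(x_batch)
        if n_generated >= target_size:
            break
    return np.concatenate(aug_imgs)[:target_size], np.concatenate(aug_labels)[:target_size]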

However, while the training loss and accuracy improve over successive epochs, the validation loss rises rapidly and the validation accuracy remains stuck at 0.0000e+00.

Is my model severely overfitting right from the start?

The code I am using is shown below:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, load_model, Sequential, model_from_json
from tensorflow.keras.layers import (Input, BatchNormalization, Activation, Flatten, Dense, LeakyReLU,
                                     Lambda, Dropout, Conv2D, Conv2DTranspose, UpSampling2D,
                                     MaxPooling2D, AveragePooling2D, Concatenate, Add)
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
img_channel = 1
input_size = (512, 512, 1)
inputs = Input(shape = input_size)
initial_input = Lambda(lambda x: x)(inputs) #Identity layer; the images themselves are scaled to [0, 1] before fitting
kernel_size = (3,3)
pad = 'same'
filters = 2
c1 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(initial_input)
b1 = BatchNormalization()(c1) 
a1 = Activation('elu')(b1)
p1 = AveragePooling2D()(a1)
c2 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p1)
b2 = BatchNormalization()(c2) 
a2 = Activation('elu')(b2)
p2 = AveragePooling2D()(a2)
c3 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p2)
b3 = BatchNormalization()(c3) 
a3 = Activation('elu')(b3)
p3 = AveragePooling2D()(a3)
c4 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p3)
b4 = BatchNormalization()(c4) 
a4 = Activation('elu')(b4)
p4 = AveragePooling2D()(a4)
c5 = Conv2D(filters, kernel_size, padding = pad, kernel_initializer = 'he_normal')(p4)
b5 = BatchNormalization()(c5) 
a5 = Activation('elu')(b5)
p5 = AveragePooling2D()(a5)
f = Flatten()(p5)
d1 = Dense(128, activation = 'elu')(f)
d2 = Dense(no_of_img, activation = 'softmax')(d1) #no_of_img = number of classes, defined elsewhere in the script
model = Model(inputs = [inputs], outputs = [d2])
print(model.summary())
learning_rate = 0.001
decay_rate = 0.0001
model.compile(optimizer = SGD(lr = learning_rate, decay = decay_rate, momentum = 0.9, nesterov = False),
              loss = 'categorical_crossentropy', metrics = ['accuracy'])
perf_lr_scheduler = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.9, patience = 3,
                                      verbose = 1, min_delta = 0.01, min_lr = 0.000001)
model_earlystop = EarlyStopping(monitor = 'val_loss', min_delta = 0.001, patience = 10, restore_best_weights = True) #Defined but not passed to fit() below
#Convert labels to one-hot (binary class) matrices
img_aug_label = to_categorical(img_aug_label, num_classes = no_of_img)
#Scale images to floats between 0 and 1
img_aug = np.float32(img_aug)/255
plt.imshow(img_aug[0,:,:,0])
plt.show()
#Train on augmented images
model.fit(
    img_aug,
    img_aug_label,
    batch_size = 4,
    epochs = 100,
    validation_split = 0.2,
    shuffle = True,
    callbacks = [perf_lr_scheduler],
    verbose = 2)

The output of the model looks like this:

Train on 2400 samples, validate on 600 samples
Epoch 1/100
2400/2400 - 12s - loss: 0.6474 - accuracy: 0.8071 - val_loss: 9.8161 - val_accuracy: 0.0000e+00
Epoch 2/100
2400/2400 - 10s - loss: 0.0306 - accuracy: 0.9921 - val_loss: 10.1733 - val_accuracy: 0.0000e+00
Epoch 3/100
2400/2400 - 10s - loss: 0.0058 - accuracy: 0.9996 - val_loss: 10.9820 - val_accuracy: 0.0000e+00
Epoch 4/100
Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0009000000427477062.
2400/2400 - 10s - loss: 0.0019 - accuracy: 1.0000 - val_loss: 11.3029 - val_accuracy: 0.0000e+00
Epoch 5/100
2400/2400 - 10s - loss: 0.0042 - accuracy: 0.9992 - val_loss: 11.9037 - val_accuracy: 0.0000e+00
Epoch 6/100
2400/2400 - 10s - loss: 0.0024 - accuracy: 0.9996 - val_loss: 11.5218 - val_accuracy: 0.0000e+00
Epoch 7/100
Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0008100000384729356.
2400/2400 - 10s - loss: 9.9053e-04 - accuracy: 1.0000 - val_loss: 11.7658 - val_accuracy: 0.0000e+00
Epoch 8/100
2400/2400 - 10s - loss: 0.0011 - accuracy: 1.0000 - val_loss: 12.0437 - val_accuracy: 0.0000e+00
Epoch 9/100

I realized the error happened because I did not manually shuffle the data before passing it to the model for training. I had assumed that the validation_split and shuffle arguments would take effect together during training, but in fact the split happens before any shuffling: fit() first carves the validation set off the end of the arrays exactly as they were passed in, and only then shuffles the remaining training samples each epoch (never across the two sets).
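For intuition, the split with validation_split = 0.2 is roughly equivalent to slicing the arrays as they were passed in:

#Rough illustration of how fit() applies validation_split = 0.2 (no shuffle before splitting)
split_at = int(len(img_aug) * (1 - 0.2))
x_train, y_train = img_aug[:split_at], img_aug_label[:split_at] #first 80%, shuffled each epoch during training
x_val, y_val = img_aug[split_at:], img_aug_label[split_at:]     #last 20%, taken as-is from the tail of the arrays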

With my augmented dataset, the split happened to fall so that the validation set contained image classes that never appeared in the training set. The model was therefore validated on classes it had never seen during training, which explains the terrible validation loss and accuracy. Manually shuffling the data before fitting the model solved the problem, as sketched below.
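A minimal sketch of the fix, assuming img_aug and img_aug_label are the augmented arrays from the code above: permute both arrays with the same random index order before calling fit(), so that every class ends up on both sides of the split.

#Shuffle images and labels together before fit() so the tail 20% used for
#validation contains examples of every class.
perm = np.random.permutation(len(img_aug))
img_aug = img_aug[perm]
img_aug_label = img_aug_label[perm]

model.fit(
    img_aug,
    img_aug_label,
    batch_size = 4,
    epochs = 100,
    validation_split = 0.2,
    shuffle = True,
    callbacks = [perf_lr_scheduler],
    verbose = 2)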
