Why does my neural network predict the wrong class labels for test images from one class, despite having very high validation accuracy?



I trained a classifier on three classes A, B, and C using the Inception v4 model, with roughly 900 images per class in the training dataset and 80 per class in the validation set. I ran the training code for 200 epochs with a batch size of 8. My average validation accuracy is above 99% and the loss is very low:

Epoch 199/200
303/303 [==============================] - 53s 174ms/step - loss: 0.0026 - accuracy: 0.9996 - val_loss: 5.1226e-04 - val_accuracy: 1.0000
Epoch 200/200
303/303 [==============================] - 53s 176ms/step - loss: 0.0019 - accuracy: 1.0000 - val_loss: 0.1079 - val_accuracy: 0.9750

When I run my test code on the images in validation-set directory A, it predicts 80% of the images as class A and 20% as class C, with nothing in class B. The same happens with class C (80% as C, 20% as A). On directory B, every image is predicted as class A or class C. In all three test cases, the test program does not classify a single image as class B, despite the high validation accuracy and despite using exactly the same directories that were used for validation during training (the latter also leads me to believe this is not primarily caused by overfitting).
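To quantify this pattern, a quick per-class tally over the validation directories can be run. Below is a minimal sketch, assuming the trained model, the class_names list, and the same manual preprocessing used in my test code further down (tally_predictions is a hypothetical helper, written here only for illustration):

# Hypothetical helper: tally the predicted class for every image in each
# true-class subdirectory. Assumes `model` and `class_names` as used below.
import os
import numpy as np
from tensorflow.keras.preprocessing import image

def tally_predictions(model, val_dir, class_names, img_size=(299, 299)):
    for true_class in class_names:
        counts = {name: 0 for name in class_names}
        class_dir = os.path.join(val_dir, true_class)
        for fname in os.listdir(class_dir):
            img = image.load_img(os.path.join(class_dir, fname), target_size=img_size)
            arr = np.expand_dims(image.img_to_array(img), axis=0) / 255  # same preprocessing as my test code
            counts[class_names[np.argmax(model.predict(arr))]] += 1
        print(true_class, "->", counts)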

Here is the output of the test program on directory B:

25/25 [==============================] - 8s 186ms/step - loss: 0.0212 - accuracy: 0.9963
['loss', 'accuracy']
[0.02124088630080223, 0.9963099360466003]
Testing images located in val/B/
[[6.2504888e-01 8.8258091e-08 3.7495103e-01]]
A:62.5%
[[8.8602149e-01 1.3459101e-05 1.1396510e-01]]
A:88.6%
[[4.7189465e-01 4.0863368e-05 5.2806443e-01]]
C:52.81%
[[1.0370950e-01 2.7608112e-07 8.9629024e-01]]
C:89.63%
[[7.1212035e-01 3.3269991e-06 2.8787634e-01]]
A:71.21%

And so on.

I even tried dividing img = np.expand_dims(test_image, axis=0) by 255, as described in another question I asked elsewhere. It worked in that case, but not so much here.

Here is my training code:

def create_inception_v4(nb_classes, load_weights, checkpoint_path):
    init = Input((299, 299, 3))
    x = inception_stem(init)

    # 4 x Inception A
    for i in range(4):
        x = inception_A(x)

    # Reduction A
    x = reduction_A(x)

    # 7 x Inception B
    for i in range(7):
        x = inception_B(x)

    # Reduction B
    x = reduction_B(x)

    # 3 x Inception C
    for i in range(3):
        x = inception_C(x)

    # Average Pooling
    x = AveragePooling2D((8, 8))(x)

    # Dropout - Use 0.2, as mentioned in official paper.
    x = Dropout(0.2)(x)
    x = Flatten()(x)

    # Output
    out = Dense(nb_classes, activation='softmax')(x)

    model = Model(init, out, name='Inception-v4')

    if load_weights:
        weights = checkpoint_path
        model.load_weights(weights, by_name=True)
        print("Model weights loaded.")

    return model


def train(args, check, checkpoint_path, network_name="inceptionv4"):
    n_gpus = int(args['gpus'])

    sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))

    # Training generator: augmentation plus samplewise_std_normalization
    datagen = ImageDataGenerator(rescale=1/255,
                                 rotation_range=40,
                                 width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 shear_range=0.1,
                                 zoom_range=0.1,
                                 horizontal_flip=True,
                                 fill_mode='nearest',
                                 samplewise_std_normalization=True)
    val_datagen = ImageDataGenerator(rescale=1/255)

    batch_size = int(args["batch_size"])
    train_generator = datagen.flow_from_directory(train_dir, target_size=(299, 299), class_mode="categorical", batch_size=batch_size)
    val_gen = datagen.flow_from_directory(val_dir, target_size=(299, 299), class_mode="categorical", batch_size=batch_size)

    mc = keras.callbacks.ModelCheckpoint(f"{network_name}_checkpoints/{network_name}.h5", save_weights_only=True, save_best_only=True)
    tensorboard = TensorBoard(log_dir="{}/{}".format(args["log_dir"], time()))
    validation_steps = 10

    model = create_inception_v4(int(args["num_classes"]), check, checkpoint_path)
    model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.SGD(learning_rate=float(args['learning_rate']), decay=1e-6, momentum=0.9, nesterov=True), metrics=["accuracy"])

    # Weight each class inversely to its frequency in the training set
    counter = Counter(train_generator.classes)
    max_val = float(max(counter.values()))
    class_weights = {class_id: max_val/num_images for class_id, num_images in counter.items()}

    hist = model.fit(train_generator, epochs=num_epochs, verbose=True, validation_data=val_gen, validation_steps=validation_steps, callbacks=[mc, tensorboard], class_weight=class_weights)
    model.save(f"checkpoints/{network_name}_{num_epochs}epochs.h5")

Here is my test code:

def test_model(test_dir, num_epochs, class_names, network_name="inceptionv4"):
    model = load_model(f'checkpoints/{network_name}_{num_epochs}epochs.h5')

    datagen = ImageDataGenerator(rescale=1/255,
                                 rotation_range=40,
                                 width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 shear_range=0.1,
                                 zoom_range=0.1,
                                 horizontal_flip=True,
                                 fill_mode='nearest',
                                 samplewise_std_normalization=True)
    val_datagen = ImageDataGenerator(rescale=1/255)
    val_dir = "val/"
    val_gen = datagen.flow_from_directory(val_dir, target_size=(299, 299), class_mode="categorical")

    test_accuracy = model.evaluate(val_gen, steps=25)
    print(model.metrics_names)
    print(test_accuracy)

    img_width, img_height = 299, 299
    print(f"Testing images located in {test_dir}")
    counter = 0
    results_dict = {}
    start_time = time.time()

    for filename_img in os.listdir(test_dir):
        counter += 1
        filename = os.path.join(test_dir, filename_img)
        img = image.load_img(filename, target_size=(img_width, img_height))
        test_image = image.img_to_array(img)
        img = np.expand_dims(test_image, axis=0)/255
        classes = model.predict(img, batch_size=10)
        print(classes)
        predicted_class = class_names[np.argmax(classes)]
        if predicted_class not in results_dict.keys():
            results_dict[predicted_class] = 1
        else:
            results_dict[predicted_class] += 1
        print(f"{predicted_class}:{round(np.amax(classes)*100,2)}%")
        if counter % 100 == 0:
            print(f"{counter} files processed!")

    time_taken = time.time() - start_time
    time_taken = round(time_taken, 2)
    print(f"{counter} images processed in {time_taken} seconds, at a rate of {round(counter/time_taken,2)} images per second.")

    for predicted_class in results_dict.keys():
        print(f"{predicted_class} = {results_dict[predicted_class]} predictions")
What am I doing wrong?

EDIT 1 - I tried to account for the imbalanced classes by adding the class_weight parameter, as shown in the edited code above. It still fails to predict class B. I even tried using val_datagen instead of datagen, which gave even worse results.

EDIT 2 - I have now copied my entire dataset folder elsewhere and deleted class B, keeping classes A and C. I trained the model, again got very high training accuracy, and now my test program only ever predicts class C, never class A. I have a feeling I have made a very silly mistake in my test.py code.

This was a very frustrating bug. I realized that I was getting high validation accuracy from model.evaluate() on the whole directory, but not from model.predict() on individual images. This is because the image augmentation pipeline used for training was also applied to validation, but not to the individual images being fed to the model as input.
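One quick way to see the mismatch is to compare a batch coming out of the generator with a manually loaded image (a minimal sketch, assuming the generators and imports from the code above; the file path is just a placeholder):

# Batch from the generator: augmented, rescaled to [0, 1], and each sample
# divided by its own standard deviation (samplewise_std_normalization)
batch, _ = next(val_gen)
print(batch.mean(), batch.std())

# Manually loaded image: only rescaled to [0, 1], never standardized
img = image.load_img("val/B/example.jpg", target_size=(299, 299))  # placeholder path
arr = np.expand_dims(image.img_to_array(img), axis=0) / 255
print(arr.mean(), arr.std())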

In this particular case, I realized that samplewise_std_normalization was not being applied to the test images. So, inspired by this answer, I used the standardize function: test_image = datagen.standardize(test_image), and now my model works perfectly. The full test.py code is shown below:

def test_model(test_dir, num_epochs, class_names, network_name="inceptionv4"):
    model = load_model(f'checkpoints/{network_name}_{num_epochs}epochs.h5')

    datagen = ImageDataGenerator(rescale=1/255,
                                 rotation_range=40,
                                 width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 shear_range=0.1,
                                 zoom_range=0.1,
                                 horizontal_flip=True,
                                 fill_mode='nearest',
                                 samplewise_std_normalization=True)
    val_datagen = ImageDataGenerator(rescale=1/255)
    val_dir = "val/"
    val_gen = datagen.flow_from_directory(val_dir, target_size=(299, 299), class_mode="categorical")

    test_accuracy = model.evaluate(val_gen, steps=25)
    print(model.metrics_names)
    print(test_accuracy)

    img_width, img_height = 299, 299
    print(f"Testing images located in {test_dir}")
    counter = 0
    results_dict = {}
    start_time = time.time()

    for filename_img in os.listdir(test_dir):
        counter += 1
        filename = os.path.join(test_dir, filename_img)
        img = image.load_img(filename, target_size=(img_width, img_height))
        test_image = image.img_to_array(img)
        test_image = np.expand_dims(test_image, axis=0)
        # Don't divide by 255, this is taken care of by the standardize function
        test_image = datagen.standardize(test_image)
        classes = model.predict(test_image, batch_size=10)
        print(classes)
        predicted_class = class_names[np.argmax(classes)]
        if predicted_class not in results_dict.keys():
            results_dict[predicted_class] = 1
        else:
            results_dict[predicted_class] += 1
        print(f"{predicted_class}:{round(np.amax(classes)*100,2)}%")
        if counter % 100 == 0:
            print(f"{counter} files processed!")

    time_taken = time.time() - start_time
    time_taken = round(time_taken, 2)
    print(f"{counter} images processed in {time_taken} seconds, at a rate of {round(counter/time_taken,2)} images per second.")

    for predicted_class in results_dict.keys():
        print(f"{predicted_class} = {results_dict[predicted_class]} predictions")
