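We are currently working on an image classification task to detect tuberculosis in chest X-ray images. Our code is below. We used 7000 images (3500 per class) and split them with the following fractions: 0.64 for training, 0.16 for validation and 0.2 for testing. Our training and validation losses also look great [1]. However, when we use the model on the test set, the confusion matrix does not make sense [2]. Is there something wrong with our code? Thanks in advance.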
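For reference, the split fractions above translate into the following image counts. This is a minimal sketch that assumes the split was done per class (stratified), which the question does not state; the names `IMAGES_PER_CLASS`, `SPLIT_PERCENT` and `split_counts` are hypothetical.

```python
# Per-class image count and split fractions taken from the question,
# written as integer percentages to avoid floating-point rounding.
IMAGES_PER_CLASS = 3500
SPLIT_PERCENT = {"train": 64, "validation": 16, "test": 20}


def split_counts(n_per_class, percents, n_classes=2):
    """Return {split: (images per class, total images)}, assuming a stratified split."""
    assert sum(percents.values()) == 100, "fractions must add up to 1"
    counts = {}
    for split, pct in percents.items():
        per_class = n_per_class * pct // 100
        counts[split] = (per_class, per_class * n_classes)
    return counts


counts = split_counts(IMAGES_PER_CLASS, SPLIT_PERCENT)
for split, (per_class, total) in counts.items():
    print(f"{split:>10}: {per_class} per class, {total} total")
```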
#Imports
from tensorflow import keras
from keras.applications.mobilenet_v2 import MobileNetV2
from keras.applications.mobilenet_v2 import preprocess_input
from keras.layers import Dense
from keras.models import Model, Sequential
from keras.losses import BinaryCrossentropy
from keras.optimizer_v2.adam import Adam
from keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np
#importing the model
image_size = 224
base_model = MobileNetV2(input_shape=(image_size,image_size,3),
                         weights='imagenet',
                         include_top=True)
#freezing the base model
for layer in base_model.layers:
    layer.trainable = False
#adding a softmax layer with 2 outputs
y_layer = base_model.get_layer('global_average_pooling2d').output
z_layer = Dense(2, activation='softmax')(y_layer)
model = Model(inputs=base_model.input, outputs=z_layer)
#compiling the model
loss_func = BinaryCrossentropy()
opt = Adam(learning_rate=0.001)
model.compile(loss=loss_func,
              optimizer=opt,
              metrics=['accuracy'])
#Image augmentation
test_path = '...'
val_path = '...'
datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                             zoom_range=0.2,
                             brightness_range=[0.5,1.5],
                             rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             horizontal_flip=True)
batch_size=32
validation_size=32
train_set = datagen.flow_from_directory(test_path,
                                        target_size = (image_size, image_size),
                                        batch_size=batch_size,
                                        class_mode = 'categorical')
validation_set = datagen.flow_from_directory(val_path,
                                             target_size = (image_size, image_size),
                                             batch_size=validation_size,
                                             class_mode = 'categorical')
#Fitting the data into the model
model_history = model.fit(train_set,
                          validation_data=validation_set,
                          epochs=40,
                          steps_per_epoch=len(train_set)//batch_size,
                          validation_steps=len(validation_set)//validation_size,
                          verbose=1)
#testing the model on unseen data
test_path = '...'
test_datagen = ImageDataGenerator()
test_set = test_datagen.flow_from_directory(test_path,
                                            target_size = (image_size, image_size),
                                            class_mode = 'categorical')
predictions = model_testing.predict(test_set, verbose=1)
y_pred = np.argmax(predictions, axis=1)
class_labels = list(test_set.class_indices.keys())
print('Classification Report')
clsf = classification_report(test_set.classes, y_pred, target_names=class_labels)
print(clsf)
print('\n')
print('Confusion Matrix')
cfm = confusion_matrix(test_set.classes, y_pred)
print(cfm)
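First, a few observations. You have validation_set = datagen.flow_from_directory(...). Validation data is normally not augmented, so you should not use datagen here, because it creates augmented images. The code should be: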
vgen = ImageDataGenerator(preprocessing_function=preprocess_input)
validation_set = vgen.flow_from_directory(val_path,
                                          target_size = (image_size, image_size),
                                          batch_size=validation_size,
                                          class_mode = 'categorical')
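I assume the function preprocess_input scales the pixels to between -1 and +1, since that is what MobileNet was trained on. Next, in the test generator, set shuffle=False in flow_from_directory. flow_from_directory shuffles by default, so without this the predictions come out in a different order from the labels in test_set.classes, and the confusion matrix compares mismatched pairs. Now, to get the data for the confusion matrix, you have this code: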
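The scaling assumed above can be written out explicitly. To my knowledge, keras.applications.mobilenet_v2.preprocess_input uses the "tf" mode, i.e. x / 127.5 - 1; the helper below is a hypothetical NumPy stand-in that mirrors that formula, not the Keras function itself. Whatever the exact scaling, the same preprocessing has to be applied to the test images as well: in the question's code, test_datagen = ImageDataGenerator() is created without a preprocessing_function, so the test images would reach the model as raw 0 to 255 values.

```python
import numpy as np


def scale_to_minus_one_one(images):
    """Map pixel values in [0, 255] to [-1, 1] via x / 127.5 - 1.

    Hypothetical helper mirroring the 'tf'-mode scaling that
    keras.applications.mobilenet_v2.preprocess_input is assumed to apply.
    """
    return images.astype(np.float32) / 127.5 - 1.0


raw = np.array([0, 51, 255], dtype=np.uint8)  # example pixel values
scaled = scale_to_minus_one_one(raw)
print("raw range:   ", raw.min(), "to", raw.max())
print("scaled range:", scaled.min(), "to", scaled.max())
```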
predictions = model_testing.predict(test_set, verbose=1)
What is model_testing? You compiled your model as model, so it should be:
predictions = model.predict(test_set, verbose=1)
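To get the data for the confusion matrix, you need an array of predictions and an array of true labels, with one entry per test sample. The code is shown below: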
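import numpy as np
from sklearn.metrics import confusion_matrix

# With shuffle=False, test_set.labels is in the same order as the predictions
labels = test_set.labels
y_pred = []
predictions = model.predict(test_set, verbose=1)
for p in predictions:
    index = np.argmax(p)  # predicted class = position of the largest softmax output
    y_pred.append(index)
y_true = np.array(labels)
y_pred = np.array(y_pred)
cm = confusion_matrix(y_true, y_pred)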
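To see why shuffle=False matters for this code, here is a small NumPy simulation with made-up labels. test_set.labels lists the samples in directory order (class by class), while a shuffling generator yields the images in a permuted order. Even a hypothetical model that is always right then produces a confusion matrix that looks like random guessing. confusion_matrix_2d and perfect_softmax are illustrative helpers, not Keras or scikit-learn functions.

```python
import numpy as np


def confusion_matrix_2d(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def perfect_softmax(true_labels):
    """Hypothetical 2-class model output that always puts 0.9 on the true class."""
    probs = np.full((len(true_labels), 2), 0.1)
    probs[np.arange(len(true_labels)), true_labels] = 0.9
    return probs


rng = np.random.default_rng(0)
n_per_class = 100
# Labels in directory order, like test_set.labels: all class 0, then all class 1.
labels = np.repeat([0, 1], n_per_class)

# shuffle=False: images are predicted in the same order as labels.
pred_in_order = np.argmax(perfect_softmax(labels), axis=1)

# shuffle=True: images are predicted in a permuted order, but labels is unchanged.
order = rng.permutation(len(labels))
pred_shuffled = np.argmax(perfect_softmax(labels[order]), axis=1)

cm_in_order = confusion_matrix_2d(labels, pred_in_order, 2)
cm_shuffled = confusion_matrix_2d(labels, pred_shuffled, 2)
print("shuffle=False:\n", cm_in_order)
print("shuffle=True:\n", cm_shuffled)
```

With shuffle=False the matrix is purely diagonal; with shuffling, roughly half of each row lands in the wrong column even though every single prediction was correct.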
OK, now we have a confusion matrix. Next, you want to plot it so you can interpret the results:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

length = 2  # two classes
plt.figure(figsize=(8, 8))
sns.heatmap(cm, annot=True, vmin=0, fmt='g', cmap='Blues', cbar=False)
classes = list(test_set.class_indices.keys())
plt.xticks(np.arange(length)+.5, classes, rotation=90)
plt.yticks(np.arange(length)+.5, classes, rotation=0)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
If you also want the classification report, include:
from sklearn.metrics import classification_report

clr = classification_report(y_true, y_pred, target_names=classes)
print("Classification Report:\n----------------------\n", clr)
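The per-class precision and recall in the classification report can be read directly off the confusion matrix, which helps when interpreting the heatmap. With rows as true classes and columns as predicted classes (scikit-learn's layout), recall divides each diagonal entry by its row total and precision divides it by its column total. A minimal NumPy sketch; the function name and the counts in example_cm are made up for illustration and are not results from the question.

```python
import numpy as np


def per_class_precision_recall(cm):
    """Precision and recall per class from a confusion matrix with
    rows = true class and columns = predicted class.

    No guard for a class that is never predicted (column total of 0).
    """
    cm = np.asarray(cm, dtype=float)
    true_positives = np.diag(cm)
    recall = true_positives / cm.sum(axis=1)     # divide by row totals (actual class)
    precision = true_positives / cm.sum(axis=0)  # divide by column totals (predicted class)
    return precision, recall


# Made-up counts for illustration only.
example_cm = [[50, 10],
              [5, 35]]
precision, recall = per_class_precision_recall(example_cm)
print("precision:", precision.round(3))
print("recall:   ", recall.round(3))
```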