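I'm new to machine learning and I'm working with a dataset of 14k images of sea, forest, glacier, street, building, and mountain scenes (6 classes). I trained my model on it and reached 91% accuracy, but for some reason it is biased: when I try to predict new images with my inference code, the only classes it ever picks are glacier and sea. Below is the GitHub repository containing the model-creation code and the inference code.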
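from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Assumption: preprocess_input is the MobileNet one (the post mentions MobileNet)
from tensorflow.keras.applications.mobilenet import preprocess_input

train_datagen = ImageDataGenerator(
    rotation_range=20,  # Rotate randomly by up to 20 degrees
    zoom_range=0.3,  # Zoom in or out by up to 30%
    horizontal_flip=True,  # Allow horizontal flips of augmented images
    vertical_flip=True,  # Allow vertical flips of augmented images
    brightness_range=[0.6, 1.2],  # Lighter and darker images
    fill_mode='nearest',
    preprocessing_function=preprocess_input)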
img_data_iterator = train_datagen.flow_from_directory(
    # Where to take the data from; the classes are the sub-folder names
    '../Q2B/archive/seg_train/seg_train/',
    class_mode="categorical",  # labels are returned as 2D one-hot encoded arrays
    shuffle=True,  # shuffle the data; True is the default, stated here for clarity
    batch_size=32,
    target_size=(150, 150),  # Resize to 150x150 (MobileNet's own default input is 224x224)
)
validation_generator = ImageDataGenerator(
    preprocessing_function=preprocess_input).flow_from_directory(
    '../Q2B/archive/seg_test/seg_test/',
    class_mode="categorical",
    shuffle=True,
    batch_size=32,
    target_size=(150, 150))
My guess is that this has to do with how I preprocess the data.
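One way to test that guess is to check that the inference code applies exactly the same preprocessing as the training generators. The following is only a sketch under stated assumptions: it assumes preprocess_input is the MobileNet variant, which rescales pixels from [0, 255] to [-1, 1], and that the image has already been loaded and resized to 150x150. The names mobilenet_scale and prepare_for_inference are hypothetical helpers for illustration and do not come from the linked repository.

```python
import numpy as np


def mobilenet_scale(pixels):
    """Stand-in for MobileNet-style preprocessing: map [0, 255] to [-1, 1]."""
    return pixels.astype("float32") / 127.5 - 1.0


def prepare_for_inference(image, preprocess=mobilenet_scale):
    """Turn one HxWx3 image into a batch of one, preprocessed like training.

    `image` is assumed to be already resized to the training target_size
    (150x150 here). With Keras, pass the real `preprocess_input` as
    `preprocess`; `mobilenet_scale` is only an illustrative substitute.
    """
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    batch = np.expand_dims(image, axis=0)  # model.predict expects a batch axis
    return preprocess(batch)


# Hypothetical input: a random 150x150 RGB image with values in [0, 255].
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)
batch = prepare_for_inference(fake_image)
print(batch.shape, float(batch.min()), float(batch.max()))
```

If the inference script instead feeds raw [0, 255] pixels, or divides by 255, the model sees inputs on a different scale from the one it was trained on, which could plausibly collapse its predictions onto a few classes.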
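Can you post more of the code? Change the class_mode of the training and test generators to 'categorical' and change the last Dense layer from 1 unit to 2, so that it returns a score/probability for each of the two classes. Then, when you use argmax, it returns the index of the highest score, which indicates the class that was predicted.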
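The argmax step can be sketched as follows. For the six-class dataset in the question the same idea applies with one output unit per class. The class_indices dictionary below is a hypothetical example of what a generator's class_indices attribute looks like (flow_from_directory assigns indices to the sub-folder names in alphanumeric order), the probabilities are made up for illustration, and labels_from_probabilities is an illustrative helper name.

```python
import numpy as np

# Hypothetical mapping, in the form exposed by a generator's `class_indices`.
class_indices = {
    "buildings": 0, "forest": 1, "glacier": 2,
    "mountain": 3, "sea": 4, "street": 5,
}


def labels_from_probabilities(probabilities, class_indices):
    """Map each row of class scores to the name of its highest-scoring class."""
    index_to_class = {index: name for name, index in class_indices.items()}
    predicted = np.argmax(probabilities, axis=1)  # index of the top score per row
    return [index_to_class[int(i)] for i in predicted]


# Made-up softmax outputs for two images (each row sums to 1).
probabilities = np.array([
    [0.05, 0.10, 0.60, 0.15, 0.05, 0.05],
    [0.02, 0.03, 0.05, 0.10, 0.70, 0.10],
])
labels = labels_from_probabilities(probabilities, class_indices)
print(labels)  # ['glacier', 'sea']
```

Taking the mapping from the training generator, rather than hard-coding a list of names in the inference script, avoids a mismatch between the index order used in training and the order assumed at prediction time.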