Test accuracy will not go above 50% with this simple TensorFlow Keras CNN image classification model

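The code is below. I have a very imbalanced dataset of chest X-rays for cardiomegaly (an enlarged heart). The images are in a training folder split into cardiomegaly-positive and cardiomegaly-negative subfolders (467 positive images and ~20,000 negative). I also have a test folder with two subfolders (300 positive, 300 negative). Every time I test with the evaluate call below, I get 50% accuracy. When I look at the predictions, they are always all one class (usually negative), but if I give the positive class a very high weight (1000+ versus 1 for the negative class), the model flips and says they are all positive. This leads me to believe it is overfitting, but none of my attempts to fix it have worked.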

import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
import skimage as sk
import skimage.io as skio
import skimage.transform as sktr
import skimage.filters as skfl
import skimage.feature as skft
import skimage.color as skcol
import skimage.exposure as skexp
import skimage.morphology as skmr
import skimage.util as skut
import skimage.measure as skme
import sklearn.model_selection as le_ms
import sklearn.decomposition as le_de
import sklearn.discriminant_analysis as le_di
import sklearn.preprocessing as le_pr
import sklearn.linear_model as le_lm
import sklearn.metrics as le_me
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
classNames = ["trainpos","trainneg"]
testclassNames = ["testpos", "test"]
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    './data/trainup/',
    labels='inferred',
    label_mode='categorical',
    class_names=classNames,
    color_mode='grayscale',
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="training",
    interpolation='gaussian',
    follow_links=False,
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    './data/trainup/',
    labels='inferred',
    label_mode='categorical',
    class_names=classNames,
    color_mode='grayscale',
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    seed=23,
    validation_split=0.2,
    subset="validation",
    interpolation='gaussian',
    follow_links=False,
)
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    './data/testup/',
    labels='inferred',
    label_mode='categorical',
    class_names=testclassNames,
    color_mode='grayscale',
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    interpolation='gaussian',
    follow_links=False,
)
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
model = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255, input_shape=(256, 256, 1)),
    tf.keras.layers.Conv2D(16, 4, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 4, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(2)
])
opt = keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=opt,
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
class_weight = {0: 29, 1: 1}
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=5,
    class_weight=class_weight
)
test_loss, test_accuracy = model.evaluate(test_ds)
print("Test Loss: ", test_loss)
print("Test Accuracy: ", test_accuracy)

19/19 [==============================] - 7s 376ms/step - loss: 3.4121 - accuracy: 0.5000
Test Loss:  3.4121198654174805
Test Accuracy:  0.5

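I have tried setting the learning rate to values between 0.1 and 0.00001, adding epochs, removing epochs, and switching the optimizer to SGD. I also tried unpacking test_ds by subscripting it, which raised an error saying that it is a batch dataset and cannot be subscripted. That showed me that test_ds yields about 19 tensors of 32 images each, except the last one, which has about 25. I then wanted to predict these images individually and look at the results, because it looked as if all 32 (or 25 for the last batch) were being combined and predicted as a group, but that led me down a rabbit hole and I never got a result. I tried a number of other things that I can't fully remember, usually adjusting the model itself or adding data augmentation. (I am using TensorFlow 2.3 because it is required for a class assignment, so data augmentation can't be done the way the current documentation describes; as far as I can tell, it is mainly the vertical and horizontal flip variations that differ in this version.)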
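On a balanced test set of 300 positives and 300 negatives, accuracy is exactly 50% whenever every prediction lands in a single class, so the accuracy figure alone hides what is going on. The sketch below, in plain Python, counts predictions per class and builds a 2x2 confusion matrix. The label lists are made-up stand-ins for the values that looping over `test_ds` batch by batch would produce, and the helper names are illustrative only.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Return matrix[t][p] = number of samples with true class t predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical stand-in data: 300 positives (class 0) and 300 negatives (class 1),
# with a model that predicts class 1 for every single image.
y_true = [0] * 300 + [1] * 300
y_pred = [1] * 600

print(Counter(y_pred))                   # how many predictions fell in each class
print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class
print(accuracy(y_true, y_pred))
```

With the real model, the two lists could be filled by iterating `for images, labels in test_ds:` and taking the argmax of `model.predict(images)` and of the one-hot `labels` along the class axis; subscripting is not needed because the batched dataset is iterable.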

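The best approach is to remove the imbalance from the start. You have 467 positive images, which is enough for a model to train on. So randomly select 467 negative images out of the 20,000. This is called undersampling and it works well. An alternative is to combine undersampling with image augmentation. Example code for that is shown below: I limit the number of images in the negative class to 1000, then create 533 augmented images (1000 - 467) and add them to the positive class directory. Note: the code below deletes images from the negative class directory and adds augmented images to the positive class directory, so you may want to back up both directories before running it so that you can restore the original data. In my demo run I had 1263 images in the negative class directory and 467 images in the positive class directory. I tested the code and it works as expected. If you are running a notebook on Kaggle, the code below will not work as is, because you cannot modify data in the input directory. In that case you must first copy the input directory to the Kaggle working directory and then set the paths to those copies.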
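Since the code that follows modifies the class directories in place, one way to keep the originals safe is to build the undersampled set in a separate working directory instead. The sketch below uses only the standard library; `copy_undersampled` is a hypothetical helper, and the demo runs on throwaway directories with empty placeholder files rather than real images.

```python
import os
import random
import shutil
import tempfile

def copy_undersampled(src_dir, dst_dir, limit, seed=123):
    """Copy at most `limit` randomly chosen files from src_dir into dst_dir.

    The source directory is never modified.
    """
    os.makedirs(dst_dir, exist_ok=True)
    files = sorted(
        f for f in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, f))
    )
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    chosen = rng.sample(files, min(limit, len(files)))
    for name in chosen:
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return chosen

# Demo on temporary directories (paths and file names are made up)
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "input", "negative")
    dst = os.path.join(root, "working", "negative")
    os.makedirs(src)
    for i in range(50):
        open(os.path.join(src, f"img{i}.png"), "w").close()
    copy_undersampled(src, dst, limit=10)
    print(len(os.listdir(src)), len(os.listdir(dst)))
```

The same pattern applies on Kaggle, where the input directory is read-only: the copy lands in the working directory, and the training paths then point at the copy.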

!pip install -U albumentations
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import numpy as np
import random
import cv2
import albumentations as A
from tqdm import tqdm

def get_augmented_image(image):  # returns an augmented version of the input image
    # see the albumentations documentation at https://albumentations.ai/docs/getting_started/image_augmentation/
    # for the types of augmentation available; the ones below are examples
    width = int(image.shape[1] * .8)
    height = int(image.shape[0] * .8)
    transform = A.Compose([
        A.HorizontalFlip(p=.5),
        A.RandomBrightnessContrast(p=.5),
        A.RandomGamma(p=.5),
        A.RandomCrop(width=width, height=height, p=.25)])
    return transform(image=image)['image']

negative_limit = 1000
negative_dir_path = r'C:\Temp\data\trainup\negative'  # path to directory holding the negative images
positive_dir_path = r'C:\Temp\data\trainup\positive'  # path to directory holding the positive images
negative_file_list = os.listdir(negative_dir_path)
positive_file_list = os.listdir(positive_dir_path)
# randomly pick the negative files to keep; a set makes the membership test below fast
sampled_negative_files = set(np.random.choice(negative_file_list, size=negative_limit, replace=False))
# this loop leaves only negative_limit (1000) images in the negative directory
for f in tqdm(negative_file_list, ncols=120, unit='files', colour='blue', desc='deleting excess neg files'):
    if f not in sampled_negative_files:
        fpath = os.path.join(negative_dir_path, f)
        os.remove(fpath)

# now create augmented images
delta = negative_limit - len(os.listdir(positive_dir_path))  # number of augmented images needed to balance the dataset
sampled_positive_image_list = np.random.choice(positive_file_list, delta, replace=True)  # replace=True because delta > number of positive images
# this loop creates augmented images and stores them in the positive directory
for i, f in enumerate(tqdm(sampled_positive_image_list, ncols=120, unit='files', colour='blue', desc='creating augment images')):
    fpath = os.path.join(positive_dir_path, f)
    img = cv2.imread(fpath)
    dest_file_name = 'aug' + str(i) + '-' + f  # unique numeric prefix keeps augmented filenames distinct
    dest_path = os.path.join(positive_dir_path, dest_file_name)
    augmented_image = get_augmented_image(img)
    cv2.imwrite(dest_path, augmented_image)
# when these loops are done, the negative directory will have 1000 images
# and the positive directory will also have 1000 images, 533 of which are augmented

In your code you have

tf.keras.layers.Dense(2)

Change it to

tf.keras.layers.Dense(2, activation='softmax')

and in model.compile remove from_logits=True.
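For context, here is a small numeric sketch of what the two settings compute. A final layer with no activation emits raw scores (logits), and `from_logits=True` tells the loss to apply softmax internally; with `activation='softmax'` the layer emits probabilities and the loss expects them directly. The functions below are illustrative plain-Python helpers, not Keras code, and the logit values are made up.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(one_hot, probs):
    """Categorical cross-entropy when the model already outputs probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

def cross_entropy_from_logits(one_hot, logits):
    """Categorical cross-entropy computed directly from raw scores (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_sum) for t, x in zip(one_hot, logits))

logits = [2.0, -1.0]  # made-up output of a Dense(2) layer with no activation
label = [1.0, 0.0]    # one-hot label, as produced by label_mode='categorical'
probs = softmax(logits)

print(probs, sum(probs))
print(cross_entropy_from_probs(label, probs))
print(cross_entropy_from_logits(label, logits))
```

In this sketch the two consistent pairings (logits with a from-logits loss, probabilities with a from-probabilities loss) give the same value up to floating-point rounding. What must be avoided is a mismatch, such as feeding softmax outputs to a loss that still has `from_logits=True`.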
