Imbalanced multi-class dataset


from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import RMSprop
import tensorflow as tf

train_path = 'Skin/Train'
test_path = 'Skin/Test'

# Rescale pixel values to [0, 1]; the generator resizes every image to 300x300
train_gen = ImageDataGenerator(rescale=1./255)
train_generator = train_gen.flow_from_directory(
    train_path, target_size=(300, 300), batch_size=30, class_mode='categorical')

model = tf.keras.models.Sequential([
    # The input shape must match the generator's target_size: 300x300 with 3 color channels
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # One output per class (9 classes)
    tf.keras.layers.Dense(9, activation='softmax')
])

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])

# fit_generator is deprecated; fit accepts generators directly
history = model.fit(
    train_generator,
    steps_per_epoch=8,
    epochs=15,
    verbose=2,
    # class_weight=?  <- what should be passed here?
)

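I'm having trouble reaching good accuracy. I'm training on a 9-class dataset in which classes 1, 4 and 5 have only 100, 96 and 90 images, while the remaining classes have more than 500 images each. As a result I can't get higher accuracy, because the weights are biased toward the classes with more images. I'd like all classes to be treated equally during training, i.e. as if each had 500 images. I'd appreciate a way to upsample the small classes in code, through TensorFlow or a Keras function, rather than manually upsampling or downsampling the images in the folders.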

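You can use the class_weight argument of the fit method. Upsampling, on the other hand, takes a fair amount of manual work, and there is no way around that.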
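For the upsampling route, the manual work can be sketched in plain NumPy. This is only a minimal sketch, not a built-in Keras feature: oversample_indices is a hypothetical helper and the label array is made up for illustration. It keeps every original sample and redraws samples from the smaller classes, with replacement, until each class matches the largest one.

```python
import numpy as np

def oversample_indices(labels, seed=0):
    """Return shuffled indices into `labels` in which every class
    appears as often as the largest class (hypothetical helper)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == cls)
        # Keep all original samples, then draw the shortfall with replacement.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        picked.append(np.concatenate([cls_idx, extra]))
    balanced = np.concatenate(picked)
    rng.shuffle(balanced)
    return balanced

# Made-up labels: class 0 has 500 samples, class 1 has 100, class 2 has 90.
labels = np.repeat([0, 1, 2], [500, 100, 90])
balanced = oversample_indices(labels)
balanced_counts = np.bincount(labels[balanced])
```

With flow_from_directory, the integer labels should be available as train_generator.classes, and the resampled indices would then select which files are fed to training. Because the minority images are repeated verbatim, combining this with image augmentation is probably worthwhile.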

For class_weight, assuming your model output has shape (anything, 9) and you know the total number of images in each class:

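import numpy as np

# Number of images in each class, in class-index order
totals = np.array([500,100,500,500,96,90,.......])
totalMean = totals.mean()
# Classes with fewer images than the average get a weight above 1
weights = {i: totalMean / count for i, count in enumerate(totals)}
model.fit(....., class_weight = weights)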
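To see what these weights do, here is a small worked example. It assumes the six larger classes have exactly 500 images each (the question only says "more than 500"), so those counts are placeholders; only 100, 96 and 90 come from the question.

```python
import numpy as np

# Per-class image counts in class-index order.
# ASSUMPTION: the larger classes are set to exactly 500 for illustration.
totals = np.array([500, 100, 500, 500, 96, 90, 500, 500, 500])
total_mean = totals.mean()

# Weight each class by mean_count / class_count, as in the answer above.
class_weight = {i: float(total_mean / count) for i, count in enumerate(totals)}

# Weight times count is the same for every class, so each class
# contributes equally to the summed loss over one pass of the data.
effective = {i: class_weight[i] * int(totals[i]) for i in class_weight}

for i in class_weight:
    print(f"class {i}: count={totals[i]}, weight={class_weight[i]:.3f}")
```

The resulting dictionary is what would be passed as class_weight to model.fit. The 500-image classes end up with a weight below 1 and the three small classes with a weight above 1.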
