拟合和评估之间的自定义指标差异令人难以置信



我运行一个没有dropout(但BN)的张量流u-net模型,使用称为"平均准确性"的自定义指标。这实际上是代码部分。如您所见,数据集必须与我在fitevaluate之间不执行任何操作相同。

model.fit(x=train_ds, epochs=epochs, validation_data=val_ds, shuffle=True, 
callbacks=callbacks)
model.evaluate(train_ds)
model.evaluate(val_ds)

train_dsval_dstf.Dataset.这里是输出。

...
Epoch 10/10
148/148 [==============================] - 103s 698ms/step - loss: 0.1765 - accuracy: 0.5872 - average_accuracy: 0.9620 - val_loss: 0.5845 - val_accuracy: 0.5788 - val_average_accuracy: 0.5432
148/148 [==============================] - 22s 118ms/step - loss: 0.5056 - accuracy: 0.4540 - average_accuracy: 0.3654
29/29 [==============================] - 5s 122ms/step - loss: 0.5845 - accuracy: 0.5788 - average_accuracy: 0.5432

训练期间的average_accuracy(fit)和evaluateaverage_accuracy(均在训练数据集上)之间存在令人难以置信的差异。我知道BN可以产生这种效果,而且在训练期间表现也会发生变化,所以它们永远不会相等。但是从96%到36%?

我的自定义准确性在这里定义,但我怀疑这是我的个人实现,因为无论我做什么(我认为),它都应该以某种方式接近。

此处的任何提示都很有用。我不知道我是否应该查看自定义指标、数据集和模型。似乎在他们所有人之外。


我试图在停止后继续训练,average_accuracy从它离开的地方开始超过 90%。


自定义指标的上下文。我用它来进行语义分割。因此,每个图像都有一个标签图像作为WxHx4的输出(4是我的类总数)。

它分别计算平均精度,例如,每个类的精度,然后,如果它们是 4 个类,它确实求和(每个类的精度)/4。

这里是主代码:

def custom_average_accuracy(y_true, y_pred):
# Mask to remove the labels (y_true) that are zero: ex. [0, 0, 0]
remove_zeros_mask = tf.math.logical_not(tf.math.reduce_all(tf.math.logical_not(tf.cast(y_true, bool)), axis=-1))
y_true = tf.boolean_mask(y_true, remove_zeros_mask)
y_pred = tf.boolean_mask(y_pred, remove_zeros_mask)
num_cls = y_true.shape[-1]
y_pred = tf.math.argmax(y_pred, axis=-1)        # ex. [0, 0, 1] -> [2]
y_true = tf.math.argmax(y_true, axis=-1)
accuracies = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
for i in range(0, num_cls):
cls_mask = y_true == i
cls_y_true = tf.boolean_mask(y_true, cls_mask)
if not tf.equal(tf.size(cls_y_true), 0):   # Some images don't have all the classes present.
new_acc = _accuracy(y_true=cls_y_true, y_pred=tf.boolean_mask(y_pred, cls_mask))
accuracies = accuracies.write(accuracies.size(), new_acc)
accuracies = accuracies.stack()
return tf.math.reduce_sum(accuracies) / tf.cast(len(accuracies), dtype=accuracies.dtype)

我相信问题可能出在if not tf.equal(tf.size(cls_y_true), 0)线上,但我似乎仍然不能。


更多信息。这正是我的代码行:

x_input, y_true = np.concatenate([x for x, y in ds], axis=0), np.concatenate([y for x, y in ds], axis=0)
model.evaluate(x=x_input, y=y_true)   # This gets 38% accuracy
model.evaluate(ds)                    # This gets 55% accuracy

这到底是怎么回事?这些代码行怎么能给出不同的结果呢?!?!


所以现在我有了,如果我不做ds = ds.shuffle(),例子(30 岁与 50 岁 ACC 值)就可以了。

我试图重现这种行为,但找不到您注意到的差异。我唯一改变的是not tf.equaltf.math.not_equal

import pathlib
import tensorflow as tf
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
num_classes = 5
batch_size = 32
img_height = 180
img_width = 180
val_ds = tf.keras.utils.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
train_ds = tf.keras.utils.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
def to_categorical(images, labels):
return images, tf.one_hot(labels, num_classes)
train_ds = train_ds.map(to_categorical)
val_ds = val_ds.map(to_categorical)
model = tf.keras.Sequential([
tf.keras.layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(num_classes)
])
def _accuracy(y_true, y_pred):
y_true.shape.assert_is_compatible_with(y_pred.shape)
if y_true.dtype != y_pred.dtype:
y_pred = tf.cast(y_pred, y_true.dtype)
reduced_sum = tf.reduce_sum(tf.cast(tf.math.equal(y_true, y_pred), tf.keras.backend.floatx()), axis=-1)
return tf.math.divide_no_nan(reduced_sum, tf.cast(tf.shape(y_pred)[-1], reduced_sum.dtype))
def custom_average_accuracy(y_true, y_pred):
# Mask to remove the labels (y_true) that are zero: ex. [0, 0, 0]
remove_zeros_mask = tf.math.logical_not(tf.math.reduce_all(tf.math.logical_not(tf.cast(y_true, bool)), axis=-1))
y_true = tf.boolean_mask(y_true, remove_zeros_mask)
y_pred = tf.boolean_mask(y_pred, remove_zeros_mask)
num_cls = y_true.shape[-1]
y_pred = tf.math.argmax(y_pred, axis=-1)        # ex. [0, 0, 1] -> [2]
y_true = tf.math.argmax(y_true, axis=-1)
accuracies = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
for i in range(0, num_cls):
cls_mask = y_true == i
cls_y_true = tf.boolean_mask(y_true, cls_mask)
if tf.math.not_equal(tf.size(cls_y_true), 0):   # Some images don't have all the classes present.
new_acc = _accuracy(y_true=cls_y_true, y_pred=tf.boolean_mask(y_pred, cls_mask))
accuracies = accuracies.write(accuracies.size(), new_acc)
accuracies = accuracies.stack()
return tf.math.reduce_sum(accuracies) / tf.cast(len(accuracies), dtype=accuracies.dtype)
model.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
metrics=['accuracy', custom_average_accuracy])
epochs=10
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs)
model.evaluate(train_ds)
model.evaluate(val_ds)
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Epoch 1/10
92/92 [==============================] - 11s 95ms/step - loss: 1.6220 - accuracy: 0.2868 - custom_average_accuracy: 0.2824 - val_loss: 1.2868 - val_accuracy: 0.4946 - val_custom_average_accuracy: 0.4597
Epoch 2/10
92/92 [==============================] - 8s 85ms/step - loss: 1.2131 - accuracy: 0.4785 - custom_average_accuracy: 0.4495 - val_loss: 1.2051 - val_accuracy: 0.4673 - val_custom_average_accuracy: 0.4350
Epoch 3/10
92/92 [==============================] - 8s 84ms/step - loss: 1.0713 - accuracy: 0.5620 - custom_average_accuracy: 0.5404 - val_loss: 1.1070 - val_accuracy: 0.5232 - val_custom_average_accuracy: 0.5003
Epoch 4/10
92/92 [==============================] - 8s 83ms/step - loss: 0.9463 - accuracy: 0.6281 - custom_average_accuracy: 0.6203 - val_loss: 0.9880 - val_accuracy: 0.5967 - val_custom_average_accuracy: 0.5755
Epoch 5/10
92/92 [==============================] - 8s 84ms/step - loss: 0.8400 - accuracy: 0.6771 - custom_average_accuracy: 0.6730 - val_loss: 0.9420 - val_accuracy: 0.6308 - val_custom_average_accuracy: 0.6245
Epoch 6/10
92/92 [==============================] - 8s 83ms/step - loss: 0.7594 - accuracy: 0.7027 - custom_average_accuracy: 0.7004 - val_loss: 0.8972 - val_accuracy: 0.6431 - val_custom_average_accuracy: 0.6328
Epoch 7/10
92/92 [==============================] - 8s 82ms/step - loss: 0.6211 - accuracy: 0.7619 - custom_average_accuracy: 0.7563 - val_loss: 0.8999 - val_accuracy: 0.6431 - val_custom_average_accuracy: 0.6174
Epoch 8/10
92/92 [==============================] - 8s 82ms/step - loss: 0.5108 - accuracy: 0.8116 - custom_average_accuracy: 0.8046 - val_loss: 0.8809 - val_accuracy: 0.6689 - val_custom_average_accuracy: 0.6457
Epoch 9/10
92/92 [==============================] - 8s 83ms/step - loss: 0.3985 - accuracy: 0.8535 - custom_average_accuracy: 0.8534 - val_loss: 0.9364 - val_accuracy: 0.6676 - val_custom_average_accuracy: 0.6539
Epoch 10/10
92/92 [==============================] - 8s 83ms/step - loss: 0.3023 - accuracy: 0.8995 - custom_average_accuracy: 0.9010 - val_loss: 1.0118 - val_accuracy: 0.6662 - val_custom_average_accuracy: 0.6405
92/92 [==============================] - 6s 62ms/step - loss: 0.2038 - accuracy: 0.9363 - custom_average_accuracy: 0.9357
23/23 [==============================] - 2s 50ms/step - loss: 1.0118 - accuracy: 0.6662 - custom_average_accuracy: 0.663

嗯,我使用的是TensorFlow数据集。我改用了NumPy,现在一切似乎都是合乎逻辑的,并且有效。

不过,我需要知道 tf ds 不起作用的原因,但至少我不再有这些奇怪的结果。

尚未测试(我需要将代码恢复到原来的样子,也许有一天会这样做),但这可能是相关的。

最新更新