TensorFlow: How do I set the tensor shape for data augmentation after reading images from a TFRecord file?

I have a tf.data.Dataset that I read from a TFRecord file as follows:

import tensorflow as tf

# given an existing record_file
raw_dataset = tf.data.TFRecordDataset(record_file)
example_description = {
    "height": tf.io.FixedLenFeature([], tf.int64),
    "width": tf.io.FixedLenFeature([], tf.int64),
    "channels": tf.io.FixedLenFeature([], tf.int64),
    "image": tf.io.FixedLenFeature([], tf.string),
}
dataset = raw_dataset.map(
    lambda example: tf.io.parse_single_example(example, example_description)
)

Next, I combine these features into an image, like this:

dataset = dataset.map(_extract_image_from_sample)
# and
def _extract_image_from_sample(sample):
    height = tf.cast(sample["height"], tf.int32)  # always 1038
    width = tf.cast(sample["width"], tf.int32)  # always 1366
    depth = tf.cast(sample["channels"], tf.int32)  # always 3
    shape = [height, width, depth]
    image = sample["image"]
    image = decode_tf_image(image)
    image = tf.reshape(image, shape)
    return image

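At this point, every image in the dataset has the shape (None, None, None), which surprises me because I reshaped them. I believe this is the cause of the error I get when I try to augment the dataset with tf.keras.preprocessing.image.ImageDataGenerator: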

augmented_dataset = dataset.map(random_image_augmentation)
# and
image_data_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=45,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=5.0,
    zoom_range=[0.9, 1.2],
    fill_mode="reflect",
    horizontal_flip=True,
    vertical_flip=True,
)
def random_image_augmentation(image: tf.Tensor) -> tf.Tensor:
    transform = image_data_generator.get_random_transform(img_shape=image.shape)
    image = image_data_generator.apply_transform(image, transform)
    return image

This results in the following error message:

TypeError: in user code:
# ...
C:\Users\[PATH_TO_ENVIRONMENT]\lib\site-packages\keras_preprocessing\image\image_data_generator.py:778 get_random_transform  *
tx *= img_shape[img_row_axis]
TypeError: unsupported operand type(s) for *=: 'float' and 'NoneType'
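
The traceback is consistent with the static shape being unknown while Dataset.map traces the function in graph mode: tf.reshape receives its target shape as tensors, so only the rank can be inferred and image.shape holds None in every dimension, which is what `tx *= img_shape[img_row_axis]` then tries to multiply. The following is a minimal sketch of that effect, using a toy 4 x 6 x 3 buffer in place of the decoded image (the `sample` dictionary and both helper functions are made up for illustration). It also shows tf.ensure_shape as one way to declare dimensions that are known to be constant. Note that a defined static shape on its own would probably still not make ImageDataGenerator usable in graph mode, since, as far as I know, it operates on NumPy arrays.

```python
import tensorflow as tf

# Hypothetical stand-in for a parsed record: a flat pixel buffer plus its
# dimensions stored as scalar tensors, similar to the question's features.
sample = {
    "height": tf.constant(4, tf.int64),
    "width": tf.constant(6, tf.int64),
    "channels": tf.constant(3, tf.int64),
    "image": tf.zeros([4 * 6 * 3], dtype=tf.float32),
}
dataset = tf.data.Dataset.from_tensors(sample)


def reshape_dynamic(sample):
    # The target shape is built from tensors whose values are only known
    # when the graph runs, so the static shape cannot be fully inferred.
    shape = [
        tf.cast(sample["height"], tf.int32),
        tf.cast(sample["width"], tf.int32),
        tf.cast(sample["channels"], tf.int32),
    ]
    return tf.reshape(sample["image"], shape)


def reshape_static(sample):
    image = reshape_dynamic(sample)
    # tf.ensure_shape records the static shape and checks it at run time.
    return tf.ensure_shape(image, [4, 6, 3])


dynamic_ds = dataset.map(reshape_dynamic)
static_ds = dataset.map(reshape_static)

print(dynamic_ds.element_spec.shape)  # static dimensions are not defined
print(static_ds.element_spec.shape)   # static dimensions are declared
```

For the images in the question, the declared shape would be [1038, 1366, 3], assuming the sizes given in the code comments hold for every record.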

However, if I use eager mode instead of graph mode, it works like a charm:

it = iter(dataset)
for i in range(3):
    image = next(it)
    image = random_image_augmentation(image.numpy())

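This leads me to conclude that the main problem is the missing shape information after the images are read into the dataset. But I don't know how to define it more explicitly. Any ideas?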

Wrap the preprocessing function, which requires tensors with a defined shape, in tf.py_function, like this:

augmented_dataset = dataset.map(
    lambda x: tf.py_function(random_image_augmentation, inp=[x], Tout=tf.float32),
    num_parallel_calls=tf.data.experimental.AUTOTUNE
)
# and
def random_image_augmentation(image: tf.Tensor) -> tf.Tensor:
    image = image.numpy()  # now we can do this, because tensors have this function in eager mode
    transform = image_data_generator.get_random_transform(img_shape=image.shape)
    image = image_data_generator.apply_transform(image, transform)
    return image

This works for me, but I am not sure whether it is the only solution, or even the best one.
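
One caveat with this approach: the tensor returned by tf.py_function carries no static shape, so later steps that need fixed dimensions (batching into a model, for instance) may run into the same missing-shape problem. Below is a minimal sketch of restoring the shape with set_shape after the wrapped call. It assumes every image has the same known size and uses a toy 4 x 6 x 3 shape; `numpy_augmentation` is a made-up stand-in for the ImageDataGenerator calls, not part of any library.

```python
import numpy as np
import tensorflow as tf

IMAGE_SHAPE = (4, 6, 3)  # toy size; the question's images are 1038 x 1366 x 3


def numpy_augmentation(image: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the ImageDataGenerator calls: any function
    # that needs a concrete NumPy array with a real shape would fit here.
    assert image.shape == IMAGE_SHAPE
    return np.flip(image, axis=1).astype(np.float32)  # horizontal flip


def augment(image: tf.Tensor) -> tf.Tensor:
    # Inside tf.py_function the tensor is eager, so .numpy() is available.
    augmented = tf.py_function(
        lambda x: numpy_augmentation(x.numpy()), inp=[image], Tout=tf.float32
    )
    # The result of tf.py_function has no static shape; declare it again so
    # downstream steps see fixed dimensions.
    augmented.set_shape(IMAGE_SHAPE)
    return augmented


images = tf.reshape(
    tf.range(2 * 4 * 6 * 3, dtype=tf.float32), (2, *IMAGE_SHAPE)
)
dataset = tf.data.Dataset.from_tensor_slices(images)
augmented_dataset = dataset.map(
    augment, num_parallel_calls=tf.data.experimental.AUTOTUNE
)

print(augmented_dataset.element_spec.shape)
```

Because the wrapped function runs as ordinary Python, it may limit how much the map can be parallelized. Augmentations built from native TensorFlow ops such as those in tf.image are a possible alternative if that becomes a bottleneck, though they do not cover every transform ImageDataGenerator offers.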
