I'm trying to train a neural network built with the Keras Functional API on one of the default TFDS datasets, but I keep getting dataset-related errors.
The idea is to build an object detection model, but for a first draft I'm trying to do plain image classification (img, label). The input will be (256x256x3) images. The input layer looks like this:
img_inputs = keras.Input(shape=[256, 256, 3], name='image')
Then I try to use the voc/2007 dataset available in TFDS (a very old and lightweight version, to keep things fast):
(train_ds, test_ds), ds_info = tfds.load(
    'voc/2007',
    split=['train', 'test'],
    data_dir="/content/drive/My Drive",
    with_info=True)
Then the data is preprocessed as follows:
def resize_and_normalize_img(example):
    """Normalizes images: `uint8` -> `float32`."""
    example['image'] = tf.image.resize(example['image'], [256, 256])
    example['image'] = tf.cast(example['image'], tf.float32) / 255.
    return example

def reduce_for_classification(example):
    for key in ['image/filename', 'labels_no_difficult', 'objects']:
        example.pop(key)
    return example
train_ds_class = train_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.cache()
train_ds_class = train_ds_class.shuffle(ds_info.splits['train'].num_examples)
train_ds_class = train_ds_class.batch(64)
train_ds_class = train_ds_class.prefetch(tf.data.AUTOTUNE)
test_ds_class = test_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
test_ds_class = test_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
test_ds_class = test_ds_class.batch(64)
test_ds_class = test_ds_class.cache()
test_ds_class = test_ds_class.prefetch(tf.data.AUTOTUNE)
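(For context: each voc/2007 example in TFDS is a dict with the keys 'image', 'image/filename', 'labels', 'labels_no_difficult', and 'objects', so after reduce_for_classification only 'image' and 'labels' are left.) A minimal sanity check, assuming the pipeline above, is to print the element spec of the batched dataset to see the structure that will actually be handed to fit:
# Each element is still a dict with 'image' and 'labels' keys,
# not the (inputs, targets) tuple that model.fit() expects.
print(train_ds_class.element_spec)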
Then the model is fitted like this:
epochs = 8
history = model.fit(
    x=train_x, y=train_y,
    validation_data=test_ds_class,
    epochs=epochs
)
After doing this, I get an error saying that my model expects an input of shape [None, 256, 256, 3], but that it is receiving an input of shape [2, 256, 256, 3].
I think this is a problem related to the labels. Earlier I ran into trouble with the extra keys in the dictionary-style format of the data that comes out of tfds, and tried to drop everything except the labels, but I still hit this problem and don't know how to proceed. I feel like once a dataset has been prepared with tfds it should be ready to feed to a model, but after going through the documentation, tutorials, and Stack Overflow I haven't found an answer, so I hope someone who has dealt with this can help.
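For reference, model.fit can take a tf.data.Dataset directly as its first argument (with no separate y), but it then expects each element to be an (inputs, targets) tuple rather than a flat dict. A minimal sketch of that mapping, with to_tuple as a hypothetical helper name:
# Hypothetical helper: split the TFDS dict into the (inputs, targets)
# tuple structure that model.fit() can consume directly.
def to_tuple(example):
    return example['image'], example['labels']

history = model.fit(train_ds_class.map(to_tuple), epochs=epochs)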
Update: to give more information, this is the model I'm using:
TL;DR: a 256x256x3 image input, a series of convolutions and residual blocks, finishing with average pooling, a fully connected layer, and a softmax, producing a (None, 1280) tensor. Sparse categorical cross-entropy as the loss and accuracy as the metric.
img_inputs = keras.Input(shape=[256, 256, 3], name='image')
# first convolution
conv_first = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), padding='same', name='first_conv')
x = conv_first(img_inputs)
# Second convolution
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=2, padding='same', name='second_conv')(x)
# First residual block
res = tf.keras.layers.Conv2D(32, kernel_size=(1, 1), name='res_block1_conv1')(x)
res = tf.keras.layers.Conv2D(64, kernel_size=(3, 3), padding='same', name='res_block1_conv2')(res)
x = x + res
# Convolution after First residual block
x = tf.keras.layers.Conv2D(128, kernel_size=3, strides=2, padding='same', name='first_post_res_conv')(x)
# Second residual Block
for i in range(2):
    shortcut = x
    res = tf.keras.layers.Conv2D(64, kernel_size=1, name=f'res_block2_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(128, kernel_size=3, padding='same', name=f'res_block2_conv2_loop{i}')(res)
    x = res + shortcut
# Convolution after Second residual block
x = tf.keras.layers.Conv2D(256, 3, strides=2, padding='same', name='second_post_res_conv')(x)
# Third residual Block
for i in range(8):
    shortcut = x
    res = tf.keras.layers.Conv2D(128, kernel_size=1, name=f'res_block3_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(256, kernel_size=3, padding='same', name=f'res_block3_conv2_loop{i}')(res)
    x = res + shortcut
# Convolution after Third residual block
x = tf.keras.layers.Conv2D(512, 3, strides=2, padding='same', name='third_post_res_conv')(x)
# Fourth residual Block
for i in range(8):
    shortcut = x
    res = tf.keras.layers.Conv2D(256, kernel_size=1, name=f'res_block4_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(512, kernel_size=3, padding='same', name=f'res_block4_conv2_loop{i}')(res)
    x = res + shortcut
# Convolution after Fourth residual block
x = tf.keras.layers.Conv2D(1024, 3, strides=2, padding='same', name='fourth_post_res_conv')(x)
# Fifth residual Block
for i in range(4):
    shortcut = x
    res = tf.keras.layers.Conv2D(512, kernel_size=1, name=f'res_block5_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(1024, kernel_size=3, padding='same', name=f'res_block5_conv2_loop{i}')(res)
    x = res + shortcut
# Global avg pooling
x = tf.keras.layers.GlobalAveragePooling2D(name='average_pooling')(x)
# Fully connected layer
x = tf.keras.layers.Dense(1280, name='fully_connected_layer')(x)
# Softmax
end_result = tf.keras.layers.Softmax(name='softmax')(x)
model = tf.keras.Model(inputs=img_inputs, outputs=end_result, name="darknet53")
# note: the network already ends in an explicit Softmax layer, so
# from_logits=True makes the loss apply a softmax a second time
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
After trying the solution proposed by AloneTogether, I get the following error (I tried changing the axis in the tf.one_hot() function several times, with the same result):
Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [64,1280] and labels shape [1280]
[[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_20172]
This seems to be related to batching, but I don't know how to fix it (suspiciously, 64 × 20 = 1280, which is exactly the size of a [64, 20] batch of one-hot labels flattened out).
The whole problem really does seem to come down to the label encoding, because when I run that line without the tf.reduce_sum() function I get the same error, but with:
First element had shape [2,20] and element 1 had shape [1,20].
And if I run the same thing without the one-hot-encoding line, I get the following error:
Node: 'IteratorGetNext' Cannot batch tensors with different shapes in component 1. First element had shape [4] and element 1 had shape [1]. [[{{node IteratorGetNext}}]] [Op:__inference_train_function_18534]
I think the problem is that each image can belong to multiple classes, so I would suggest one-hot encoding the labels. Then it should work. Here is an example:
import tensorflow as tf
import tensorflow_datasets as tfds
def resize_and_normalize_img(example):
    """Normalizes images: `uint8` -> `float32`."""
    example['image'] = tf.image.resize(example['image'], [256, 256])
    example['image'] = tf.cast(example['image'], tf.float32) / 255.
    return example['image'], example['labels']

def reduce_for_classification(example):
    for key in ['image/filename', 'labels_no_difficult', 'objects']:
        example.pop(key)
    return example
(train_ds, test_ds), ds_info = tfds.load('voc/2007', split=['train', 'test'], with_info=True)
train_ds_class = train_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
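# tf.one_hot turns the variable-length label vector of shape [num_labels]
# into a [num_labels, 20] matrix; tf.reduce_sum over axis 0 then collapses
# it into a single fixed-size multi-hot vector of shape [20], which batches cleanly.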
train_ds_class = train_ds_class.map(lambda x, y: (x, tf.reduce_sum(tf.one_hot(y, 20, axis=-1), axis=0)))
train_ds_class = train_ds_class.cache()
train_ds_class = train_ds_class.shuffle(ds_info.splits['train'].num_examples)
train_ds_class = train_ds_class.batch(64)
train_ds_class = train_ds_class.prefetch(tf.data.AUTOTUNE)
inputs = tf.keras.layers.Input(shape=[256, 256, 3], name='image')
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(50, activation='relu')(x)
outputs = tf.keras.layers.Dense(20, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
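# Multi-label setup: each of the 20 VOC classes gets an independent sigmoid
# score, and binary cross-entropy evaluates every class separately, since one
# image can contain several classes at once (a softmax would force them to compete).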
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(train_ds_class, epochs=5)
Epoch 1/5
40/40 [==============================] - 16s 124ms/step - loss: 3.0883
Epoch 2/5
40/40 [==============================] - 5s 115ms/step - loss: 0.9750
Epoch 3/5
40/40 [==============================] - 5s 115ms/step - loss: 0.4578
Epoch 4/5
40/40 [==============================] - 5s 115ms/step - loss: 0.6004
Epoch 5/5
40/40 [==============================] - 5s 115ms/step - loss: 0.3534
<keras.callbacks.History at 0x7f0e59513f50>
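To carry this over to the darknet53 model from the question, only the head needs to change to match the 20-class multi-label setup. A minimal sketch, assuming the backbone above (the name 'class_scores' is illustrative, and x is the output of the last residual block):
# Replace Dense(1280) + Softmax with 20 sigmoid units (one per VOC class)
# and train against the multi-hot labels with binary cross-entropy.
x = tf.keras.layers.GlobalAveragePooling2D(name='average_pooling')(x)
outputs = tf.keras.layers.Dense(20, activation='sigmoid', name='class_scores')(x)
model = tf.keras.Model(inputs=img_inputs, outputs=outputs, name='darknet53')
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=[tf.keras.metrics.BinaryAccuracy()])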