3D convolutional autoencoder with odd or even width and height

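I am trying to encode spatio-temporal data with an autoencoder. My data has the shape: batches, filters, timesteps, rows, columns, where rows = columns.

The last two dimensions have a different size for each dataset. For example, rows and columns are 5×5 for dataset 1 and 4×4 for dataset 2.

I am having trouble setting up the autoencoder so that it produces the right shapes for the different datasets.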

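I posted an earlier question, "3D convolutional autoencoder is not returning the right output shape", while testing on a dataset whose data shape has 4 rows and 4 columns.

However, that architecture does not work when the number of rows and columns is anything other than 4.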

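For the encoded sequence, I want the code to keep the length of the timesteps dimension unchanged and to reduce the height and width to size 1.
Given that, how can I build a 3D convolutional autoencoder that works correctly for different row and column input shapes?

Here is a working example for when rows and columns are 4: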

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv3D, MaxPooling3D, UpSampling3D

# Shape without the batch axis: (filters, timesteps, rows, columns)
input_imag = Input(shape=(11, 81, 4, 4))

# Encoder: pool only the two spatial axes; the time axis (81) is untouched.
x = input_imag
x = Conv3D(64, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = MaxPooling3D((1, 2, 2), data_format='channels_first', padding='same')(x)
x = Conv3D(32, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = MaxPooling3D((1, 2, 2), data_format='channels_first', padding='same')(x)
x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
encoded = MaxPooling3D((1, 2, 2), data_format='channels_first', padding='same', name='encoder')(x)

# Decoder: upsample the spatial axes back to 4x4.
x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(encoded)
x = UpSampling3D((1, 1, 1), data_format='channels_first')(x)
x = Conv3D(32, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = UpSampling3D((1, 2, 2), data_format='channels_first')(x)
x = Conv3D(64, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
x = UpSampling3D((1, 2, 2), data_format='channels_first')(x)
# 11 output filters, matching the 11 input channels and conv3d_6 in the summary.
decoded_out = Conv3D(11, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
autoencoder = Model(input_imag, decoded_out)
autoencoder.compile(optimizer='adam', loss='mse')

Model summary:

Layer (type)                Output Shape              Param #
=================================================================
map_inputs (InputLayer)     [(None, 11, 81, 4, 4)]    0
conv3d (Conv3D)             (None, 64, 81, 4, 4)      31744
max_pooling3d (MaxPooling3D  (None, 64, 81, 2, 2)     0
)
conv3d_1 (Conv3D)           (None, 32, 81, 2, 2)      92192
max_pooling3d_1 (MaxPooling  (None, 32, 81, 1, 1)     0
3D)
conv3d_2 (Conv3D)           (None, 16, 81, 1, 1)      23056
encoder (MaxPooling3D)      (None, 16, 81, 1, 1)      0
conv3d_3 (Conv3D)           (None, 16, 81, 1, 1)      11536
up_sampling3d (UpSampling3D  (None, 16, 81, 1, 1)     0
)
conv3d_4 (Conv3D)           (None, 32, 81, 1, 1)      23072
up_sampling3d_1 (UpSampling  (None, 32, 81, 2, 2)     0
3D)
conv3d_5 (Conv3D)           (None, 64, 81, 2, 2)      92224
up_sampling3d_2 (UpSampling  (None, 64, 81, 4, 4)     0
3D)
conv3d_6 (Conv3D)           (None, 11, 81, 4, 4)      31691
=================================================================
Total params: 305,515
Trainable params: 305,515
Non-trainable params: 0
_________________________________________________________________
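
To see why only a 4×4 input comes back at its original size, it may help to trace one spatial axis through the layers. The sketch below assumes the usual rule that pooling with padding='same' and a stride equal to the pool size maps a length n to ceil(n / 2), and that UpSampling3D multiplies the length by its factor. The helper names same_pool and trace_sizes are made up for illustration.

```python
import math


def same_pool(size, factor=2):
    """Length of one axis after pooling with padding='same', stride == pool size."""
    return math.ceil(size / factor)


def trace_sizes(size, pools=(2, 2, 2), upsamples=(1, 2, 2)):
    """Return (encoded_size, decoded_size) for one spatial axis.

    The defaults mirror the spatial factors of the three MaxPooling3D and
    three UpSampling3D layers in the example above.
    """
    for factor in pools:
        size = same_pool(size, factor)
    encoded = size
    for factor in upsamples:
        size *= factor
    return encoded, size


if __name__ == "__main__":
    for n in (4, 5, 42):
        encoded, decoded = trace_sizes(n)
        print(f"input {n}: encoded {encoded}, decoded {decoded}")
```

Under these assumptions the decoder always returns four times the encoded size, so a 5×5 input is reconstructed as 4×4 and no longer matches the target.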

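We can use None in Input to allow dynamic spatial sizes and then resize the output back to the original shape at the end. The output size of the original autoencoder is already roughly equal to the input size (a 5×5 input, for example, comes out as 4×4), so it only needs to be resized slightly.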

import tensorflow as tf
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import UpSampling3D, MaxPooling3D, Conv3D


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # None leaves rows and columns unspecified, so any spatial size is accepted.
        input_imag = Input(shape=(11, 81, None, None))
        x = input_imag
        x = Conv3D(64, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
        x = MaxPooling3D((1, 2, 2), data_format='channels_first', padding='same')(x)
        x = Conv3D(32, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
        x = MaxPooling3D((1, 2, 2), data_format='channels_first', padding='same')(x)
        x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
        encoded = MaxPooling3D((1, 2, 2), data_format='channels_first', padding='same', name='encoder')(x)
        x = Conv3D(16, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(encoded)
        x = UpSampling3D((1, 1, 1), data_format='channels_first')(x)
        x = Conv3D(32, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
        x = UpSampling3D((1, 2, 2), data_format='channels_first')(x)
        x = Conv3D(64, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
        x = UpSampling3D((1, 2, 2), data_format='channels_first')(x)
        # 11 output filters so the reconstruction has the same channels as the input.
        decoded_out = Conv3D(11, (5, 3, 3), data_format='channels_first', activation='relu', padding='same')(x)
        self._autoencoder = Model(input_imag, decoded_out)

    def call(self, inputs, training=None):
        input_shape = tf.shape(inputs)
        output = self._autoencoder(inputs, training=training)
        resized_output = self._resize(output=output, input_shape=input_shape)
        return resized_output

    def _resize(self, output, input_shape):
        output_shape = tf.shape(output)
        # Fold batch, channels and time into one axis so tf.image.resize sees
        # a batch of single-channel images: (N, rows, columns, 1).
        reshaped_output = tf.reshape(output, shape=[-1, output_shape[3], output_shape[4], 1])
        resized_output = tf.image.resize(reshaped_output, [input_shape[3], input_shape[4]])
        # Restore (batch, channels, time) with the input's rows and columns.
        output = tf.reshape(resized_output, shape=[output_shape[0], output_shape[1], output_shape[2], input_shape[3], input_shape[4]])
        return output


model = MyModel()
model.compile(optimizer='adam', loss='mse')
print(model(tf.zeros(shape=[2, 11, 81, 4, 4])).shape)    # (2, 11, 81, 4, 4)
print(model(tf.zeros(shape=[2, 11, 81, 5, 5])).shape)    # (2, 11, 81, 5, 5)
print(model(tf.zeros(shape=[2, 11, 81, 42, 42])).shape)  # (2, 11, 81, 42, 42)
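
One caveat regarding the requirement that the encoding have height and width 1: with a fixed number of pooling layers this holds only for small inputs. A rough way to check is sketched below, assuming each pooling halves the size with ceiling rounding (as padding='same' with a 2×2 spatial pool does). pools_needed is a hypothetical helper, not part of Keras.

```python
def pools_needed(size, factor=2):
    """Count how many 'same' poolings reduce one spatial axis to length 1."""
    count = 0
    while size > 1:
        size = -(-size // factor)  # ceiling division
        count += 1
    return count


if __name__ == "__main__":
    for n in (4, 5, 8, 9, 42):
        print(f"size {n}: {pools_needed(n)} pooling step(s)")
```

By this count, the three pooling layers above reach 1×1 for inputs up to 8×8. Larger inputs such as 42×42 would need more pooling stages, or some other reduction over the two spatial axes, if a 1×1 encoding is required.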
