> I am trying to build a 3D convolutional neural network autoencoder. I cannot get the input dimensions of the tensor to match the output.
I have tried changing the layer shapes and using a Keras autoencoder.
import tensorflow as tf
from tensorflow.keras import backend as K

padding = 'SAME'
stride = [1, 1, 1]

self.inputs_ = tf.placeholder(tf.float32, input_shape, name='inputs')
self.targets_ = tf.placeholder(tf.float32, input_shape, name='targets')

# encoder
conv1 = tf.layers.conv3d(inputs=self.inputs_, filters=16, kernel_size=(3, 3, 3), padding=padding, strides=stride, activation=tf.nn.relu)
maxpool1 = tf.layers.max_pooling3d(conv1, pool_size=(2, 2, 2), strides=(2, 2, 2), padding=padding)
conv2 = tf.layers.conv3d(inputs=maxpool1, filters=32, kernel_size=(3, 3, 3), padding=padding, strides=stride, activation=tf.nn.relu)
maxpool2 = tf.layers.max_pooling3d(conv2, pool_size=(3, 3, 3), strides=(3, 3, 3), padding=padding)
conv3 = tf.layers.conv3d(inputs=maxpool2, filters=96, kernel_size=(2, 2, 2), padding=padding, strides=stride, activation=tf.nn.relu)
maxpool3 = tf.layers.max_pooling3d(conv3, pool_size=(2, 2, 2), strides=(2, 2, 2), padding=padding)

# latent internal representation

# decoder
# tf.keras.layers.UpSampling3D()
unpool1 = K.resize_volumes(maxpool3, 2, 2, 2, "channels_last")
deconv1 = tf.layers.conv3d_transpose(inputs=unpool1, filters=96, kernel_size=(2, 2, 2), padding=padding, strides=stride, activation=tf.nn.relu)
unpool2 = K.resize_volumes(deconv1, 3, 3, 3, "channels_last")
deconv2 = tf.layers.conv3d_transpose(inputs=unpool2, filters=32, kernel_size=(3, 3, 3), padding=padding, strides=stride, activation=tf.nn.relu)
unpool3 = K.resize_volumes(deconv2, 2, 2, 2, "channels_last")
deconv3 = tf.layers.conv3d_transpose(inputs=unpool3, filters=16, kernel_size=(3, 3, 3), padding=padding, strides=stride, activation=tf.nn.relu)

self.output = tf.layers.dense(inputs=deconv3, units=3)
self.output = tf.reshape(self.output, self.input_shape)
ValueError: Cannot reshape a tensor with 1850688 elements to shape [1,31,73,201,3] (1364589 elements) for 'Reshape' (op: 'Reshape') with input shapes: [1,36,84,204,3], [5] and with input tensors computed as partial shapes: input[1] = [1,31,73,201,3].
Your input shape is [1, 31, 73, 201, 3]. During the transposed-convolution (decoder) stage you upscale by [2, 2, 2], [3, 3, 3] and [2, 2, 2] in the three resize_volumes layers. Multiplying these factors along each axis gives [12, 12, 12] (2*3*2 for each), so every spatial dimension of the decoder output will be a multiple of 12. But your input dimensions [x, 31, 73, 201, x] are not multiples of 12; the closest larger multiples are [x, 36, 84, 204, x]. So the solution is either to crop the extra voxels after the decoder so the output matches the original size, or, better, to zero-pad the original input so its dimensions become multiples of 12. If you follow the second solution, you have to take the new input dimensions into account.
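To make the arithmetic concrete, here is a minimal sketch (plain Python, not from the original answer; the helper names are my own) that traces each spatial dimension through the 'SAME'-padded poolings and the resize_volumes upscalings, and derives the symmetric pad amounts used in the updated code below:

import math

def decoder_output_dims(dims, factors=(2, 3, 2)):
    # Encoder: each max-pool with 'SAME' padding rounds up (ceil);
    # decoder: each resize_volumes multiplies back by the same factor.
    out = []
    for d in dims:
        for f in factors:
            d = math.ceil(d / f)
        for f in factors:
            d = d * f
        out.append(d)
    return out

def symmetric_pad_amounts(orig_dims, padded_dims):
    # Split the extra voxels roughly evenly before/after each axis.
    return [[(p - o) // 2, (p - o) - (p - o) // 2]
            for o, p in zip(orig_dims, padded_dims)]

orig = (31, 73, 201)
padded = decoder_output_dims(orig)          # [36, 84, 204] -- multiples of 12
print(padded)
print(symmetric_pad_amounts(orig, padded))  # [[2, 3], [5, 6], [1, 2]]

Running it prints [36, 84, 204] and [[2, 3], [5, 6], [1, 2]], which is where the tf.pad arguments in the updated code come from.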
Updated code (only the changed parts):
self.inputs_ = tf.placeholder(tf.float32, input_shape, name='inputs')
pad_inputs = tf.pad(self.inputs_, [[0, 0], [2, 3], [5, 6], [1, 2], [0, 0]])  # pad at the edges
print(pad_inputs.shape)  # [1, 36, 84, 204, 3]
conv1 = tf.layers.conv3d(inputs=pad_inputs, filters=16, kernel_size=(3, 3, 3), padding=padding, strides=stride, activation=tf.nn.relu)
And finally,
self.output = tf.reshape(self.output, pad_inputs.shape)
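If you go with the first option instead (cropping rather than padding), or if you need the reconstruction back at the original size for the loss, you can slice the padding off the padded output again. A minimal sketch under the same pad amounts as above (2, 5 and 1 voxels before each spatial axis); this is my illustration, not part of the original answer:

# Hypothetical follow-up: remove the zero-padding so the reconstruction
# matches the original [1, 31, 73, 201, 3] targets again.
cropped_output = self.output[:, 2:2 + 31, 5:5 + 73, 1:1 + 201, :]
print(cropped_output.shape)  # (1, 31, 73, 201, 3)

The loss can then be computed between cropped_output and the unpadded self.targets_ (or, equivalently, pad the targets the same way and compare them against self.output directly).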