PyTorch AutoEncoder - decoded output dimension differs from the input dimension



I am building a custom autoencoder to train on a dataset. My model is as follows:

import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: three stride-1 3x3 convolutions, then three stride-2 5x5 convolutions
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
        )
        # Decoder: the same layers mirrored with transposed convolutions
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels=32, out_channels=3, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.encoder(x)
        print(x.shape)  # shape of the latent representation
        x = self.decoder(x)
        return x


def unit_test():
    num_minibatch = 16
    img = torch.randn(num_minibatch, 3, 512, 640).cuda(0)
    model = AutoEncoder().cuda()
    model = nn.DataParallel(model)
    output = model(img)
    print(output.shape)


if __name__ == '__main__':
    unit_test()

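As you can see, my input dimension is (3, 512, 640), but the output after passing through the decoder is (3, 507, 635). Am I missing something when adding the ConvTranspose2d layers?

Any help would be appreciated. Thanks.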

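The mismatch comes from the output shapes of the ConvTranspose2d layers: with stride 2 they do not exactly undo the matching Conv2d layers. You can fix it by adding an output_padding of 1 to the first and third transposed convolution layers.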

nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=2, output_padding=1),
nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=2, output_padding=1),
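The effect of this change can be checked without a GPU by tracing one spatial axis through the layers. The following is a minimal sketch in plain Python, assuming padding=0 and dilation=1 as in the model from the question; the names `ENCODER`, `DECODER`, `encode`, `decode` and `round_trip` are made up for illustration and are not part of PyTorch.

```python
# (kernel_size, stride) of each layer, copied from the model in the question.
ENCODER = [(3, 1), (3, 1), (3, 1), (5, 2), (5, 2), (5, 2)]
DECODER = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1), (3, 1)]


def encode(size):
    """Trace one spatial axis through the Conv2d layers (padding=0, dilation=1)."""
    for kernel, stride in ENCODER:
        size = (size - kernel) // stride + 1
    return size


def decode(size, output_padding):
    """Trace one spatial axis through the ConvTranspose2d layers (padding=0)."""
    for (kernel, stride), extra in zip(DECODER, output_padding):
        size = (size - 1) * stride + kernel + extra
    return size


def round_trip(hw, output_padding):
    """Return the (height, width) produced by encoder followed by decoder."""
    return tuple(decode(encode(s), output_padding) for s in hw)


original = [0, 0, 0, 0, 0, 0]  # decoder as posted in the question
patched = [1, 0, 1, 0, 0, 0]   # output_padding=1 on the 1st and 3rd layers

print(round_trip((512, 640), original))  # (507, 635)
print(round_trip((512, 640), patched))   # (512, 640)
```

The same arithmetic applies to both axes here because the layers use square kernels and equal strides in height and width.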

According to the documentation:

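When stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side.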
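To make the ambiguity concrete, here is a small sketch using the output-size formulas for one axis, assuming dilation=1. The helper names `conv_out` and `conv_transpose_out` are illustrative only. The sizes 123 and 124 are the ones that occur at the first decoder layer of this model.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of Conv2d along one axis (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of ConvTranspose2d along one axis (dilation=1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Two different input heights collapse to the same output under stride 2.
collapsed = [conv_out(h, kernel=5, stride=2) for h in (123, 124)]
print(collapsed)  # [60, 60]

# The transposed layer has to pick one of them; output_padding selects which.
print(conv_transpose_out(60, kernel=5, stride=2))                    # 123
print(conv_transpose_out(60, kernel=5, stride=2, output_padding=1))  # 124
```

In the encoder the height entering the last convolution is 124, so the matching transposed layer needs output_padding=1 to return to 124 rather than 123.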


Shapes of the decoder layers before adding output_padding:

----------------------------------------------------------------
Layer (type)            Output Shape             Param #
================================================================
ConvTranspose2d-1       [-1, 512, 123, 155]      13,107,712
ReLU-2                  [-1, 512, 123, 155]      0
ConvTranspose2d-3       [-1, 256, 249, 313]      3,277,056
ReLU-4                  [-1, 256, 249, 313]      0
ConvTranspose2d-5       [-1, 128, 501, 629]      819,328
ReLU-6                  [-1, 128, 501, 629]      0
ConvTranspose2d-7       [-1, 64, 503, 631]       73,792
ReLU-8                  [-1, 64, 503, 631]       0
ConvTranspose2d-9       [-1, 32, 505, 633]       18,464
ReLU-10                 [-1, 32, 505, 633]       0
ConvTranspose2d-11      [-1, 3, 507, 635]        867
ReLU-12                 [-1, 3, 507, 635]        0

After adding output_padding:

----------------------------------------------------------------
Layer (type)            Output Shape             Param #
================================================================
ConvTranspose2d-1       [-1, 512, 124, 156]      13,107,712
ReLU-2                  [-1, 512, 124, 156]      0
ConvTranspose2d-3       [-1, 256, 251, 315]      3,277,056
ReLU-4                  [-1, 256, 251, 315]      0
ConvTranspose2d-5       [-1, 128, 506, 634]      819,328
ReLU-6                  [-1, 128, 506, 634]      0
ConvTranspose2d-7       [-1, 64, 508, 636]       73,792
ReLU-8                  [-1, 64, 508, 636]       0
ConvTranspose2d-9       [-1, 32, 510, 638]       18,464
ReLU-10                 [-1, 32, 510, 638]       0
ConvTranspose2d-11      [-1, 3, 512, 640]        867
ReLU-12                 [-1, 3, 512, 640]        0
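Which layers need output_padding depends on the input size, so for other image sizes it can help to compute it rather than guess. The sketch below is one possible approach, assuming the decoder mirrors the encoder exactly (same kernel sizes and strides in reverse order, padding=0, dilation=1). `required_output_padding` is a hypothetical helper, not a PyTorch function.

```python
# (kernel_size, stride) of each encoder layer, copied from the question.
ENCODER = [(3, 1), (3, 1), (3, 1), (5, 2), (5, 2), (5, 2)]


def required_output_padding(size, encoder):
    """Per-layer output_padding for a mirrored decoder, along one axis.

    Each Conv2d drops (size - kernel) % stride positions through floor
    division; the mirrored ConvTranspose2d must add exactly that many back.
    """
    lost = []
    for kernel, stride in encoder:
        lost.append((size - kernel) % stride)
        size = (size - kernel) // stride + 1
    return lost[::-1]  # the decoder applies the layers in reverse order


print(required_output_padding(512, ENCODER))  # [1, 0, 1, 0, 0, 0]
print(required_output_padding(640, ENCODER))  # [1, 0, 1, 0, 0, 0]
```

For the 512 x 640 input both axes give the same result, which matches the fix above: output_padding=1 on the first and third transposed layers. If height and width ever disagree, ConvTranspose2d also accepts output_padding as a (height, width) tuple.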
