PyTorch treats the batch size as the number of channels in a Conv2d layer

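I'm new to neural networks and am building a VAE model with PyTorch. I have used some TensorFlow before, but I don't understand what "in_channels" and "out_channels" mean as parameters of nn.Conv2d/nn.Conv1d. Currently my model takes a DataLoader with a batch size of 128, where each input is a 248 × 46 tensor (so each batch is a 128 × 248 × 46 tensor).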

My encoder currently looks like this. I trimmed it down so I could focus on where the error comes from.

import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dim
        self.conv1 = nn.Conv2d(in_channels=248, out_channels=46, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print(x.size())
        x = F.relu(self.conv1(x))
        return x

The Conv2d layer is meant to reduce the 248 × 46 input to a 50 × 46 tensor. However, I get this error:

RuntimeError: Given groups=1, weight of size [46, 248, 9, 9], expected input[1, 128, 248, 46] to have 248 channels, but got 128 channels instead

...even though printing x.size() shows torch.Size([128, 248, 46]).

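I'm not sure a) why the error shows the layer adding an extra dimension to x, b) whether I understand channels correctly (should the actual number of channels be 46? Why doesn't PyTorch simply ask for my input size as a tuple or something, like in=(248, 46)?), or c) whether the problem is in how I load data into the model. I have a numpy array data of shape (-1, 248, 46), and I start training my model as follows.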

tensor_data = torch.from_numpy(data)
dataset = TensorDataset(tensor_data, tensor_data)
train_dl = DataLoader(dataset, batch_size=128, shuffle=True)
...
for epoch in range(20):
    for x_train, y_train in train_dl:
        x_train = x_train.to(device).float()
        optimizer.zero_grad()
        x_pred, mu, log_var = vae(x_train)
        bce_loss = train.BCE(y_train, x_pred)
        kl_loss = train.KL(mu, log_var)
        loss = bce_loss + kl_loss
        loss.backward()
        optimizer.step()

Thanks for any ideas!

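In PyTorch, nn.Conv2d assumes its input (mostly image data) has the shape [B, C_in, H, W], where B is the batch size, C_in is the number of channels, and H and W are the height and width of the image. The output has a similar shape, [B, C_out, H_out, W_out]. Here C_in and C_out are in_channels and out_channels respectively. (H_out, W_out) is the output image size, which may or may not equal (H, W) depending on the kernel size, stride, and padding.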
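To make the dependence on kernel size, stride, and padding concrete, here is a minimal sketch of the output-size formula given in the nn.Conv2d documentation. The helper name conv2d_output_size is hypothetical (it is not a PyTorch function), and the dilation default of 1 is an assumption that matches the layer in the question.

```python
def conv2d_output_size(in_size, kernel_size, stride, padding, dilation=(1, 1)):
    """Return (H_out, W_out) for a 2-D convolution.

    Hypothetical helper, not part of PyTorch. Per dimension it applies
    floor((n + 2*p - d*(k - 1) - 1) / s) + 1.
    """
    return tuple(
        (n + 2 * p - d * (k - 1) - 1) // s + 1
        for n, k, s, p, d in zip(in_size, kernel_size, stride, padding, dilation)
    )

# Settings from the question: 248 x 46 input, 9 x 9 kernel,
# stride (5, 1), padding (5, 4).
out_size = conv2d_output_size((248, 46), (9, 9), (5, 1), (5, 4))
print(out_size)  # (50, 46)
```

With these settings the spatial size does come out as 50 × 46, so the kernel, stride, and padding in the question are fine; only the channel arguments and the input shape need to change.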

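However, applying conv2d to reduce a [128, 248, 46] input to [128, 50, 46] is confusing. Is this image data with height 248 and width 46? If so, you can reshape the input to [128, 1, 248, 46] and use in_channels = 1 and out_channels = 1 in conv2d.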
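The reshape suggested above can be checked with a short sketch. It assumes a recent PyTorch version, in which a 3-D tensor passed to nn.Conv2d is read as a single unbatched (C, H, W) sample; that would explain why the error reports input[1, 128, 248, 46] with 128 channels. The variable names are illustrative, and unsqueeze(1) is used here as one way to add the channel axis.

```python
import torch
import torch.nn as nn

# A batch as the DataLoader in the question yields it: [B, H, W].
x = torch.rand(128, 248, 46)

# The layer from the question, which expects 248 input channels.
bad_conv = nn.Conv2d(in_channels=248, out_channels=46, kernel_size=(9, 9),
                     stride=(5, 1), padding=(5, 4))
try:
    bad_conv(x)
    error_message = None
except RuntimeError as err:
    # The batch axis was interpreted as the channel axis.
    error_message = str(err)
print(error_message)

# Fix: add an explicit channel axis and declare a single input channel.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(9, 9),
                 stride=(5, 1), padding=(5, 4))
x4d = x.unsqueeze(1)  # [128, 1, 248, 46]
out = conv(x4d)
print(out.shape)  # torch.Size([128, 1, 50, 46])
```

If a [128, 50, 46] tensor is needed afterwards, out.squeeze(1) drops the channel axis again.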

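Suppose the model takes a single-channel 28 × 28 image (784 values). The 1 is then your in_channels; out_channels is the number of feature maps the layer outputs (or, for a final classification layer, the number of classes the model predicts).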

You need to add an extra dimension for the number of channels (1) in the view call. The code below will work!

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print("encoder input size: " + str(x.shape))
        # x.shape[0] is the number of samples in the batch
        # Conv2d expects (samples in batch, channels, height, width)
        x = x.view(x.shape[0], 1, 248, 46)
        print("encoder input size after adding 1 channel to shape: " + str(x.shape))
        x = F.relu(self.conv1(x))
        return x

# a test dataset with 128 samples, height 248 and width 46
test_dataset = torch.rand(128, 248, 46)
# prints the shape of the dataset
print(test_dataset.shape)
model = Encoder()
model(test_dataset)
# a single sample (e.g. for plotting) still needs a batch dimension of 1
test_dataset2 = torch.rand(1, 248, 46)
model(test_dataset2.view(test_dataset2.shape[0], 1, 248, 46))
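As an alternative to reshaping inside forward, the channel axis can be added once when the dataset is built, which relates to part c) of the question. The following is a sketch under stated assumptions: the random array stands in for the question's data of shape (-1, 248, 46), the 256 samples and the float32 dtype are made up for illustration, and the variable names mirror the question's loading code.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the question's numpy array (assumed: 256 samples, float32).
data = np.random.rand(256, 248, 46).astype(np.float32)

# Add the channel axis once, so every batch is already [B, 1, 248, 46].
tensor_data = torch.from_numpy(data).unsqueeze(1)
dataset = TensorDataset(tensor_data, tensor_data)
train_dl = DataLoader(dataset, batch_size=128, shuffle=True)

x_train, y_train = next(iter(train_dl))
print(x_train.shape)  # torch.Size([128, 1, 248, 46])
```

Batches loaded this way can go straight into a Conv2d layer with in_channels=1. The x.view(x.shape[0], 1, 248, 46) call in the encoder above still works on them, because it leaves a [B, 1, 248, 46] tensor unchanged.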
