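Note: I have already seen similar questions (the same error, or telling torch not to use the GPU), but the answers did not work for me.
I have PyTorch version 1.13.0+cu117 installed (the latest version), and the code is structured as follows (an image classification task):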
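import torch
import torchvision
from torch.nn import CrossEntropyLoss
from torch.optim import SGD
from torch.utils.data import DataLoader

# os.environ["CUDA_VISIBLE_DEVICES"] = ""  # required?
device = torch.device("cpu")  # use CPU
...
train_set = DataLoader(
    torchvision.datasets.ImageFolder(path, transform), **kwargs
)
...
model = myCNN().to(device)
optimizer = SGD(args)
loss = CrossEntropyLoss()
train()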
I want to train on the CPU.
For the data loader, following this recommendation, I set pin_memory=True and non_blocking=pin_memory. The error persists even with pin_memory=False.
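For reference, on a CPU-only run these two settings should not affect where tensors end up: pinned memory and non_blocking only speed up host-to-GPU copies. A minimal sketch, assuming a batch that already lives in ordinary CPU memory (the names x and y are illustrative and not part of the original code):

```python
import torch

device = torch.device("cpu")

# A batch that is already in ordinary (pageable) CPU memory.
x = torch.randn(4, 3, 8, 8)

# The tensor is already on the target device with the target dtype,
# so .to() has nothing to copy and non_blocking has nothing to overlap.
y = x.to(device, non_blocking=True)

print(y.device)                       # cpu
print(y.data_ptr() == x.data_ptr())   # True: both names refer to the same memory
```

This suggests the device mismatch has to come from the model side rather than from the data loader settings.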
The training loop has the following structure:
for epoch in range(n_epochs):
    model.train()
    for inputs, labels in train_set:
        inputs = inputs.to(device, non_blocking=non_blocking)
        labels = labels.to(device, non_blocking=non_blocking)
        # Compute loss, back-propagate
Error traceback (raised when calling train()):
Traceback (most recent call last):
  File "code.py", line 233, in <module>
    train()
  File "code.py", line 122, in train
    outputs = model(inputs)
  File "...\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "code.py", line 87, in forward
    output = self.network(input)
  File "...\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "...\torch\nn\modules\container.py", line 204, in forward
    input = module(input)
  File "...\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "...\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "...\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
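The message says that the input is a CPU tensor (torch.FloatTensor) while the convolution weights are on the GPU (torch.cuda.FloatTensor). One way to confirm this just before the forward pass is to print the devices that the model's parameters and the batch live on. A minimal sketch; the helper name parameter_devices and the toy model are hypothetical stand-ins, not part of the original code:

```python
import torch
import torch.nn as nn


def parameter_devices(model: nn.Module) -> set:
    """Return the set of device strings used by the model's parameters and buffers."""
    tensors = list(model.parameters()) + list(model.buffers())
    return {str(t.device) for t in tensors}


# Toy stand-ins for the real model and batch (illustrative sizes).
toy_model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
inputs = torch.randn(1, 3, 16, 16)

print("model devices:", parameter_devices(toy_model))
print("input device:", inputs.device)
```

If the two disagree right before outputs = model(inputs), something between model.to(device) and the training loop has moved the model.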
Edit: There were comments about possible problems with the model. The model is roughly:
class myCNN(nn.Module):
    def __init__(self, ...other args...):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size),
            ... similar convolutional layers ...
            nn.Flatten(),
            nn.Linear(in_features, out_features)
        )

    def forward(self, input):
        output = self.network(input)
        return output
Since I have moved both the model and the data to the same device, what could be causing this error, and how can I fix it?
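The problem was caused by incorrect use of summary from torchinfo. It performs a forward pass (if an input size is provided), and the device is (by default) chosen based on torch.cuda.is_available(). As a result, the model ended up on the GPU while the inputs stayed on the CPU.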
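This matters because nn.Module.to modifies the module in place, unlike Tensor.to, which returns a new tensor. If a helper moves the model it receives to another device for its own forward pass, the caller's model ends up on that device too. The sketch below illustrates the in-place behavior with a dtype change so that it runs without a GPU; inspect_model is a hypothetical stand-in for such a helper and is not torchinfo's actual code:

```python
import torch
import torch.nn as nn


def inspect_model(model: nn.Module) -> None:
    # Hypothetical helper: converts the model for its own purposes.
    # nn.Module.to changes the module's parameters in place.
    model.to(torch.float64)


model = nn.Linear(4, 2)
print(next(model.parameters()).dtype)  # torch.float32 before the call

inspect_model(model)
print(next(model.parameters()).dtype)  # torch.float64: the caller's model changed
```

The same mechanism applies when the argument to .to() is a device instead of a dtype.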
If the device argument (as defined in the question) is passed to summary, training proceeds without the error.