GPU program failed to execute: cublas runtime error



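I am trying to train a network with PyTorch on a CUDA-capable GeForce GTX 1070 GPU. I don't understand this error, and I haven't found any similar issue. I can't tell whether the problem lies in CUDA or in my code.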

Traceback (most recent call last):
  File "main.py", line 497, in <module>
    main()
  File "main.py", line 167, in main
    train(train_loader, model, criterion, optimizer, epoch, normalizer)
  File "main.py", line 244, in train
    output = model(*input_var)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\1546544\Desktop\ML\model.py", line 147, in forward
    atom_fea = conv_func(atom_fea, nbr_fea, nbr_fea_idx)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\1546544\Desktop\ML\model.py", line 66, in forward
    total_gated_fea = self.fc_full(total_nbr_fea)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 837, in linear
    output = input.matmul(weight.t())
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\variable.py", line 386, in matmul
    return torch.matmul(self, other)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\functional.py", line 192, in matmul
    output = torch.mm(tensor1, tensor2)
RuntimeError: cublas runtime error : the GPU program failed to execute at C:/Anaconda2/conda-bld/pytorch_1519496000060/work/torch/lib/THC/THCBlas.cu:247

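I ran into the same problem.

I fixed it by correcting my dataset labels: the training labels for my dataset were wrong, and that is why it failed during backward().

So it may help to check that the labels are what you expect after loading them from disk or a database.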
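
The label check suggested above could look roughly like the sketch below. It assumes a classification setup where each label is an integer class index in the range 0 to num_classes - 1. The helper name `find_bad_labels` and the sample values are hypothetical and come from neither the question nor PyTorch. For a label tensor, passing `labels.tolist()` should work.

```python
def find_bad_labels(labels, num_classes):
    """Return (index, label) pairs for labels outside [0, num_classes).

    Hypothetical helper: assumes integer class-index labels.
    """
    bad = []
    for i, y in enumerate(labels):
        # Valid class indices are 0 .. num_classes - 1.
        if not 0 <= y < num_classes:
            bad.append((i, y))
    return bad


# Made-up labels, as they might come back from disk or a database.
labels = [0, 2, 1, 3, -1, 1]
print(find_bad_labels(labels, num_classes=3))  # [(3, 3), (4, -1)]
```

Running a check like this on the CPU before training starts tends to point at the offending samples directly, whereas an out-of-range label on the GPU can show up later as an unrelated-looking CUDA or cuBLAS error.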
