PyTorch: RuntimeError: reduce failed to synchronize: cudaErrorAssert: device-side assert triggered



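I am running into the following error when trying to train the model on this dataset.

Since this is the configuration published in the paper, I assume I am doing something incredibly wrong.

The error shows up on a different image every time I run training.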

C:/w/1/s/windows/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\pydevd.py", line 1741, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Noam/Code/vision_course/hopenet/deep-head-pose/code/original_code_augmented/train_hopenet_with_validation_holdout.py", line 187, in <module>
    loss_reg_yaw = reg_criterion(yaw_predicted, label_yaw_cont)
  File "C:\Noam\Code\vision_course\hopenet\venv\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Noam\Code\vision_course\hopenet\venv\lib\site-packages\torch\nn\modules\loss.py", line 431, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "C:\Noam\Code\vision_course\hopenet\venv\lib\site-packages\torch\nn\functional.py", line 2204, in mse_loss
    ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
RuntimeError: reduce failed to synchronize: cudaErrorAssert: device-side assert triggered

Any ideas?

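This kind of error generally occurs when using NLLLoss or CrossEntropyLoss and the dataset contains negative labels (or labels greater than or equal to the number of classes). That is exactly the assertion that fails in your output: t >= 0 && t < n_classes.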
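To see the same failure without the asynchronous CUDA reporting, the loss can be called on CPU tensors, where the exception is raised at the loss call itself. The following is a minimal sketch, not the asker's code: the class count, the random logits, and the label values are all made up for illustration, and the exact exception type and message vary between PyTorch versions.

```python
import torch
import torch.nn as nn

n_classes = 5  # hypothetical number of classes, chosen only for this example

# Batch of 4 predictions kept on the CPU (hypothetical values).
logits = torch.randn(4, n_classes)

# The last label equals n_classes, so it violates 0 <= t < n_classes.
targets = torch.tensor([0, 2, 4, n_classes])

criterion = nn.CrossEntropyLoss()
try:
    criterion(logits, targets)
    raised = False
except (IndexError, RuntimeError) as err:
    # On the CPU the error surfaces synchronously, at the offending call.
    raised = True
    print(type(err).__name__, err)
```

When the GPU is required, setting the environment variable CUDA_LAUNCH_BLOCKING=1 before starting the script makes kernel launches synchronous, so the traceback should point at the call that actually failed.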

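This does not happen with MSELoss itself, but OP mentions that there is a CrossEntropyLoss somewhere in the training code, and that is where the error comes from (CUDA errors are reported asynchronously, so the program crashes on some other line). The solution is to clean the dataset and make sure that t >= 0 && t < n_classes holds for every label (where t denotes the label).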
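The cleaning step can start with a scan of the labels before any training happens. Below is a minimal sketch in plain Python; the helper name find_invalid_labels, the example labels, and the class count are hypothetical and not taken from the asker's code.

```python
from typing import Iterable, List, Tuple


def find_invalid_labels(labels: Iterable[int], n_classes: int) -> List[Tuple[int, int]]:
    """Return (index, label) pairs that violate 0 <= label < n_classes."""
    return [
        (i, int(t))
        for i, t in enumerate(labels)
        if not 0 <= int(t) < n_classes
    ]


# Hypothetical labels for a 5-class problem; -1 and 5 are out of range.
example_labels = [0, 3, 4, -1, 5, 2]
bad = find_invalid_labels(example_labels, n_classes=5)
print(bad)  # [(3, -1), (4, 5)]
```

For labels stored in a tensor, passing labels.tolist() to the helper works the same way. The reported indices identify which samples to fix or drop.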

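Also, make sure the network output matches what the loss expects. BCELoss needs probabilities in the range 0 to 1, so it requires a sigmoid activation; NLLLoss needs log-probabilities, so it requires a log-softmax activation. Note that neither is necessary for CrossEntropyLoss or BCEWithLogitsLoss, because they apply the activation inside the loss function. (Thanks to @PouyaB for pointing this out.)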
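The pairing of activations and losses can be checked numerically. This is a small sketch with made-up logits and targets. It assumes NLLLoss is fed log-probabilities (the output of log_softmax), as the PyTorch documentation describes, and BCELoss is fed sigmoid outputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Multi-class case: hypothetical raw scores for 4 samples and 3 classes.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 1])

# cross_entropy takes raw logits; nll_loss takes log-probabilities.
ce = F.cross_entropy(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Binary case: hypothetical raw scores and float targets in {0, 1}.
bin_logits = torch.randn(4)
bin_targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

# The "with_logits" variant takes raw scores; binary_cross_entropy takes probabilities.
bce_logits = F.binary_cross_entropy_with_logits(bin_logits, bin_targets)
bce = F.binary_cross_entropy(torch.sigmoid(bin_logits), bin_targets)

same_multiclass = torch.allclose(ce, nll)
same_binary = torch.allclose(bce_logits, bce, atol=1e-6)
print(same_multiclass, same_binary)
```

Both pairs should agree up to floating-point error, which is why applying an extra activation in front of CrossEntropyLoss or BCEWithLogitsLoss is unnecessary.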
