RuntimeError: PyTorch is unable to find a valid cuDNN algorithm to run convolution



I want to test a GitHub repository for my work:

https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection

So I cloned it:

git clone https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection

Unfortunately, I get an error when I run this command (from the GitHub README):

sh experiments/run_mnist_en_fanogan.sh

sh experiments/run_mnist_en_fanogan.sh
/home/svetlana/.local/lib/python3.9/site-packages/torch/cuda/__init__.py:106: UserWarning: 
NVIDIA GeForce RTX 3080 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/svetlana/.local/lib/python3.9/site-packages/torchvision/datasets/mnist.py:498:      UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Traceback (most recent call last):
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 30, in <module>
main()
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 24, in main
model.train()
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 155, in train
self.gan_training(epoch)
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 93, in gan_training
fake_imgs = self.net_Gds[i_G](z)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/networks.py", line 175, in forward
output = self.main(input)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 916, in forward
return F.conv_transpose2d(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

I thought my installation was fine, but now I have doubts. Here is my setup:

Python 3.9.6 (default, Jun 30 2021, 10:22:16)
nvcc  --version                                                                                                                                           
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0

import torch
print(torch.__version__)
1.9.0+cu102
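The `+cu102` suffix suggests this wheel was built against CUDA 10.2, and as far as I know the pip wheels ship their own CUDA runtime, so the system-wide `nvcc` 11.4 is not necessarily what PyTorch uses. Below is a minimal sketch for comparing the GPU's compute capability with the architectures the installed build was compiled for. The helper `is_capability_supported` is hypothetical and uses a simplified same-major-version heuristic, similar in spirit to the check behind the warning in the log above.

```python
def is_capability_supported(capability, arch_list):
    """Return True if some compiled arch shares the device's major version.

    Simplified heuristic (an assumption of this sketch): sm_XY binaries are
    treated as usable on any device whose major compute capability is X.
    """
    major, _minor = capability
    compiled = [int(a.split("_")[1]) for a in arch_list if a.startswith("sm_")]
    return any(sm // 10 == major for sm in compiled)


# Values taken from the warning in the log above.
print(is_capability_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]))  # False

try:
    import torch
except ImportError:
    torch = None  # PyTorch not installed; only the pure-Python check runs

if torch is not None and torch.cuda.is_available():
    print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
    print("cuDNN bundled with torch:", torch.backends.cudnn.version())
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("device capability:", cap, "compiled archs:", archs)
    print("supported:", is_capability_supported(cap, archs))
```

If the last line prints `False`, the installed build most likely lacks kernels for the GPU, whatever CUDA toolkit is installed system-wide.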

I installed cuDNN for CUDA 11.4 from the NVIDIA website (https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html). I don't know the command to check its version, so I tried this:

cat /opt/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

But it returns nothing.
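One likely reason for the empty output: since cuDNN 8, the version macros live in `cudnn_version.h` rather than `cudnn.h`. Below is a minimal sketch that reads them. The candidate paths are assumptions about where the headers were installed, and `parse_cudnn_version` is a hypothetical helper. This reports the system-wide cuDNN, which is not necessarily the one a pip-installed PyTorch uses.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from cuDNN header text, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Assumed install locations; adjust to wherever cuDNN was unpacked.
CANDIDATES = [
    Path("/opt/cuda/include/cudnn_version.h"),
    Path("/usr/include/cudnn_version.h"),
    Path("/opt/cuda/include/cudnn.h"),
]

for path in CANDIDATES:
    if path.is_file():
        version = parse_cudnn_version(path.read_text(errors="ignore"))
        if version is not None:
            print(path, "->", ".".join(map(str, version)))
            break
else:
    # Reached only when no candidate header yielded a version.
    print("No cuDNN version macros found in the candidate headers.")
```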

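I tried the solutions from this question: "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize", without success (I use nvtop to monitor VRAM).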

@Berriel

You're right, I was focused on the error and overlooked the warning.

To fix the problem, I ran

pip uninstall torch torchvision torchaudio

and then

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

following the instructions at

https://pytorch.org/get-started/locally/

(this link comes from the warning message).
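After reinstalling, a quick check can confirm that the new build works. Below is a minimal sketch that does not use the repository's own models. It runs one small transposed convolution (the operation that failed in the traceback) on the GPU if one is usable, and on the CPU otherwise, in which case cuDNN is not exercised. The function name `smoke_test_conv` and the layer sizes are arbitrary choices for illustration.

```python
def smoke_test_conv():
    """Run one ConvTranspose2d forward pass; return the output shape or None."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment

    # Falls back to CPU when CUDA is unavailable; cuDNN is only used on GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = torch.nn.ConvTranspose2d(
        8, 4, kernel_size=4, stride=2, padding=1
    ).to(device)
    x = torch.randn(1, 8, 4, 4, device=device)
    with torch.no_grad():
        y = layer(x)
    return tuple(y.shape)


print(smoke_test_conv())
```

If this raises the same `RuntimeError` on the GPU, the installed wheel probably still does not match the card's architecture.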
