I want to test a GitHub repository for my work:
https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection
So I cloned it:
git clone https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection
Unfortunately, when I run the command from the GitHub README, I get an error:
sh experiments/run_mnist_en_fanogan.sh
/home/svetlana/.local/lib/python3.9/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA GeForce RTX 3080 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/svetlana/.local/lib/python3.9/site-packages/torchvision/datasets/mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Traceback (most recent call last):
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 30, in <module>
main()
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 24, in main
model.train()
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 155, in train
self.gan_training(epoch)
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 93, in gan_training
fake_imgs = self.net_Gds[i_G](z)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/networks.py", line 175, in forward
output = self.main(input)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 916, in forward
return F.conv_transpose2d(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
I thought my installation was fine, but now I have doubts. Here is my setup:
Python 3.9.6 (default, Jun 30 2021, 10:22:16)
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0
import torch
print(torch.__version__)
1.9.0+cu102
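The warning in the log above can be cross-checked against this setup. The sketch below compares a GPU's compute capability with the architectures a PyTorch build was compiled for. The helper names `arch_to_capability` and `build_covers` are hypothetical, and the rule used (a build is usable if it contains a binary for the same major capability version) is a simplification, not necessarily the exact check PyTorch performs.

```python
from typing import List, Tuple


def arch_to_capability(arch: str) -> Tuple[int, int]:
    """Turn an arch string such as 'sm_86' into the tuple (8, 6)."""
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def build_covers(capability: Tuple[int, int], arch_list: List[str]) -> bool:
    """Simplified rule (an assumption): same major version means usable."""
    majors = {arch_to_capability(a)[0] for a in arch_list if a.startswith("sm_")}
    return capability[0] in majors


if __name__ == "__main__":
    # Values copied from the warning message in the question.
    print(build_covers((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]))

    try:
        import torch
    except ImportError:
        print("torch is not installed; skipping the live check")
    else:
        # torch.version.cuda is the CUDA version the wheel was built with,
        # which can differ from what `nvcc --version` reports.
        print(torch.__version__, torch.version.cuda)
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print(cap, archs, build_covers(cap, archs))
```

With the values from the warning, the check fails because no sm_8x architecture is in the list. This suggests the problem is the cu102 build of PyTorch rather than the system-wide CUDA 11.4 toolkit.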
I installed cudnn-11.4 from the NVIDIA website (https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html), but I don't know the command to check its version. I tried this:
cat /opt/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
but it returned nothing.
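One likely reason the grep prints nothing: starting with cuDNN 8, the version macros are defined in cudnn_version.h rather than in cudnn.h. The sketch below looks in both headers. The function name `read_cudnn_version` is hypothetical, and the include directory is simply the one used in the command above.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches lines such as "#define CUDNN_MAJOR 8".
_MACRO = re.compile(r"^\s*#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", re.M)


def read_cudnn_version(include_dir: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) from the cuDNN headers, or None if absent."""
    for name in ("cudnn_version.h", "cudnn.h"):
        header = Path(include_dir) / name
        if not header.is_file():
            continue
        found = dict(_MACRO.findall(header.read_text(errors="ignore")))
        if {"MAJOR", "MINOR", "PATCHLEVEL"} <= found.keys():
            return (int(found["MAJOR"]), int(found["MINOR"]),
                    int(found["PATCHLEVEL"]))
    return None


if __name__ == "__main__":
    # Directory taken from the question; adjust it for other installs.
    print(read_cudnn_version("/opt/cuda/include"))
```

Note that this reports the system-wide cuDNN. PyTorch wheels installed with pip generally ship their own copy, whose version is reported by `torch.backends.cudnn.version()`.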
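I tried the solution from "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize", without success (I used nvtop to watch VRAM usage).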
@Berriel
You are right, I was focusing on the wrong error.
To fix the problem, I ran
pip uninstall torch torchvision torchaudio
and then
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
as described at
https://pytorch.org/get-started/locally/
(this link comes from the warning message).
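After reinstalling, a quick way to confirm that the fix took effect is to run one small transposed convolution on the GPU, since that is the operation that failed in the traceback. This is a minimal sketch: the function name `conv_transpose_smoke_test` and the layer sizes are made up for illustration and are not taken from the repository.

```python
def conv_transpose_smoke_test() -> str:
    """Run one ConvTranspose2d forward pass on the GPU and report the outcome."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "cuda not available"

    # Hypothetical sizes: a 100-channel latent vector mapped to 64 channels.
    layer = torch.nn.ConvTranspose2d(100, 64, kernel_size=4).cuda()
    z = torch.randn(2, 100, 1, 1, device="cuda")
    try:
        out = layer(z)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as exc:
        return f"failed: {exc}"
    return f"ok: output shape {tuple(out.shape)}"


if __name__ == "__main__":
    print(conv_transpose_smoke_test())
```

If this prints a line starting with "ok" and the sm_86 compatibility warning no longer appears, the new wheel matches the GPU.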