I get this error when trying to run some PyTorch code:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=74 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
File "demo.py", line 173, in test
pca = torch.FloatTensor( np.load('../basics/U_lrw1.npy')[:,:6]).cuda()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:74
I am running this on a cloud VM created from the Google Deep Learning VM image.
Version: tf-gpu.1-13.m25
Based on: Debian GNU/Linux 9.9 (stretch) (GNU/Linux 4.9.0-9-amd64 x86_64)
Linux tf-gpu-interruptible 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64
Environment info:
$ nvidia-smi
Sun May 26 05:32:33 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    74W / 149W |      0MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ echo $CUDA_PATH
$ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
$ env | grep CUDA
CUDA_VISIBLE_DEVICES=0
$ pip freeze
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
audioread==2.1.7
backports.functools-lru-cache==1.5
certifi==2019.3.9
chardet==3.0.4
cloudpickle==1.1.1
cycler==0.10.0
dask==1.2.2
decorator==4.4.0
dlib==19.17.0
enum34==1.1.6
filelock==3.0.12
funcsigs==1.0.2
future==0.17.1
gdown==3.8.1
idna==2.8
joblib==0.13.2
kiwisolver==1.1.0
librosa==0.6.3
llvmlite==0.28.0
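I can't tell what the root cause of your problem is, but I did notice one thing: GPU-Util is at 100% even though no processes are running on the GPU.
You could try the following.
- sudo nvidia-smi -pm 1
This enables persistence mode and may solve your problem. ECC combined with non-persistence mode can lead to 100% GPU utilization being reported.
- You can also disable ECC with the command nvidia-smi -e 0
- Or, best of all, start the whole process over from the beginning, i.e. reboot the operating system.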
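Before changing any setting, the symptom described above (full utilization, no processes) can be confirmed from a script. The following is only a minimal sketch, not part of the original answer: the helper names `parse_gpu_row`, `looks_stuck` and `query_gpus` are hypothetical, and it assumes that `nvidia-smi` is on the PATH and that the driver supports the `utilization.gpu`, `persistence_mode` and `ecc.mode.current` query fields.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "utilization.gpu,persistence_mode,ecc.mode.current"


def parse_gpu_row(row):
    """Parse one CSV row such as '100, Disabled, Enabled'."""
    util, persistence, ecc = [field.strip() for field in row.split(",")]
    return {
        # nvidia-smi prints "[N/A]" for unsupported fields; keep that as None.
        "util_percent": int(util) if util.isdigit() else None,
        "persistence": persistence == "Enabled",
        "ecc": ecc == "Enabled",
    }


def looks_stuck(gpu, process_count):
    """True when the GPU reports full utilization but no process is running."""
    util = gpu["util_percent"]
    return util is not None and util >= 100 and process_count == 0


def query_gpus():
    """Run nvidia-smi and return one parsed dict per GPU (needs a driver)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return [parse_gpu_row(line) for line in out.splitlines() if line.strip()]
```

On the VM, calling `looks_stuck(gpu, 0)` for each entry returned by `query_gpus()` flags the situation in the question. If persistence mode shows as off and ECC as on, those are the two settings the suggestions above change.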
Note: I'm not sure this will work for you. I ran into a similar problem before, so I'm only speaking from my own experience. I hope it helps.
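After trying any of the suggestions, it may help to check what PyTorch itself sees before calling `.cuda()`. This is a rough sketch under my own assumptions rather than anything from the question: `cuda_report` and `pick_device_name` are hypothetical helper names, and the import is wrapped so the script still runs where torch is not installed.

```python
import os


def cuda_report():
    """Collect basic facts about CUDA visibility from inside Python."""
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter; nothing more to check.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    # CUDA version the installed torch wheel was built against (None for CPU builds).
    report["cuda_build"] = torch.version.cuda
    report["available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count() if report["available"] else 0
    return report


def pick_device_name(available):
    """Fall back to the CPU instead of failing inside .cuda()."""
    return "cuda" if available else "cpu"


report = cuda_report()
print(report)
print("device to use:", pick_device_name(report.get("available", False)))
```

If `available` is still `False` while `nvidia-smi` lists the K80, it is worth comparing `cuda_build` with the CUDA version the driver reports (10.0 here) and confirming that `CUDA_VISIBLE_DEVICES` points at an existing device index.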