TensorFlow cannot open libcudnn



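I am trying to get GPU training working in TensorFlow 2.4.1. I am on Ubuntu 20.04 with Nvidia driver 460.32.03 installed. I have installed CUDA toolkit 11.2 and cuDNN 8. When starting TensorFlow, this is what I see: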

2021-01-21 16:23:31.457304: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-01-21 16:23:33.535844: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-21 16:23:33.536650: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-21 16:23:33.566101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:21:00.0 name: Quadro RTX 4000 computeCapability: 7.5
coreClock: 1.545GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 387.49GiB/s
2021-01-21 16:23:33.566157: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-01-21 16:23:33.571082: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-01-21 16:23:33.571162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-01-21 16:23:33.588669: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-21 16:23:33.590407: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-21 16:23:33.592191: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-21 16:23:33.592668: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-01-21 16:23:33.592781: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-01-21 16:23:33.592790: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

nvidia-smi looks fine:

Thu Jan 21 16:31:51 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:21:00.0  On |                  N/A |
| 30%   36C    P8    11W / 125W |    570MiB /  7979MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
         
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3089      G   /usr/lib/xorg/Xorg                 71MiB |
|    0   N/A  N/A      4021      G   /usr/lib/xorg/Xorg                216MiB |
|    0   N/A  N/A      4153      G   /usr/bin/gnome-shell              106MiB |
|    0   N/A  N/A      4641      G   ...gAAAAAAAAA --shared-files       29MiB |
|    0   N/A  N/A      4827      G   /usr/lib/rstudio/bin/rstudio      132MiB |
+-----------------------------------------------------------------------------+

I have verified that libcudnn.so.8 exists in the same folder as the other CUDA libraries:

/usr/local/cuda-11.2/lib64$ ls -la libcud*
-rw-r--r-- 1 root root     845076 Jan 21 15:47 libcudadevrt.a
lrwxrwxrwx 1 root root         17 Jan 21 15:47 libcudart.so -> libcudart.so.11.0
lrwxrwxrwx 1 root root         20 Jan 21 15:47 libcudart.so.11.0 -> libcudart.so.11.2.72
-rwxr-xr-x 1 root root     582008 Jan 21 15:47 libcudart.so.11.2.72
-rw-r--r-- 1 root root     906670 Jan 21 15:47 libcudart_static.a
lrwxrwxrwx 1 root root         23 Jan 21 16:19 libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8
lrwxrwxrwx 1 root root         27 Jan 21 16:19 libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.0.5
-rwxr-xr-x 1 root root  144525080 Jan 21 16:19 libcudnn_adv_infer.so.8.0.5
lrwxrwxrwx 1 root root         23 Jan 21 16:19 libcudnn_adv_train.so -> libcudnn_adv_train.so.8
lrwxrwxrwx 1 root root         27 Jan 21 16:19 libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.0.5
-rwxr-xr-x 1 root root   94896760 Jan 21 16:19 libcudnn_adv_train.so.8.0.5
lrwxrwxrwx 1 root root         23 Jan 21 16:19 libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8
lrwxrwxrwx 1 root root         27 Jan 21 16:19 libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.0.5
-rwxr-xr-x 1 root root 1438587968 Jan 21 16:19 libcudnn_cnn_infer.so.8.0.5
lrwxrwxrwx 1 root root         23 Jan 21 16:19 libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8
lrwxrwxrwx 1 root root         27 Jan 21 16:19 libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.0.5
-rwxr-xr-x 1 root root   89274264 Jan 21 16:19 libcudnn_cnn_train.so.8.0.5
lrwxrwxrwx 1 root root         23 Jan 21 16:19 libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8
lrwxrwxrwx 1 root root         27 Jan 21 16:19 libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.0.5
-rwxr-xr-x 1 root root  333101688 Jan 21 16:19 libcudnn_ops_infer.so.8.0.5
lrwxrwxrwx 1 root root         23 Jan 21 16:19 libcudnn_ops_train.so -> libcudnn_ops_train.so.8
lrwxrwxrwx 1 root root         27 Jan 21 16:19 libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.0.5
-rwxr-xr-x 1 root root   37388984 Jan 21 16:19 libcudnn_ops_train.so.8.0.5
lrwxrwxrwx 1 root root         13 Jan 21 16:19 libcudnn.so -> libcudnn.so.8
lrwxrwxrwx 1 root root         17 Jan 21 16:19 libcudnn.so.8 -> libcudnn.so.8.0.5
-rwxr-xr-x 1 root root     158264 Jan 21 16:19 libcudnn.so.8.0.5
-rw-r--r-- 1 root root 2428480120 Jan 21 16:19 libcudnn_static.a

The library also appears to load fine, with no missing dependencies:

$ ldd libcudnn.so.8
linux-vdso.so.1 (0x00007ffe41739000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f652d78a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f652d767000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f652d761000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f652d580000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f652d565000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f652d371000)
/lib64/ld-linux-x86-64.so.2 (0x00007f652d9db000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f652d222000)

What else could I be missing?

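I had the same problem, and after some trial and error I found my fix: add the CUDA directories to the PATH and LD_LIBRARY_PATH environment variables by running the commands below. They use /usr/local/cuda-11.4; substitute the directory of your own CUDA install (/usr/local/cuda-11.2 in the question).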

$ export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

This comes from section 9.1.1 of the CUDA installation guide (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup).
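
To check whether the exports took effect before starting TensorFlow again, a small Python sketch like the following can help. It is not part of the original answer: the helper names can_dlopen and find_in_library_path are made up for illustration, and it assumes a Linux system where the loader honours LD_LIBRARY_PATH. It lists the LD_LIBRARY_PATH directories that contain libcudnn.so.8 and then tries to open the library by name with ctypes, which is roughly what TensorFlow's dso_loader does.

```python
import ctypes
import os


def can_dlopen(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False


def find_in_library_path(name, env=None):
    """Return the LD_LIBRARY_PATH directories that contain a file called `name`."""
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    return [d for d in dirs if os.path.exists(os.path.join(d, name))]


if __name__ == "__main__":
    lib = "libcudnn.so.8"  # the library name from the warning in the log
    print("LD_LIBRARY_PATH directories containing", lib, ":", find_in_library_path(lib))
    print("dlopen succeeds:", can_dlopen(lib))
```

Run it from the same shell in which the export commands were executed, since LD_LIBRARY_PATH is read when the process starts. If the list is empty or the dlopen check reports False, the variable is probably not set in that shell (for example, because the exports were not added to ~/.bashrc). Once the library loads, tf.config.list_physical_devices('GPU') should report the GPU.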
