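I have been trying to run TensorFlow on my GPU for several days, without success.
I know there are several similar questions, but I have tried everything I found in them and nothing worked, which is why I am writing this one:
How to install libcusolver.so.11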
https://stackoverflow.com/a/67642774/15098668
I have installed driver 460.106.00 and CUDA 11.2 for an NVIDIA GeForce RTX 3090:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 On | 00000000:08:00.0 On | N/A |
| 33% 26C P8 22W / 350W | 282MiB / 24260MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1264 G /usr/lib/xorg/Xorg 59MiB |
| 0 N/A N/A 3349 G /usr/lib/xorg/Xorg 124MiB |
| 0 N/A N/A 3508 G /usr/bin/gnome-shell 77MiB |
| 0 N/A N/A 6384 G /usr/lib/firefox/firefox 4MiB |
+-----------------------------------------------------------------------------+
cuDNN:
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
GCC compiler:
gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
I also added LD_LIBRARY_PATH to my bashrc:
# Nvidia cuda toolkit
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
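As a quick sanity check on the exports above, the following minimal sketch walks each entry of a search path and reports whether the directory exists and which libcudart files it contains. The function name `inspect_library_path` is hypothetical and introduced only for illustration; the sketch assumes a Linux-style colon-separated path.

```python
import os
from pathlib import Path


def inspect_library_path(value):
    """Return (entry, exists, libcudart_files) for each entry of a search path."""
    report = []
    for entry in value.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        directory = Path(entry)
        exists = directory.is_dir()
        # List every libcudart shared object so the version suffix is visible.
        found = sorted(p.name for p in directory.glob("libcudart.so*")) if exists else []
        report.append((entry, exists, found))
    return report


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for entry, exists, found in inspect_library_path(current):
        print(entry, "exists" if exists else "MISSING", found)
```

If the script prints nothing when run from the same shell that starts Python, the variable is probably not exported in that shell, for example because the bashrc was not re-read or the variable name was misspelled.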
I have tried several tensorflow and tensorflow-gpu versions, from 2.4 to 2.7, but every one of them fails with:
2022-01-24 21:28:43.206834: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
or
2022-01-24 21:28:44.087779: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087827: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087858: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087891: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087921: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087947: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-01-24 21:28:44.087975: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
Thanks in advance, I don't know what else to try...
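One way to reproduce the loader failures outside TensorFlow is to try opening the same libraries with `ctypes`, which goes through the system dynamic loader. This is only a sketch: the helper name `probe` is hypothetical, and the library names are copied from the warnings above.

```python
import ctypes

# Library names copied from the TensorFlow warnings in the question.
CUDA_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def probe(names):
    """Try to load each library; map name -> None on success or the error text."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in probe(CUDA_LIBS).items():
        print(name, "OK" if error is None else f"FAILED: {error}")
```

If every library fails here as well, the problem most likely lies in the loader search path or the CUDA installation rather than in the TensorFlow package.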
Make sure you follow the TensorFlow software compatibility table: https://www.tensorflow.org/install/source#gpu
More details here: https://stackoverflow.com/a/50622526
I ran into this problem when using
- python==3.10
- tensorflow==2.8.0
- cuda==11.0
- cudnn==8.0
I solved it by downgrading Python and TensorFlow to 3.6 and 2.4.0 respectively, which satisfies the TensorFlow compatibility requirements.
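The check described above can be sketched as a small lookup. The rows below are transcribed from the tested-build table at https://www.tensorflow.org/install/source#gpu and cover only a few releases; treat them as an assumption to verify against that page. The names `TESTED_BUILDS` and `mismatches` are hypothetical, and the Python version column is left out for brevity.

```python
# Partial copy of the tested GPU build configurations (TensorFlow series -> versions).
# Verify against https://www.tensorflow.org/install/source#gpu before relying on it.
TESTED_BUILDS = {
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
    "2.6": {"cuda": "11.2", "cudnn": "8.1"},
    "2.7": {"cuda": "11.2", "cudnn": "8.1"},
    "2.8": {"cuda": "11.2", "cudnn": "8.1"},
}


def mismatches(tf_version, cuda, cudnn):
    """Return a list of mismatch messages; an empty list means the combination is listed."""
    series = ".".join(tf_version.split(".")[:2])  # "2.8.0" -> "2.8"
    expected = TESTED_BUILDS.get(series)
    if expected is None:
        return [f"TensorFlow {series} is not in this partial table"]
    problems = []
    if cuda != expected["cuda"]:
        problems.append(f"CUDA {cuda} installed, {expected['cuda']} expected")
    if cudnn != expected["cudnn"]:
        problems.append(f"cuDNN {cudnn} installed, {expected['cudnn']} expected")
    return problems


if __name__ == "__main__":
    print(mismatches("2.8.0", "11.0", "8.0"))  # the failing setup from this answer
    print(mismatches("2.4.0", "11.0", "8.0"))  # the setup after downgrading
```

Under these table rows, the first call reports both a CUDA and a cuDNN mismatch, while the second returns an empty list.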
After trying a lot of things, I created a new conda environment and installed tensorflow-gpu, since I don't care about the TF version:
conda install tensorflow-gpu -c anaconda
It installed all of the following packages:
package | build
---------------------------|-----------------
_tflow_select-2.1.0 | gpu 2 KB anaconda
absl-py-0.10.0 | py38_0 170 KB anaconda
aiohttp-3.6.3 | py38h7b6447c_0 622 KB anaconda
astunparse-1.6.3 | py_0 17 KB anaconda
async-timeout-3.0.1 | py38_0 12 KB anaconda
attrs-20.2.0 | py_0 41 KB anaconda
blas-1.0 | mkl 6 KB anaconda
blinker-1.4 | py38_0 21 KB anaconda
brotlipy-0.7.0 |py38h7b6447c_1000 349 KB anaconda
c-ares-1.16.1 | h7b6447c_0 112 KB anaconda
ca-certificates-2020.10.14 | 0 128 KB anaconda
cachetools-4.1.1 | py_0 12 KB anaconda
certifi-2020.6.20 | py38_0 160 KB anaconda
cffi-1.14.0 | py38h2e261b9_0 228 KB anaconda
chardet-3.0.4 | py38_1003 170 KB anaconda
click-7.1.2 | py_0 67 KB anaconda
cryptography-3.1.1 | py38h1ba5d50_0 618 KB anaconda
cudatoolkit-10.1.243 | h6bb024c_0 513.2 MB anaconda
cudnn-7.6.5 | cuda10.1_0 250.6 MB anaconda
cupti-10.1.168 | 0 1.7 MB anaconda
gast-0.3.3 | py_0 14 KB anaconda
google-auth-1.22.1 | py_0 62 KB anaconda
google-auth-oauthlib-0.4.1 | py_2 21 KB anaconda
google-pasta-0.2.0 | py_0 44 KB anaconda
grpcio-1.31.0 | py38hf8bcb03_0 2.3 MB anaconda
h5py-2.10.0 | py38hd6299e0_1 1.1 MB anaconda
hdf5-1.10.6 | hb1b8bf9_0 4.8 MB anaconda
idna-2.10 | py_0 56 KB anaconda
importlib-metadata-2.0.0 | py_1 35 KB anaconda
intel-openmp-2020.2 | 254 947 KB anaconda
keras-preprocessing-1.1.0 | py_1 36 KB anaconda
libgfortran-ng-7.3.0 | hdf63c60_0 1.3 MB anaconda
libprotobuf-3.13.0.1 | hd408876_0 2.3 MB anaconda
markdown-3.3.2 | py38_0 123 KB anaconda
mkl-2019.4 | 243 204.1 MB anaconda
mkl-service-2.3.0 | py38he904b0f_0 68 KB anaconda
mkl_fft-1.2.0 | py38h23d657b_0 173 KB anaconda
mkl_random-1.1.0 | py38h962f231_0 398 KB anaconda
multidict-4.7.6 | py38h7b6447c_1 72 KB anaconda
numpy-1.19.1 | py38hbc911f0_0 20 KB anaconda
numpy-base-1.19.1 | py38hfa32c7d_0 5.3 MB anaconda
oauthlib-3.1.0 | py_0 88 KB anaconda
openssl-1.1.1h | h7b6447c_0 3.8 MB anaconda
opt_einsum-3.1.0 | py_0 54 KB anaconda
protobuf-3.13.0.1 | py38he6710b0_1 702 KB anaconda
pyasn1-0.4.8 | py_0 58 KB anaconda
pyasn1-modules-0.2.8 | py_0 67 KB anaconda
pycparser-2.20 | py_2 94 KB anaconda
pyjwt-1.7.1 | py38_0 32 KB anaconda
pyopenssl-19.1.0 | py_1 47 KB anaconda
pysocks-1.7.1 | py38_0 27 KB anaconda
requests-2.24.0 | py_0 54 KB anaconda
requests-oauthlib-1.3.0 | py_0 22 KB anaconda
rsa-4.6 | py_0 26 KB anaconda
scipy-1.5.2 | py38h0b6359f_0 18.7 MB anaconda
six-1.15.0 | py_0 13 KB anaconda
tensorboard-2.2.1 | pyh532a8cf_0 2.5 MB anaconda
tensorboard-plugin-wit-1.6.0| py_0 663 KB anaconda
tensorflow-2.2.0 |gpu_py38hb782248_0 4 KB anaconda
tensorflow-base-2.2.0 |gpu_py38h83e3d50_0 421.3 MB anaconda
tensorflow-estimator-2.2.0 | pyh208ff02_0 276 KB anaconda
tensorflow-gpu-2.2.0 | h0d30ee6_0 2 KB anaconda
termcolor-1.1.0 | py38_1 8 KB anaconda
urllib3-1.25.11 | py_0 93 KB anaconda
werkzeug-1.0.1 | py_0 243 KB anaconda
wrapt-1.12.1 | py38h7b6447c_1 50 KB anaconda
yarl-1.6.2 | py38h7b6447c_0 142 KB anaconda
zipp-3.3.1 | py_0 11 KB anaconda
------------------------------------------------------------
Total: 1.41 GB
Including cudatoolkit and cudnn...
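To see which CUDA files conda actually placed inside the environment, a sketch like the following lists them. It assumes the environment is activated so that `CONDA_PREFIX` is set, and that shared libraries live under its `lib` directory (the usual Linux layout); the function name `conda_cuda_libs` is hypothetical.

```python
import os
from pathlib import Path


def conda_cuda_libs(prefix=None):
    """List CUDA-related shared objects under <prefix>/lib, sorted by name."""
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        return []  # no active conda environment
    lib_dir = Path(prefix) / "lib"
    patterns = ("libcudart.so*", "libcublas.so*", "libcudnn.so*")
    names = set()
    for pattern in patterns:
        names.update(p.name for p in lib_dir.glob(pattern))
    return sorted(names)


if __name__ == "__main__":
    for name in conda_cuda_libs():
        print(name)
```

Files listed here come from the conda packages, independently of the CUDA toolkit installed system-wide under /usr/local.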
After that, I don't know why, TF detects the NVIDIA card:
2022-01-25 09:37:52.865587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-01-25 09:37:52.902796: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.903487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.69GiB deviceMemoryBandwidth: 871.81GiB/s
2022-01-25 09:37:52.903637: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-01-25 09:37:52.904633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-01-25 09:37:52.905878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-01-25 09:37:52.906023: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-01-25 09:37:52.907115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-01-25 09:37:52.907719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-01-25 09:37:52.910042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-01-25 09:37:52.910137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.911078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-25 09:37:52.911707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
Num GPUs Available: 1
Process finished with exit code 0
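The script that produced the "Num GPUs Available" line is not shown; a minimal sketch of such a check, assuming a TensorFlow 2.x installation that provides `tf.config.list_physical_devices`, could look like this. The wrapper name `count_visible_gpus` is hypothetical.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    print("Num GPUs Available:", count_visible_gpus())
```

A result of 0 together with "Could not load dynamic library" warnings would point back to the library loading problem from the question.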