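NVIDIA driver not found while building an NVIDIA CUDA Docker image

I am trying to build a GPU microservice on top of the NVIDIA CUDA base image, but the docker build fails with a "Found no NVIDIA driver" error. Can anyone point out what is missing here?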

Dockerfile:

FROM nvidia/cuda:10.1-devel

# Install some basic utilities
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    sudo \
    git \
    bzip2 \
    libx11-6 \
    && rm -rf /var/lib/apt/lists/*
ENV CONDA_AUTO_UPDATE_CONDA=false
ENV PATH=/home/user/miniconda/bin:$PATH
RUN curl -sLo ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh \
    && chmod +x ~/miniconda.sh \
    && ~/miniconda.sh -b -p ~/miniconda \
    && rm ~/miniconda.sh \
    && conda install -y python==3.7 \
    && conda clean -ya
ENV PATH="/usr/local/cuda-10.1/bin:$PATH"
ENV LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH"
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV NVIDIA_VISIBLE_DEVICES=all
ENV FORCE_CUDA="1"
RUN conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
RUN pip install -v -e .

Error:

"/home/user/miniconda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1013, in _get_cuda_arch_flags
capability = torch.cuda.get_device_capability()
File "/home/user/miniconda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 320, in get_device_capability
prop = get_device_properties(device)
File "/home/user/miniconda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 325, in get_device_properties
_lazy_init()  # will define _get_device_properties and _CudaDeviceProperties
File "/home/user/miniconda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
_check_driver()
File "/home/user/miniconda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 101, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

The error occurs while the last step of the Dockerfile (RUN pip install -v -e .) is being executed.
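
One way to confirm what is happening at that step is to check whether the CUDA driver library can be loaded at all. The sketch below is a minimal check, assuming the driver is exposed as libcuda.so.1, which is normally injected by the NVIDIA container runtime when a container is started with GPU access and is typically absent while docker build runs. The function name cuda_driver_loadable is only illustrative.

```python
import ctypes


def cuda_driver_loadable(lib_name="libcuda.so.1"):
    """Return True if the CUDA driver library can be loaded by this process.

    Assumption: the driver library is named libcuda.so.1 and is on the
    loader path only when the container actually has GPU access.
    """
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # The loader could not find or open the library.
        return False
    return True


if __name__ == "__main__":
    print("CUDA driver loadable:", cuda_driver_loadable())
```

Running it in a RUN step during the build and again inside a container started with GPU access should show the difference, if the assumption about build-time visibility holds for your setup.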

I tried several other NVIDIA base images (cuda:10.1-base-ubuntu18.04, cuda:10.1-runtime-ubuntu18.04), but that did not help much.

Any pointers are appreciated.

After a lot of trial and error and reading through a lot of documentation, this is what worked for me.

ARG PYTORCH=1.3
ARG CUDA=10.1
ARG CUDNN=7
FROM pytorch/pytorch:1.3-cuda10.1-cudnn7-devel
RUN mkdir /app
WORKDIR /app
ENV TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0+PTX"
ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"
RUN apt-get update && apt-get install -y libglib2.0-0 libsm6 libxrender-dev libxext6 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Install some basic utilities
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    sudo \
    git \
    bzip2 \
    libx11-6 \
    && rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential g++ \
    libglib2.0-0 libsm6 libxrender-dev libxext6 wget
# Create a non-root user and switch to it
RUN adduser --disabled-password --gecos '' --shell /bin/bash user \
    && chown -R user:user /app
RUN echo "user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-user
USER user
# All users can use /home/user as their home directory
ENV HOME=/home/user
RUN chmod 777 /home/user
# Install Miniconda and Python 3.7
ENV CONDA_AUTO_UPDATE_CONDA=false
ENV PATH=/home/user/miniconda/bin:$PATH
RUN curl -sLo ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh \
    && chmod +x ~/miniconda.sh \
    && ~/miniconda.sh -b -p ~/miniconda \
    && rm ~/miniconda.sh \
    && conda install -y python==3.7 \
    && conda clean -ya
RUN conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
RUN pip install -v -e .
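
The part that most likely matters is TORCH_CUDA_ARCH_LIST. Judging from the traceback in the question, torch/utils/cpp_extension.py only calls torch.cuda.get_device_capability() when it has to work out the target architectures itself, and that call needs a driver, which is not available during docker build. The following is a simplified sketch of that fallback, not PyTorch's actual code; resolve_cuda_arch_list and no_gpu are hypothetical names used for illustration.

```python
import os


def resolve_cuda_arch_list(env=None, query_device=None):
    """Pick the CUDA architectures to compile for (illustrative only).

    If TORCH_CUDA_ARCH_LIST is set, use it and never touch the GPU.
    Otherwise fall back to asking the device, which fails without a driver.
    """
    env = os.environ if env is None else env
    arch_list = env.get("TORCH_CUDA_ARCH_LIST")
    if arch_list:
        # Accept both space- and semicolon-separated entries.
        return arch_list.replace(";", " ").split()
    if query_device is None:
        raise RuntimeError("no arch list set and no way to query a GPU")
    major, minor = query_device()
    return ["{}.{}".format(major, minor)]


def no_gpu():
    # Hypothetical stand-in for torch.cuda.get_device_capability()
    # when it is called during docker build.
    raise AssertionError("Found no NVIDIA driver on your system.")


if __name__ == "__main__":
    build_env = {"TORCH_CUDA_ARCH_LIST": "5.2 6.0 6.1 7.0+PTX"}
    # With the variable set, the device is never queried.
    print(resolve_cuda_arch_list(build_env, no_gpu))
```

With the variable set, the architectures come from the environment and the build does not need to see a GPU. Without it, the sketch fails the same way the original build did.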

Hope this helps!

Good luck!
