Using a GPU in a VS Code container



I want to use a GPU inside a Visual Studio Code Docker container to train a TensorFlow model. To build the image for my container, I use the following Dockerfile:

FROM mcr.microsoft.com/vscode/devcontainers/anaconda:0-3

ARG PROJECT_NAME=fire_rec
ARG NODE_VERSION="none"
RUN if [ "${NODE_VERSION}" != "none" ]; then su vscode -c "umask 0002 && . /usr/local/share/nvm/nvm.sh && nvm install ${NODE_VERSION} 2>&1"; fi

COPY environment.yml* .devcontainer/noop.txt /tmp/conda-tmp/
RUN if [ -f "/tmp/conda-tmp/environment.yml" ]; then umask 0002 && /opt/conda/bin/conda env update -n base -f /tmp/conda-tmp/environment.yml; fi \
    && rm -rf /tmp/conda-tmp

WORKDIR /srv/${PROJECT_NAME}
COPY requirements.txt /srv/${PROJECT_NAME}
RUN apt-get update && apt-get install -y python3-opencv
RUN apt-get update && apt-get install -y pip
RUN python3 -m pip install --no-cache -r requirements.txt
RUN apt-get update && apt-get install -y nvidia-cuda-toolkit

The requirements.txt file contains:

opencv-python
tensorflow-gpu
numpy
matplotlib
albumentations
tensorflow_addons

I also have a .devcontainer.json file:

{
"name": "Anaconda (Python 3)",
"build": { 
"context": "..",
"dockerfile": "Dockerfile",
"args": {
"NODE_VERSION": "none"
}
},
"settings": { 
"python.defaultInterpreterPath": "/opt/conda/bin/python",
"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.autopep8Path": "/opt/conda/bin/autopep8",
"python.formatting.yapfPath": "/opt/conda/bin/yapf",
"python.linting.flake8Path": "/opt/conda/bin/flake8",
"python.linting.pycodestylePath": "/opt/conda/bin/pycodestyle",
"python.linting.pydocstylePath": "/opt/conda/bin/pydocstyle",
"python.linting.pylintPath": "/opt/conda/bin/pylint"
},
"extensions": [
"ms-python.python",
"ms-python.vscode-pylance"
],
"remoteUser": "vscode",
}

The image builds and the container starts successfully. But when I run this code in a Jupyter notebook inside the container:

import tensorflow as tf
tf.config.list_physical_devices('GPU')

I get the following messages:

2022-05-05 14:42:02.712454: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-05-05 14:42:02.712483: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist

So the code cannot use the GPU. How can I fix this?

Make sure the NVIDIA Container Toolkit is installed on the host machine. Then add this to your .devcontainer.json:

"runArgs": [
"--gpus",
"all"
]

Check this to see how to add more options to your .devcontainer.json.
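After rebuilding the container, it can help to confirm that the GPU is actually exposed before involving TensorFlow. The following is a minimal sketch, not part of the original answer: the helper name `gpu_visibility_report` is made up for illustration, and it assumes the NVIDIA runtime places `nvidia-smi` on the PATH inside the container. It checks for the `/dev/nvidia0` device node named in the error message and, if `nvidia-smi` is available, lists the GPUs it reports.

```python
import os
import shutil
import subprocess


def gpu_visibility_report(device_path="/dev/nvidia0"):
    """Collect basic facts about GPU visibility inside the container.

    `gpu_visibility_report` is a hypothetical helper for illustration only.
    """
    report = {
        # The TensorFlow log above complains that this node does not exist.
        "device_node_exists": os.path.exists(device_path),
        # Full path to nvidia-smi, or None if it is not on the PATH.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "nvidia_smi_output": None,
    }
    if report["nvidia_smi_path"]:
        # `nvidia-smi -L` lists the GPUs visible to this process.
        result = subprocess.run(
            [report["nvidia_smi_path"], "-L"],
            capture_output=True,
            text=True,
            check=False,
        )
        report["nvidia_smi_output"] = (result.stdout or result.stderr).strip()
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If `device_node_exists` is still false after adding the run arguments, the container was most likely started without GPU access, and the problem lies in the container setup rather than in TensorFlow.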

Change your devcontainer.json to the following:

{
"name": "Anaconda (Python 3)",
"build": { 
"context": "..",
"dockerfile": "Dockerfile",
"args": {
"NODE_VERSION": "none"
}
},
"settings": { 
"python.defaultInterpreterPath": "/opt/conda/bin/python",
"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.autopep8Path": "/opt/conda/bin/autopep8",
"python.formatting.yapfPath": "/opt/conda/bin/yapf",
"python.linting.flake8Path": "/opt/conda/bin/flake8",
"python.linting.pycodestylePath": "/opt/conda/bin/pycodestyle",
"python.linting.pydocstylePath": "/opt/conda/bin/pydocstyle",
"python.linting.pylintPath": "/opt/conda/bin/pylint"
},
"extensions": [
"ms-python.python",
"ms-python.vscode-pylance"
],
"runArgs": ["--gpus", "all"],
"remoteUser": "vscode",
}

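Prerequisites:

  1. The machine has a GPU, and the GPU driver is installed;

  2. The GPU software environment (CUDA and so on) is installed;

  3. Persistence mode (PM) is turned on in nvidia-smi;

  4. The GPU device is specified in the program.

Run the Python program from a terminal with the following command: CUDA_VISIBLE_DEVICES=0 python filename.py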
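The same device selection can also be done from inside the program instead of on the command line. This is a minimal sketch under some assumptions: the helper name `list_gpus` is made up for illustration, and the TensorFlow import is wrapped so the script still runs where TensorFlow is not installed. The environment variable is set before TensorFlow is imported, which is the safe ordering because CUDA reads it when it initializes.

```python
import os

# Select GPU 0 before TensorFlow is imported, so the setting is in place
# by the time CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def list_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is missing.

    `list_gpus` is a hypothetical helper for illustration only.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
    print("GPUs:", list_gpus())
```

Note that `CUDA_VISIBLE_DEVICES` only chooses among GPUs that are already visible to the process. It cannot make a GPU appear if the container was started without GPU access.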

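I ran into the same problem. I tried many ways of specifying "gpus" in "runArgs" (by ID, by the exact GPU name), and none of them worked. On the other hand, when I run the container manually, everything works. To me it looks like some kind of bug in VS Code :/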

Just in case, I opened an issue on GitHub: https://github.com/microsoft/vscode-remote-release/issues/6989
