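I am trying to deploy the nvidia-gpu-cloud-image, titled "NVIDIA GPU-Optimized Image for Deep Learning, ML & HPC". This is the "marketplace solution" provided by NVIDIA. I deployed it with all the default settings and an A100 GPU.
When I first SSH into the VM, it asks "Would you like to download the latest NVIDIA drivers so NGC can finish installing?" I answer yes, but the installation fails at "Starting nvidia-persistenced.service... failed." Any idea what is going on? Here is the full output: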
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.8.0-1032-gcp x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Sat Jul 24 03:48:54 UTC 2021
System load: 1.09 Processes: 213
Usage of /: 19.1% of 30.84GB Users logged in: 0
Memory usage: 0% IPv4 address for ens5: 10.240.0.37
Swap usage: 0%
50 updates can be applied immediately.
30 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable
The list of available updates is more than a week old.
To check for new updates run: sudo apt update
The following GCP CLI version has been pre-installed. Begin using the GCP CLI by first configuring your credentials using 'gcloud init'
name: google-cloud-sdk
summary: Command-line interface for Google Cloud Platform products and
services
publisher: Cloud SDK (google-cloud-sdk*)
store-url: https://snapcraft.io/google-cloud-sdk
contact: https://cloud.google.com/sdk/docs/
license: unset
description: |
Command-line interface for Google Cloud Platform products and services
commands:
- google-cloud-sdk.anthoscli
- google-cloud-sdk.bq
- google-cloud-sdk.docker-credential-gcloud
- google-cloud-sdk.gcloud
- google-cloud-sdk.gsutil
- google-cloud-sdk.kubectl
snap-id: MJbt3BgxESyOON7gqKVEnA06NLRM3Dxd
tracking: latest/stable/ubuntu-20.04
refresh-date: today at 03:48 UTC
channels:
latest/stable: 349.0.0 2021-07-20 (190) 243MB classic
latest/candidate: ^
latest/beta: 349.0.0 2021-07-20 (190) 243MB classic
latest/edge: 349.0.0 2021-07-20 (190) 243MB classic
installed: 349.0.0 (190) 243MB classic
Welcome to the NVIDIA GPU Cloud image. This image provides an optimized
environment for running the deep learning and HPC containers from the
NVIDIA GPU Cloud Container Registry. Many NGC containers are freely
available. However, some NGC containers require that you log in with
a valid NGC API key in order to access them. This is indicated by a
"pull access denied for xyz ..." or "Get xyz: unauthorized: ..." error
message from the daemon.
Documentation on using this image and accessing the NVIDIA GPU Cloud
Container Registry can be found at
http://docs.nvidia.com/ngc/index.html
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
NVIDIA GPU Cloud (NGC) is an optimized software environment that requires the
latest NVIDIA drivers to operate. If you do not download the NVIDIA drivers at
this time, your instance will shut down. Would you like to download the latest
NVIDIA drivers so NGC can finish installing? (Y/n)
Y
Enabling persistence mode...
nvidia-persistenced-init/README
nvidia-persistenced-init/install.sh
nvidia-persistenced-init/systemd/nvidia-persistenced.service.template
nvidia-persistenced-init/sysv/nvidia-persistenced.template
nvidia-persistenced-init/upstart/nvidia-persistenced.conf.template
Checking for common requirements...
sed found in PATH? Yes
useradd found in PATH? Yes
userdel found in PATH? Yes
id found in PATH? Yes
Common installation/uninstallation supported
Removing previous sample System V script... done.
Creating sample System V script... done.
Removing previous sample systemd service file... done.
Creating sample systemd service file... done.
Removing previous sample Upstart service file... done.
Creating sample Upstart service file... done.
Checking for systemd requirements...
/usr/lib/systemd/system directory exists? Yes
systemctl found in PATH? Yes
systemd installation/uninstallation supported
Installation parameters:
User : nvidia-persistenced
Group : nvidia-persistenced
systemd service installation path : /usr/lib/systemd/system
Adding user 'nvidia-persistenced' to group 'nvidia-persistenced'... done.
Installing sample systemd service nvidia-persistenced.service... done.
Enabling nvidia-persistenced.service... done.
Starting nvidia-persistenced.service... failed.
Aborting.
Cleaning up... done.
Skipping NV Peer Memory installation for non HPC SDK AMI
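Try installing the drivers as part of the startup script instead of at the first interactive login.
The installation script is located at /usr/bin/gcp-ngc-login.sh.
Run the following as part of the startup script:
yes | /usr/bin/gcp-ngc-login.sh true 1.2 2> /dev/null
The command is taken from /etc/skel/.profile.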
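For what it's worth, a minimal sketch of such a startup script might look like the following. The installer path and the "true 1.2" arguments are copied from the command above; the function name run_ngc_driver_install and the INSTALLER variable are my own and purely illustrative. I am assuming the script is supplied through the instance's startup-script metadata key, so it runs as root at boot with no terminal attached, which is why yes is used to answer the (Y/n) prompt.

```shell
#!/bin/bash
# Sketch of a startup script that runs the NGC driver installer non-interactively.
# Assumption: the installer lives at the path given in the answer above.
INSTALLER="${INSTALLER:-/usr/bin/gcp-ngc-login.sh}"

# Hypothetical helper: runs the installer given as $1, answering "y" to its prompt.
run_ngc_driver_install() {
  local installer="$1"
  if [ ! -x "$installer" ]; then
    echo "installer not found or not executable: $installer" >&2
    return 1
  fi
  # `yes` feeds "y" to the (Y/n) prompt; "true 1.2" is copied verbatim from
  # the command in /etc/skel/.profile, and stderr is discarded as it is there.
  yes | "$installer" true 1.2 2> /dev/null
}

# Report a failure instead of aborting the rest of the boot sequence.
run_ngc_driver_install "$INSTALLER" || echo "NGC driver install did not complete"
```

If the install still fails this way, dropping the 2> /dev/null redirect should at least leave the installer's error messages in the startup script's log output.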