Docker image differs when run from different hosts



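While building a third-party library (libtorch, if it matters) in a Docker container, I hit an error about a missing include file. The same build works fine when run from an Ubuntu 16.04 host, but when run from an Ubuntu 18.04 host the file is missing.

After some backtracking, I am now just running the base container from NVidia and looking for the file. This is the output I get: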

Ubuntu 16.04 host:

$ uname -a
Linux ub-carmel 4.15.0-123-generic #126~16.04.1-Ubuntu SMP Wed Oct 21 13:48:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 19.03.13, build 4484c46d9d
$ docker pull  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Image is up to date for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@2ecc17248fab:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
-rw-r--r-- 1 root root   7817 Dec  4  2019 ia32intrin.h

Ubuntu 18.04 host:

$ uname -a
Linux ub-carmel-18-04 5.4.0-56-generic #62~18.04.1-Ubuntu SMP Tue Nov 24 10:07:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 19.03.14, build 5eb3275d40
$ docker pull  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Downloaded newer image for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@89f771e82a51:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
root@89f771e82a51:/#

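As you can see, the sha256 digest of the image is the same on both hosts (and matches the digest on NVidia's NGC).

At first I thought the includes might somehow be picked up from the host in some hidden way, but the ia32intrin.h file exists on both hosts.

What could cause such a problem?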

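Edit

Added the docker --version output for each host. There is a difference, but I doubt it would cause a problem like this.

Edit 2

Added the output of uname -a.

Edit 3

Output of docker version: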

Ubuntu 16:

$ docker version
Client: Docker Engine - Community
Version:           19.03.13
API version:       1.40
Go version:        go1.13.15
Git commit:        4484c46d9d
Built:             Wed Sep 16 17:02:59 2020
OS/Arch:           linux/amd64
Experimental:      false
Server: Docker Engine - Community
Engine:
Version:          19.03.13
API version:      1.40 (minimum version 1.12)
Go version:       go1.13.15
Git commit:       4484c46d9d
Built:            Wed Sep 16 17:01:30 2020
OS/Arch:          linux/amd64
Experimental:     false
containerd:
Version:          1.3.7
GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
runc:
Version:          1.0.0-rc10
GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version:          0.18.0
GitCommit:        fec3683

Ubuntu 18:

$ docker version
Client: Docker Engine - Community
Version:           19.03.14
API version:       1.40
Go version:        go1.13.15
Git commit:        5eb3275d40
Built:             Tue Dec  1 19:20:17 2020
OS/Arch:           linux/amd64
Experimental:      false
Server: Docker Engine - Community
Engine:
Version:          19.03.14
API version:      1.40 (minimum version 1.12)
Go version:       go1.13.15
Git commit:       5eb3275d40
Built:            Tue Dec  1 19:18:45 2020
OS/Arch:          linux/amd64
Experimental:     false
containerd:
Version:          1.3.9
GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
runc:
Version:          1.0.0-rc10
GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version:          0.18.0
GitCommit:        fec3683

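So I tested this on different Ubuntu machines (EC2 instances), and there the file exists on both 18.04 and 16.04. It looks like something is wrong with my machine. Any ideas what could cause this?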
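To narrow down what differs on the affected machine, the following is a minimal sketch of a report that could be run on each host and compared. The function names host_report and verify_packages are hypothetical helpers, not part of Docker. The sketch assumes the storage driver is worth comparing, and that dpkg --verify inside the container will flag packaged files that are missing from the unpacked image.

```shell
#!/bin/sh
# Image tag taken from the question.
IMAGE="nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04"

# Print facts worth comparing between the two hosts (hypothetical helper).
host_report() {
    echo "storage driver: $(docker info --format '{{.Driver}}')"
    echo "image id: $(docker image inspect --format '{{.Id}}' "$IMAGE")"
}

# Ask dpkg, inside a throwaway container, to check installed files against
# the package database (hypothetical helper). Files that are missing or
# altered are listed; a clean image should list little or nothing.
verify_packages() {
    docker run --rm "$IMAGE" dpkg --verify
}
```

Running host_report and verify_packages on both hosts and diffing the output would show whether only this one header is missing or whether a whole layer was unpacked incompletely.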

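My best guess is that the pulled layers on the Ubuntu 18.04 host have somehow been corrupted. The nuclear option for cleaning this up is to reset Docker. This deletes all images, volumes, containers, logs, networks, and so on, so back up anything you want to keep before running it: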

sudo -s # these commands need root
systemctl stop docker
rm -rf /var/lib/docker
systemctl start docker
exit # exit sudo
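Before (or after) the full reset, it may be worth checking the file non-interactively and trying a less destructive repair first. The sketch below is an assumption on my part, not a guaranteed fix: it removes and re-pulls only the affected image, which will not help if the damaged layer is shared with another local image, because docker rmi keeps layers that are still referenced. The function names are hypothetical; the image tag and header path come from the question.

```shell
#!/bin/sh
IMAGE="nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04"
HEADER="/usr/lib/gcc/x86_64-linux-gnu/7/include/ia32intrin.h"

# Succeeds if HEADER exists inside a throwaway container of IMAGE.
header_present() {
    docker run --rm "$IMAGE" test -f "$HEADER"
}

# Remove the local copy of IMAGE and pull it again.
# Layers still used by other local images are not removed by `docker rmi`.
repull_image() {
    docker rmi "$IMAGE" && docker pull "$IMAGE"
}

# Check, re-pull once if the header is missing, then check again.
check_and_repair() {
    if header_present; then
        echo "header present"
    elif repull_image >/dev/null && header_present; then
        echo "header present after re-pull"
    else
        echo "header still missing"
        return 1
    fi
}
```

If check_and_repair still reports the header as missing, the full reset above is the remaining option.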
