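We recently upgraded our deployment environment from Docker 17.06.0-ce to 18.09.2. Since then, containers have been dying suddenly after running for a few days, with very little information in the Docker logs.
Memory monitoring shows the affected containers were well below all limits (each container, and the host as well, had plenty of free memory).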
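One way to double-check the memory observation is to look at the exit state Docker recorded for a dead container. The following is a minimal sketch, assuming the stopped container has not been removed and that the `docker` CLI is on the PATH; the helper names `summarize_exit` and `inspect_container` are made up for illustration. It reads the `State` object from `docker inspect` output.

```python
import json
import subprocess


def summarize_exit(inspect_json: str) -> dict:
    """Pull the exit-related fields out of `docker inspect` JSON output."""
    # `docker inspect` prints a JSON array with one object per container.
    state = json.loads(inspect_json)[0]["State"]
    return {
        "oom_killed": state.get("OOMKilled"),
        "exit_code": state.get("ExitCode"),
        "finished_at": state.get("FinishedAt"),
        "error": state.get("Error"),
    }


def inspect_container(container_id: str) -> dict:
    """Run `docker inspect` for one container (hypothetical helper, needs the docker CLI)."""
    out = subprocess.run(
        ["docker", "inspect", container_id],
        check=True, capture_output=True, text=True,
    ).stdout
    return summarize_exit(out)
```

If `oom_killed` is false and the exit code is 137 (128 + 9, meaning SIGKILL), the process was probably killed by something other than the cgroup OOM killer, which would be consistent with the memory graphs described above.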
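Setup observations during the issue:
- Docker version 18.09.2, with around 30 running containers
- Affected containers were killed after running for a few days
Docker logs observed during the container crash: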
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171040904Z" level=info msg="shim reaped" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d
Nov 16 15:42:11 site1 containerd[1762]: time="2020-11-16T15:42:11.171156262Z" level=warning msg="cleaning up after killed shim" id=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d namespace=moby
Nov 16 15:42:11 site1 dockerd[3022]: time="2020-11-16T15:42:11.171164295Z" level=warning msg="failed to delete process" container=d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d error="ttrpc: client shutting down: ttrpc: closed: unknown" module=libcontainerd namespace=moby process=b0d77b1ebf2c82b09c152530a5e24491d76e216b852e385686c46128c94e7f5a
Nov 16 15:42:11 site1 c73920e3476c[3022]: INFO: 2020/11/16 15:42:11.396872 [nameserver a6:0c:6a:18:69:1f] container d39355d3061d461ad4a305c717b699bd332aae50d47c2bf2b547bef50f767c7d died; tombstoning entry test-endpoint-s104.weave.local. -> 10.44.0.14
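To correlate crashes across many containers, the `key=value` fields in the containerd and dockerd lines above can be extracted programmatically. This is only a sketch, assuming the journal lines keep the format shown here; `parse_fields` and `shim_events` are hypothetical names, not part of any Docker tooling.

```python
import re

# Matches key=value and key="quoted value" pairs in containerd/dockerd log lines.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line: str) -> dict:
    """Return the key/value pairs found in one journal line."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def shim_events(lines):
    """Yield (time, msg, container id) for lines whose msg mentions a shim."""
    for line in lines:
        fields = parse_fields(line)
        msg = fields.get("msg", "")
        if "shim" in msg:
            # containerd logs the container as `id`, dockerd as `container`.
            yield fields.get("time"), msg, fields.get("id") or fields.get("container")
```

Running `shim_events` over a saved copy of the journal gives one tuple per "shim reaped" or "cleaning up after killed shim" line, which makes it easier to see whether several containers died at the same moment.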
Output of docker version:
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
Output of Docker Info:
Containers: 30
Running: 25
Paused: 0
Stopped: 5
Images: 236
Server Version: 18.09.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-171-generic
Operating System: Ubuntu 16.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.92GiB
Name: fpas-site1-dra-director-a
ID: KKSM:3YNF:LE7N:NVFE:Y5C4:C6CN:LAQT:QRRZ:VYQS:O4PP:VQKG:DXTK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
com.broadhop.swarm.uuid=uuid4:d96aef99-b5fc-44e3-b7fa-65b08b7e30f3
com.broadhop.swarm.role=endpoint-role
com.broadhop.swarm.node=
com.broadhop.swarm.hostname=site1
com.broadhop.swarm.mode=
com.broadhop.network.interfaces=internal:172.26.50.13
Experimental: false
Insecure Registries:
registry:5000
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
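Note: Since this deployment runs on critical infrastructure, we want to understand why this happened and make sure it does not happen again. Has anyone faced the same issue in any environment? Please let us know if there is a known issue with this Docker version.
Your Go version is quite old; you could try updating it. I found this issue on GitHub: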
https://github.com/moby/moby/issues/38742