Docker Swarm and Docker Service



Swarm Gurus,

I just set up my first Docker Swarm environment with 3 hosts, following these guides:

https://docs.docker.com/engine/install/ubuntu/
https://docs.docker.com/engine/swarm/swarm-tutorial/
https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/
https://docs.docker.com/engine/swarm/swarm-tutorial/deploy-service/
https://docs.docker.com/engine/swarm/swarm-tutorial/scale-service/

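I was able to set it up and to create a service with 5 replicas, and it worked as expected. The containers were distributed across the 3 nodes (the manager node and 2 worker nodes).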
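For context, the linked tutorial creates the helloworld service with a single replica and then scales it to five. The sketch below wraps those two tutorial commands in a shell function; the name create_helloworld is hypothetical, and the sketch assumes it is run on the manager node of an already initialized swarm.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands from the linked Docker tutorial.
# Assumption: run on the manager node of an initialized swarm.
create_helloworld() {
  # Start the service with one replica that pings docker.com indefinitely.
  docker service create --replicas 1 --name helloworld alpine ping docker.com
  # Scale it out to five replicas; the scheduler spreads them over the nodes.
  docker service scale helloworld=5
  # Show which node each task was scheduled on.
  docker service ps helloworld
}

# Usage: create_helloworld
```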

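Then I started experimenting with shutting down all 3 nodes and starting them up again. The service I had created (named helloworld) was automatically re-created by Docker and restored across the swarm.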

But I noticed one thing: the original containers were gone, and instead I got this:

someuser@manager:~$ docker service ps helloworld --no-trunc
ID                          NAME               IMAGE                                                                                   NODE      DESIRED STATE   CURRENT STATE            ERROR                                                         PORTS
8vlswsfq8ub5xn9vd401ilskn   helloworld.1       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Running         Running 30 minutes ago
jqfgg41xppf7xcchnkvjyesyx    \_ helloworld.1   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.1.jqfgg41xppf7xcchnkvjyesyx"
wy382jy2yncpv6b3y1y0qfq3h   helloworld.2       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Running         Running 30 minutes ago
mq7w469vck8hzr7p9w22f0rt1    \_ helloworld.2   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.2.mq7w469vck8hzr7p9w22f0rt1"
jp5wbvbdxxgh60vzef9iz73aj   helloworld.3       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker01   Running         Running 30 minutes ago
t5wgad0dhu5hoyp3kjrdela4b    \_ helloworld.3   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker01   Shutdown        Failed 30 minutes ago    "No such container: helloworld.3.t5wgad0dhu5hoyp3kjrdela4b"
km03jrxlvam162i8pt2ix6vlf   helloworld.4       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Running         Running 29 minutes ago
8hjnbjz4nmpqncmva4ubeqpx6    \_ helloworld.4   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.4.8hjnbjz4nmpqncmva4ubeqpx6"
knbvl6el13l0poofdv1g6j11z   helloworld.5       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Running         Running 29 minutes ago
thlnyngdbwwsi30fuxx4wx7cd    \_ helloworld.5   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.5.thlnyngdbwwsi30fuxx4wx7cd"

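I'm completely fine with the new containers, since I didn't shut the nodes down gracefully, and shutting them down ungracefully was part of the test case.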

But I would like to get rid of those failed tasks. Here they are:

jqfgg41xppf7xcchnkvjyesyx    \_ helloworld.1   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.1.jqfgg41xppf7xcchnkvjyesyx"
mq7w469vck8hzr7p9w22f0rt1    \_ helloworld.2   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.2.mq7w469vck8hzr7p9w22f0rt1"
t5wgad0dhu5hoyp3kjrdela4b    \_ helloworld.3   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker01   Shutdown        Failed 30 minutes ago    "No such container: helloworld.3.t5wgad0dhu5hoyp3kjrdela4b"
8hjnbjz4nmpqncmva4ubeqpx6    \_ helloworld.4   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.4.8hjnbjz4nmpqncmva4ubeqpx6"
thlnyngdbwwsi30fuxx4wx7cd    \_ helloworld.5   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.5.thlnyngdbwwsi30fuxx4wx7cd"

I tried the following:

$ docker rm \_ helloworld.1
$ docker rm helloworld.1.jqfgg41xppf7xcchnkvjyesyx
$ docker rm --link \_ helloworld.1
$ docker rm --link helloworld.1.jqfgg41xppf7xcchnkvjyesyx

But none of these worked.

Any advice would be much appreciated.

Thanks
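A likely reason none of these commands worked: the \_ in the listing is only a marker for historical tasks, not part of any name, and docker rm acts on containers on the local node, not on swarm tasks. Swarm names a task's container <service>.<slot>.<task id>, which matches the name in the "No such container" error, and that error already says the container is gone. The sketch below assumes that naming pattern; task_container_name and local_task_containers are hypothetical helper names, and the second one has to be run on each node you want to check.

```shell
#!/bin/sh
# Hypothetical helpers, not Docker commands.

# Build the container name swarm uses for a task: <service>.<slot>.<task id>
# (the same pattern as in "No such container: helloworld.1.jqfgg4...").
task_container_name() {
  printf '%s.%s.%s\n' "$1" "$2" "$3"
}

# List all containers (running or stopped) on THIS node whose name matches
# the given string. docker rm can only remove what shows up here.
local_task_containers() {
  docker ps -a --filter "name=$1" --format '{{.ID}}  {{.Names}}  {{.Status}}'
}

# Usage: local_task_containers "$(task_container_name helloworld 1 jqfgg41xppf7xcchnkvjyesyx)"
```

If the second command prints nothing on every node, there is no container left to remove; what remains is the task record kept by the swarm managers.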

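docker service ps lists all of the tasks associated with a service, and a task can be in various states: starting, running, complete, and so on.

A running task is associated with a container.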
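To look at current tasks and task history separately, docker service ps accepts a desired-state filter. A minimal sketch, assuming a POSIX shell on a manager node; the function names current_tasks and task_history are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; run them on a manager node.

# Current tasks only: tasks whose desired state is "running",
# which leaves out the "\_" history rows.
current_tasks() {
  docker service ps --filter "desired-state=running" "$1"
}

# History only: the shut-down/failed tasks the swarm still remembers.
task_history() {
  docker service ps --no-trunc --filter "desired-state=shutdown" "$1"
}

# Usage: current_tasks helloworld
#        task_history helloworld
```

Filtering only changes what is displayed; the history entries themselves are still kept.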

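Tracking tasks separately is useful because you can take a task ID from the docker service ps listing and use it instead of a service ID in some docker commands, such as docker service logs <task id>. That way you can find out specifically why a particular task failed.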
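As a sketch of pulling logs for a single task (task_logs is a hypothetical name; the example ID is taken from the listing in the question). For a task whose container no longer exists, as in the question, this may well return nothing.

```shell
#!/bin/sh
# Hypothetical helper: show the logs of one task instead of the whole service.
# docker service logs accepts either a service name/ID or a task ID.
task_logs() {
  # --timestamps prefixes each line with its time, which helps match log
  # lines against the task states shown by docker service ps.
  docker service logs --timestamps "$1"
}

# Usage: task_logs jqfgg41xppf7xcchnkvjyesyx
```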

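You can also run docker inspect <task id>, which returns a block of data indicating why the task could not start at all. If the task did start, that data contains the ID of the container that actually ran it; you can use that ID to go to the actual node and check for OOM errors or for errors in the container logs.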
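A minimal sketch of those two steps, assuming a POSIX shell; task_status and container_diagnostics are hypothetical names. The first prints the task's Status object as JSON (state, error message and, if the task got that far, the container ID); the second is meant to be run on the node where that container ran.

```shell
#!/bin/sh
# Hypothetical helpers for diagnosing a single task.

# Step 1 (manager node): dump the task's status, including its error and,
# if it started, the ID of the container that ran it.
task_status() {
  docker inspect --format '{{json .Status}}' "$1"
}

# Step 2 (the node that ran the task): check whether the container was
# OOM-killed, its exit code, and the last lines of its logs.
container_diagnostics() {
  docker inspect --format 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' "$1"
  docker logs --tail 50 "$1"
}

# Usage: task_status jqfgg41xppf7xcchnkvjyesyx
#        container_diagnostics <container id from task_status>
```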

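You can clean up the containers associated with completed tasks, but docker automatically keeps a history of tasks in line with the swarm's --task-history-limit setting. The smaller you set this value, the smaller the history, but you can't (and don't really want to) purge it entirely.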
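A sketch of adjusting the history size and cleaning up leftover task containers; set_task_history, show_task_history and prune_service_containers are hypothetical names. The default limit is 5. Lowering it may only trim old entries as tasks are next replaced, so existing history might not disappear immediately. The prune step filters on the com.docker.swarm.service.name label that swarm attaches to task containers, and it only removes stopped containers on the node where it runs.

```shell
#!/bin/sh
# Hypothetical helpers; see the lead-in for the assumptions.

# Manager node: keep at most N old tasks per replica slot (default 5).
set_task_history() {
  docker swarm update --task-history-limit "$1"
}

# Show the current setting, reported by docker info as
# "Task History Retention Limit".
show_task_history() {
  docker info 2>/dev/null | grep -i 'task history'
}

# Each node: remove stopped containers that belonged to the given service.
prune_service_containers() {
  docker container prune --force --filter "label=com.docker.swarm.service.name=$1"
}

# Usage: set_task_history 1
#        prune_service_containers helloworld
```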
