Stop Kubernetes from restarting the container when I shut down PostgreSQL



I maintain a Kubernetes cluster that includes two PostgreSQL servers in two different pods: one primary and one replica. The replica is kept in sync with the primary via log shipping.

A fault caused log shipping to start failing, so the replica is no longer in sync with the primary.

The procedure for resyncing the replica with the primary requires stopping the replica's postgres service. This is where I run into trouble.
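For context, the procedure is roughly the standard resync for a log-shipping standby. This is only a sketch, not this image's exact tooling (the pod also mounts a pgBackRest repo); the host, user and /pgdata path are taken from the pod description later in the question, and on 9.6 the standby's recovery.conf would also need to be recreated before restarting:

```shell
# Sketch of a typical standby resync (assumed procedure, not the image's own scripts).
pg_ctl -D /pgdata stop -m fast     # stop postgres on the replica -- the step Kubernetes interrupts
rm -rf /pgdata/*                   # discard the diverged data directory
pg_basebackup -h pgset-primary -U primaryuser \
    -D /pgdata -X stream           # take a fresh base backup from the primary
pg_ctl -D /pgdata start            # restart postgres as a standby
```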

It seems that Kubernetes restarts the container as soon as I shut down the postgres service, which immediately restarts postgres. I need the container to keep running with the postgres service stopped, so that I can carry out the next steps to repair the broken replication.

How can I get Kubernetes to let me shut down the postgres service without it restarting the container?

More details:

To stop replication, I open a shell on the replica pod with kubectl exec -it <pod name> -- /bin/sh, then run pg_ctl stop from that shell. I get the following response:

server shutting down
command terminated with exit code 137

and I get kicked out of the shell.
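Exit code 137 follows the usual shell convention of 128 + signal number, i.e. the exec session's process received SIGKILL (9). That is consistent with the kubelet tearing the container down (taking the shell with it) rather than pg_ctl itself failing. A quick demonstration of the convention:

```shell
# 137 = 128 + 9 (SIGKILL): the exit code reported for a process killed with kill -9.
sh -c 'kill -KILL $$'
echo "exit code: $?"    # prints: exit code: 137
```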

When I run kubectl describe pod, I see the following:

Name:         pgset-primary-1
Namespace:    qa
Priority:     0
Node:         aks-nodepool1-95718424-0/10.240.0.4
Start Time:   Fri, 09 Jul 2021 13:48:06 +1200
Labels:       app=pgset-primary
              controller-revision-hash=pgset-primary-6d7d65c8c7
              name=pgset-replica
              statefulset.kubernetes.io/pod-name=pgset-primary-1
Annotations:  <none>
Status:       Running
IP:           10.244.1.42
IPs:
  IP:           10.244.1.42
Controlled By:  StatefulSet/pgset-primary
Containers:
  pgset-primary:
    Container ID:   containerd://bc00b4904ab683d9495ad020328b5033ecb00d19c9e5b12d22de18f828918455
    Image:          *****/crunchy-postgres:centos7-9.6.8-1.6.0
    Image ID:       docker.io/*****/crunchy-postgres@sha256:2850e00f9a619ff4bb6ff889df9bcb2529524ca8110607e4a7d9e36d00879057
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 06 Nov 2021 18:29:34 +1300
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 06 Nov 2021 18:28:09 +1300
      Finished:     Sat, 06 Nov 2021 18:29:18 +1300
    Ready:          True
    Restart Count:  6
    Limits:
      cpu:     250m
      memory:  512Mi
    Requests:
      cpu:     10m
      memory:  256Mi
    Environment:
      PGHOST:                 /tmp
      PG_PRIMARY_USER:        primaryuser
      PG_MODE:                set
      PG_PRIMARY_HOST:        pgset-primary
      PG_REPLICA_HOST:        pgset-replica
      PG_PRIMARY_PORT:        5432
      [...]
      ARCHIVE_TIMEOUT:        60
      MAX_WAL_KEEP_SEGMENTS:  400
    Mounts:
      /backrestrepo from backrestrepo (rw)
      /pgconf from pgbackrestconf (rw)
      /pgdata from pgdata (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from pgset-sa-token-nh6ng (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  pgdata:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pgdata-pgset-primary-1
    ReadOnly:   false
  backrestrepo:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  backrestrepo-pgset-primary-1
    ReadOnly:   false
  pgbackrestconf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      pgbackrest-configmap
    Optional:  false
  pgset-sa-token-nh6ng:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pgset-sa-token-nh6ng
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                 From     Message
  ----     ------   ----                ----     -------
  Warning  BackOff  88m (x3 over 3h1m)  kubelet  Back-off restarting failed container
  Normal   Pulled   88m (x7 over 120d)  kubelet  Container image "*****/crunchy-postgres:centos7-9.6.8-1.6.0" already present on machine
  Normal   Created  88m (x7 over 120d)  kubelet  Created container pgset-primary
  Normal   Started  88m (x7 over 120d)  kubelet  Started container pgset-primary

The events show that the container was started by Kubernetes.

The pod has no liveness or readiness probes, so I don't know what prompts Kubernetes to restart the container when I shut down the postgres service running inside it.

This is caused by the restartPolicy. The container's lifecycle ended because its process completed: postgres is the container's main process, so once you stop it the container exits, and the restart policy tells Kubernetes to start a new one. If you don't want a new container to be created, you need to change the restart policy on these pods.

If this pod is part of a Deployment, have a look at kubectl explain deployment.spec.template.spec.restartPolicy
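For reference, restartPolicy is a field of the pod spec. A minimal sketch of a bare pod with restarts disabled (names here are hypothetical; Always is the default):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pg-maintenance          # hypothetical pod name
spec:
  restartPolicy: Never          # Always (default) | OnFailure | Never
  containers:
  - name: postgres
    image: crunchy-postgres:centos7-9.6.8-1.6.0
```

Note the caveat for this question's setup: the pod templates of Deployments and StatefulSets only accept restartPolicy: Always, and this pod is controlled by StatefulSet/pgset-primary, so for a controller-managed pod the practical approach is usually to keep the container's main process alive (for example by temporarily overriding its command) rather than changing the policy.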
