I'm maintaining a Kubernetes cluster that includes two PostgreSQL servers in two different pods: one primary and one replica. The replica is kept in sync with the primary via log shipping.
A fault caused the log shipping to start failing, so the replica is no longer in sync with the primary.
The procedure for resynchronizing the replica with the primary requires stopping the postgres service on the replica. This is where I run into trouble.
It appears that Kubernetes restarts the container as soon as I shut down the postgres service, which immediately starts postgres again. I need the container to keep running with the postgres service stopped, so that I can carry out the next steps to repair the broken replication.
How can I get Kubernetes to let me shut down the postgres service without restarting the container?
More details:
To stop replication, I open a shell on the replica pod via kubectl exec -it <pod name> -- /bin/sh, then run pg_ctl stop from that shell. I get the following response:
server shutting down
command terminated with exit code 137
and I am kicked out of the shell.
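Exit code 137 is 128 + 9, i.e. the exec session's process was killed with SIGKILL, which is consistent with the container being torn down underneath it. To confirm what terminated the container, the kubelet's record of the last terminated state can be inspected; a sketch, assuming the pod name and namespace from the describe output below:

```shell
# Show the container's last terminated state (reason, exit code, timestamps)
kubectl get pod pgset-primary-1 -n qa \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# The restart count confirms the kubelet has been recreating the container
kubectl get pod pgset-primary-1 -n qa \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'
```

These fields are the machine-readable counterparts of the "Last State" and "Restart Count" lines shown by kubectl describe pod.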
When I run kubectl describe pod, I see the following:
Name: pgset-primary-1
Namespace: qa
Priority: 0
Node: aks-nodepool1-95718424-0/10.240.0.4
Start Time: Fri, 09 Jul 2021 13:48:06 +1200
Labels: app=pgset-primary
controller-revision-hash=pgset-primary-6d7d65c8c7
name=pgset-replica
statefulset.kubernetes.io/pod-name=pgset-primary-1
Annotations: <none>
Status: Running
IP: 10.244.1.42
IPs:
IP: 10.244.1.42
Controlled By: StatefulSet/pgset-primary
Containers:
pgset-primary:
Container ID: containerd://bc00b4904ab683d9495ad020328b5033ecb00d19c9e5b12d22de18f828918455
Image: *****/crunchy-postgres:centos7-9.6.8-1.6.0
Image ID: docker.io/*****/crunchy-postgres@sha256:2850e00f9a619ff4bb6ff889df9bcb2529524ca8110607e4a7d9e36d00879057
Port: 5432/TCP
Host Port: 0/TCP
State: Running
Started: Sat, 06 Nov 2021 18:29:34 +1300
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 06 Nov 2021 18:28:09 +1300
Finished: Sat, 06 Nov 2021 18:29:18 +1300
Ready: True
Restart Count: 6
Limits:
cpu: 250m
memory: 512Mi
Requests:
cpu: 10m
memory: 256Mi
Environment:
PGHOST: /tmp
PG_PRIMARY_USER: primaryuser
PG_MODE: set
PG_PRIMARY_HOST: pgset-primary
PG_REPLICA_HOST: pgset-replica
PG_PRIMARY_PORT: 5432
[...]
ARCHIVE_TIMEOUT: 60
MAX_WAL_KEEP_SEGMENTS: 400
Mounts:
/backrestrepo from backrestrepo (rw)
/pgconf from pgbackrestconf (rw)
/pgdata from pgdata (rw)
/var/run/secrets/kubernetes.io/serviceaccount from pgset-sa-token-nh6ng (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
pgdata:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pgdata-pgset-primary-1
ReadOnly: false
backrestrepo:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: backrestrepo-pgset-primary-1
ReadOnly: false
pgbackrestconf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pgbackrest-configmap
Optional: false
pgset-sa-token-nh6ng:
Type: Secret (a volume populated by a Secret)
SecretName: pgset-sa-token-nh6ng
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 88m (x3 over 3h1m) kubelet Back-off restarting failed container
Normal Pulled 88m (x7 over 120d) kubelet Container image "*****/crunchy-postgres:centos7-9.6.8-1.6.0" already present on machine
Normal Created 88m (x7 over 120d) kubelet Created container pgset-primary
Normal Started 88m (x7 over 120d) kubelet Started container pgset-primary
The events show that the container was started by Kubernetes.
The pod has no liveness or readiness probes, so I don't know what prompts Kubernetes to restart the container when I shut down the postgres service running inside it.
This is caused by the restartPolicy. Postgres is the container's main process, so when you stop it with pg_ctl stop, the container exits ("Reason: Completed" in your describe output), and the kubelet then creates a new container according to the pod's restart policy. If you don't want a new container to be created, you need to change the restart policy of these pods.
If this pod is part of a Deployment, have a look at kubectl explain deployment.spec.template.spec.restartPolicy.
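For reference, restartPolicy lives at the pod spec level. A minimal sketch of where it goes (the pod name here is hypothetical; the image tag is taken from the describe output above). Note that pod templates in Deployments and StatefulSets only accept restartPolicy: Always, so setting Never applies to a standalone pod:

```yaml
# Sketch: a bare Pod whose container is NOT recreated when its
# main process exits, because restartPolicy is Never.
apiVersion: v1
kind: Pod
metadata:
  name: pgset-debug            # hypothetical name for illustration
  namespace: qa
spec:
  restartPolicy: Never         # Always (default) | OnFailure | Never
  containers:
  - name: pgset-primary
    image: crunchy-postgres:centos7-9.6.8-1.6.0
```

With restartPolicy: Never, stopping postgres ends the container and leaves the pod in a terminal phase instead of triggering a restart.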