Kubernetes liveness probe fails, but manual probing succeeds



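I set up a liveness probe for a long-running application in a pod. The probe failed several times within a day, causing the pod to restart several times. There is no readiness probe.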

livenessProbe:
  httpGet:
    path: /
    port: http
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 20
  periodSeconds: 20
  successThreshold: 1
  failureThreshold: 3
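
To make the thresholds above concrete, here is a rough Python sketch of the consecutive-failure bookkeeping. It is a simplified model, not kubelet code: the function name `first_restart_index` and the sample history are made up for illustration. It assumes the documented behaviour that a container is restarted after `failureThreshold` consecutive probe failures, and that a single success resets the count (`successThreshold` is 1 here).

```python
def first_restart_index(results, failure_threshold):
    """Return the index of the probe that would trigger a restart, or None.

    `results` is a sequence of booleans, one per probe (True = success).
    A restart is assumed once `failure_threshold` failures occur in a row.
    """
    consecutive_failures = 0
    for index, ok in enumerate(results):
        # Any success resets the counter; a failure extends the current run.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return index
    return None


# Hypothetical probe history: two short glitches, then a longer outage.
history = [True, False, False, True, False, False, False]

print(first_restart_index(history, failure_threshold=3))  # 6
print(first_restart_index(history, failure_threshold=1))  # 1
```

With `failureThreshold: 3` the two-failure glitch is tolerated and only the three-failure run counts, whereas a threshold of 1 reacts to the very first failed probe.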

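Further inspection of the application code and the Docker image revealed nothing abnormal. So I disabled the liveness probe and instead probed the NodePort service manually every 10 seconds, using a Python script on a PC connected to the network. The manual probe, although more frequent and stricter than the liveness probe, succeeded without a single error. Each request takes about 200~400 ms.

The manual probe is roughly equivalent to a liveness probe configured with: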

timeoutSeconds: 500ms
periodSeconds: 10
successThreshold: 1
failureThreshold: 1
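
The original script was not posted, so the following is only a sketch of what such a manual probe might look like, assuming a plain HTTP GET with a 500 ms timeout every 10 seconds. The names `probe_once` and `run_probe` are hypothetical, and the node address and NodePort have to be supplied by the caller.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_s=0.5):
    """Send one HTTP GET and return (ok, elapsed_seconds).

    Note: the urllib timeout applies to each blocking socket operation,
    not to the request as a whole.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and HTTP 4xx/5xx responses end up here.
        ok = False
    return ok, time.monotonic() - start


def run_probe(url, rounds, period_s=10.0, timeout_s=0.5):
    """Probe `url` `rounds` times, roughly every `period_s` seconds."""
    results = []
    for i in range(rounds):
        ok, elapsed = probe_once(url, timeout_s)
        print(f"{time.strftime('%H:%M:%S')} ok={ok} elapsed={elapsed * 1000:.0f}ms")
        results.append(ok)
        if i < rounds - 1:
            # Sleep for the remainder of the period after the request time.
            time.sleep(max(0.0, period_s - elapsed))
    return results
```

A call such as `run_probe("http://<node-ip>:<node-port>/", rounds=360)` would cover about an hour at a 10 second period; both placeholders are assumptions to be replaced with real values.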

Why does the manual probe succeed while the liveness probe fails? Does this indicate a networking problem in k8s?

Pod manifest:

kind: Pod
apiVersion: v1
metadata:
  name: pypi-pypiserver-74b689df7-rh9bm
  namespace: default
  labels:
    app.kubernetes.io/instance: pypi
    app.kubernetes.io/name: pypiserver
spec:
  volumes:
    - name: secrets
      secret:
        secretName: pypi-pypiserver
        defaultMode: 420
    - name: packages
      persistentVolumeClaim:
        claimName: pypi-pypiserver
    - name: default-token-cx7m7
      secret:
        secretName: default-token-cx7m7
        defaultMode: 420
  containers:
    - name: pypiserver
      image: 'registry.digitalocean.com/evergreen/pypiserver:latest'
      args:
        - run
        - '--passwords=.'
        - '--authenticate=.'
        - '--port=8080'
        - '--welcome=/dev/null'
        - '--server=wsgiref'
        - /data/packages
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: 1600m
          memory: 1Gi
        requests:
          cpu: 400m
          memory: 256Mi
      volumeMounts:
        - name: packages
          mountPath: /data/packages
          mountPropagation: None
        - name: secrets
          readOnly: true
          mountPath: /config
        - name: default-token-cx7m7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /
          port: http
          scheme: HTTP
        initialDelaySeconds: 30
        timeoutSeconds: 10
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  nodeSelector:
    doks.digitalocean.com/node-pool: k8s-node-pool-hive-dev-2
  serviceAccountName: default
  serviceAccount: default
  nodeName: k8s-node-pool-hive-dev-2-8adyc
  securityContext:
    runAsUser: 9898
    runAsGroup: 9898
    fsGroup: 9898
  imagePullSecrets:
    - name: evergreen
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

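A NodePort probe only confirms that the Service is reachable on that port. It does not check whether the pod itself is alive. The liveness probe is what checks the availability of the pod's container.

More details here: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/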
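
One way to see the difference between the two paths is to send the same GET to the NodePort and to the pod's own IP and container port, which is closer to what an `httpGet` liveness probe targets. The sketch below is only an illustration under assumptions: `http_get_ok` and `compare_targets` are hypothetical helper names, the addresses must be filled in by the reader, and the pod IP is normally reachable only from inside the cluster. It treats any status from 200 up to but not including 400 as success, which matches the documented rule for HTTP probes.

```python
import http.client


def http_get_ok(host, port, path="/", timeout_s=10.0):
    """Return True if GET http://host:port/path answers with 200 <= status < 400."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and malformed responses count as failures.
        return False
    finally:
        conn.close()


def compare_targets(targets, path="/", timeout_s=10.0):
    """Probe each named (host, port) pair and report which ones answered."""
    return {
        name: http_get_ok(host, port, path, timeout_s)
        for name, (host, port) in targets.items()
    }
```

For example, `compare_targets({"nodeport": ("<node-ip>", <node-port>), "pod": ("<pod-ip>", 8080)})` checks both paths in one call, where 8080 is the `containerPort` from the manifest and the other values are placeholders.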
