Readiness Probe for Redis with large dataset

Question

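I have a Redis K8s deployment that is linked to a separate service. The manifest, heavily reduced, is shown below (let me know if more information is needed):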

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cache
      environment: dev
  template:
    metadata:
      labels:
        app: cache
        environment: dev
    spec:
      containers:
        - name: cache
          image: marketplace.gcr.io/google/redis5
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 30
            timeoutSeconds: 5
      volumes:
        - name: data
          nfs:
            server: "nfs-server.recs-api.svc.cluster.local"
            path: "/data"

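I want to periodically redeploy Redis with a new dataset rather than update the existing cache. When I run kubectl rollout restart deployment/cache, the old Redis pods are terminated before the new Redis pods are ready to accept traffic. The new pods are marked READY and the old ones Terminated, as expected, but redis-cli ping on the new pods returns (error) LOADING Redis is loading the dataset in memory. Redis currently takes 5-10 minutes to finish loading the dataset and become ready to accept connections, yet for that whole time the new pods are already READY and live traffic is being directed at them, because the old pods have been terminated.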

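My suspicion is that because this response has an exit status of 0, the readinessProbe reports READY 1/1 and the old pods are killed off, but I have not been able to find a suitable exec: command: that avoids this.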

The output of redis-cli info has a loading:0|1 line, so I tested:

readinessProbe:
  exec:
    command: ["redis-cli", "info", "|", "grep loading:", "|", "grep 0"]

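hoping that for a non-0 loading value grep would return a non-zero status code and fail the readinessProbe. This does not seem to work: it behaves the same as redis-cli ping, terminating the old pods too early and losing service until loading has finished.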
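
One likely reason this variant behaves the same as plain ping: an exec probe runs the command list directly rather than through a shell, so "|", "grep loading:" and "grep 0" are handed to redis-cli as literal arguments and no pipeline is ever built. Below is a minimal sketch of a loading check that is meant to be run by a shell. The function names info_says_loaded and probe_loading are hypothetical, and the sketch assumes redis-cli can reach the local server without a password.

```shell
#!/bin/sh

# Succeed (return 0) only when the given INFO text contains the exact
# line "loading:0". INFO lines end in \r\n, so carriage returns are
# stripped before matching.
info_says_loaded() {
    printf '%s\n' "$1" | tr -d '\r' | grep -qx 'loading:0'
}

# Hypothetical probe entry point (assumption: local server, no auth).
# It is defined here but not called; a probe script would end by
# calling it so that its status becomes the script's exit status.
probe_loading() {
    info_says_loaded "$(redis-cli info persistence)"
}
```

In the probe itself this would have to be started through a shell, for example with a command list of sh, -c, and the script path as the third element. Matching the whole line (grep -x) avoids accidentally matching other fields whose names merely end in "loading", and an empty reply (for instance when the server is unreachable) also counts as not ready.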

What I want

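When deploying new Redis cache pods, I want a pod to be available to accept connections at all times while the new Redis cache pods are loading the dataset into memory
- Ideally in the form of a neat readiness probe check, but I am completely open to suggestions
- It is also possible that I have misunderstood the purpose of readiness probes, so please let me know
If possible, I would also like a better understanding of why redis-cli ping or other readinessProbes still trigger the READY state for new pods, given the status code returned by the exec: command: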

Thanks!

I looked into the bitnami/redis chart to see how it implements the liveness/readiness probes.

The chart creates a health ConfigMap containing a shell script that health-checks the Redis server with redis-cli ping and inspects the response.

Here is the ConfigMap as defined:

data:
  ping_readiness_local.sh: |-
    #!/bin/bash
{{- if .Values.usePasswordFile }}
    password_aux=`cat ${REDIS_PASSWORD_FILE}`
    export REDIS_PASSWORD=$password_aux
{{- end }}
{{- if .Values.usePassword }}
    no_auth_warning=$([[ "$(redis-cli --version)" =~ (redis-cli 5.*) ]] && echo --no-auth-warning)
{{- end }}
    response=$(
      timeout -s 3 $1 \
      redis-cli \
{{- if .Values.usePassword }}
        -a $REDIS_PASSWORD $no_auth_warning \
{{- end }}
        -h localhost \
{{- if .Values.tls.enabled }}
        -p $REDIS_TLS_PORT \
        --tls \
        --cacert {{ template "redis.tlsCACert" . }} \
{{- if .Values.tls.authClients }}
        --cert {{ template "redis.tlsCert" . }} \
        --key {{ template "redis.tlsCertKey" . }} \
{{- end }}
{{- else }}
        -p $REDIS_PORT \
{{- end }}
        ping
    )
    if [ "$response" != "PONG" ]; then
      echo "$response"
      exit 1
    fi
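
With the Helm templating removed, the script boils down to comparing the reply text against PONG, which is why a LOADING reply fails the probe no matter what exit status redis-cli itself returns. A stripped-down sketch of that idea follows, assuming no password and no TLS. The function names and the 6379 fallback port are illustrative and not part of the chart.

```shell
#!/bin/sh

# Return 0 only when the reply is exactly PONG. Any other reply
# (for example a LOADING error, or an empty string after a timeout)
# is echoed for the probe log and reported as a failure.
check_ping_response() {
    if [ "$1" != "PONG" ]; then
        echo "$1"
        return 1
    fi
}

# Hypothetical probe entry point; $1 is the timeout in seconds.
# Defined here but not called. REDIS_PORT falls back to 6379 (assumption).
ping_readiness_local() {
    check_ping_response "$(timeout -s 3 "$1" redis-cli -h localhost -p "${REDIS_PORT:-6379}" ping)"
}
```

A script built this way would end by calling ping_readiness_local "$1", so that the function's return status becomes the exit status the probe sees.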

In the deployment/statefulset, simply set the probe to execute that shell script:

readinessProbe:
  initialDelaySeconds: {{ .Values.redis.readinessProbe.initialDelaySeconds }}
  periodSeconds: {{ .Values.redis.readinessProbe.periodSeconds }}
  timeoutSeconds: {{ .Values.redis.readinessProbe.timeoutSeconds }}
  successThreshold: {{ .Values.redis.readinessProbe.successThreshold }}
  failureThreshold: {{ .Values.redis.readinessProbe.failureThreshold }}
  exec:
    command:
      - sh
      - -c
      - /scripts/ping_readiness_local.sh {{ .Values.redis.readinessProbe.timeoutSeconds }}

The following should work fine

The key part is

tcpSocket:
  port: client # named port

The whole snippet

- name: redis
  image: ${DOCKER_PATH_AND_IMAGE}
  resources:
    limits:
      memory: "1.5Gi"
    requests:
      memory: "1.5Gi"
  ports:
    - name: client
      containerPort: 6379
    - name: gossip
      containerPort: 16379
  command: ["/conf/update-node.sh", "redis-server", "/conf/redis.conf"]
  livenessProbe:
    tcpSocket:
      port: client # named port
    initialDelaySeconds: 30
    timeoutSeconds: 5
    periodSeconds: 5
    failureThreshold: 5
    successThreshold: 1
  readinessProbe:
    exec:
      command:
        - redis-cli
        - ping
    initialDelaySeconds: 20
    timeoutSeconds: 5
    periodSeconds: 3
