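I have run into a strange problem with my cluster.
In my cluster I have a Deployment and a LoadBalancer Service that exposes it. It worked fine, but the load balancer suddenly started returning an error: curl: (56) Recv failure: Connection reset by peer
The error appears while the pod and the load balancer are both running, and there are no errors in their logs.
What I have already tried:
- deleting the pod
- redeploying the service and the deployment from scratch
The problem still persists.
My service YAML: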
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"RELEASE-NAME","app.kubernetes.io/name":"APP-NAME","app.kubernetes.io/version":"latest"},"name":"APP-NAME","namespace":"namespacex"},"spec":{"ports":[{"name":"web","port":3000}],"selector":{"app.kubernetes.io/instance":"RELEASE-NAME","app.kubernetes.io/name":"APP-NAME"},"type":"LoadBalancer"}}
  creationTimestamp: "2021-08-03T07:55:00Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/name: APP-NAME
    app.kubernetes.io/version: latest
  name: APP-NAME
  namespace: namespacex
  resourceVersion: "14583904"
  uid: 7fb4d7e6-4316-44e5-8f9b-7a466bc776da
spec:
  clusterIP: 10.4.18.36
  clusterIPs:
  - 10.4.18.36
  externalTrafficPolicy: Cluster
  ports:
  - name: web
    nodePort: 30970
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/name: APP-NAME
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: xx.xxx.xxx.xxx
My deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: APP-NAME
  labels:
    app.kubernetes.io/name: APP-NAME
    app.kubernetes.io/instance: RELEASE-NAME
    app.kubernetes.io/version: "latest"
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: APP-NAME
      app.kubernetes.io/instance: RELEASE-NAME
  template:
    metadata:
      annotations:
        checksum/config: 5e6ff0d6fa64b90b0365e9f3939cefc0a619502b32564c4ff712067dbe44ab90
        checksum/secret: 76e0a1351da90c0cef06851e3aa9e7c80b415c29b11f473d4a2520ade9c892ce
      labels:
        app.kubernetes.io/name: APP-NAME
        app.kubernetes.io/instance: RELEASE-NAME
    spec:
      serviceAccountName: APP-NAME
      containers:
        - name: APP-NAME
          image: 'docker.io/xxxxxxxx:latest'
          imagePullPolicy: "Always"
          ports:
            - name: http
              containerPort: 3000
          livenessProbe:
            httpGet:
              path: /balancer/
              port: http
          readinessProbe:
            httpGet:
              path: /balancer/
              port: http
          env:
            ...
          volumeMounts:
            - name: config-volume
              mountPath: /home/app/config/
          resources:
            limits:
              cpu: 400m
              memory: 256Mi
            requests:
              cpu: 400m
              memory: 256Mi
      volumes:
        - name: config-volume
          configMap:
            name: app-config
      imagePullSecrets:
        - name: secret
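In my case, the problem turned out to be a network component (such as a firewall) that started blocking outbound connections after flagging the cluster as "unsafe" for no apparent reason.
So in essence this was not a K8s problem but an IT one.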
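For anyone trying to narrow down a similar case, one approach that may help is to run the same small probe from several vantage points (inside a pod, on a node, from outside the network) and compare the results. The sketch below is only an illustration, not what I actually ran: the function name `probe` and its result labels are made up for this example, and it uses nothing but the Python standard library. If the probe returns "ok" against the cluster IP or node port but "reset" against the external IP, the reset is likely coming from something on the network path rather than from the pod.

```python
import socket
import sys


def probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> str:
    """Send one plain HTTP GET and classify what happens on the TCP level.

    Returns one of: "ok", "closed", "reset", "refused", "timeout",
    or "error: <details>". These labels are hypothetical, chosen for
    this sketch only.
    """
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            data = sock.recv(1024)
    except ConnectionResetError:
        # Same symptom as curl's "(56) Recv failure: Connection reset by peer".
        return "reset"
    except ConnectionRefusedError:
        # Nothing is listening on that address and port.
        return "refused"
    except socket.timeout:
        # Packets are probably being dropped silently somewhere.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
    # An empty read means the peer closed the connection without answering.
    return "ok" if data else "closed"


if __name__ == "__main__":
    # Usage: python probe.py HOST PORT [PATH]
    target_host = sys.argv[1]
    target_port = int(sys.argv[2])
    target_path = sys.argv[3] if len(sys.argv) > 3 else "/"
    print(probe(target_host, target_port, target_path))
```

With the manifests above, reasonable targets would be the cluster IP `10.4.18.36` on port `3000` (from inside the cluster), any node address on node port `30970`, and the load balancer's external IP on port `3000`, each with the path `/balancer/` that the probes in the deployment already use.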