K3s: multiple kube-system pods, including the DNS pod, stuck in Unknown status



Environmental info: K3s version: k3s version v1.24.3+k3s1 (990ba0e8), go version go1.18.1

Node(s) CPU architecture, OS, and version: five RPi 4s running headless 64-bit Raspbian, each reporting: Linux 5.15.56-v8+ #1575 SMP PREEMPT Fri Jul 22 20:31:26 BST 2022 aarch64 GNU/Linux

Cluster configuration: three nodes configured as control-plane servers and two configured as worker (agent) nodes

Describe the bug: the pods coredns-b96499967-ktgtc, local-path-provisioner-7b7dc8d6f5-5cfds, metrics-server-668d979685-9szb9, traefik-7cd4fcf68-gfmhm, and svclb-traefik-aa9f6b38-j27sw are in Unknown status with 0/1 containers ready. As a result, the cluster DNS service does not work, so pods cannot resolve internal or external names.
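One quick way to confirm the DNS failure described above (not part of the original report, and the image tag is just an example) is to run lookups from a throwaway pod:

```shell
# Spin up a short-lived busybox pod and try resolving an internal and an
# external name. When cluster DNS is broken, both lookups fail or time out.
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
    sh -c 'nslookup kubernetes.default.svc.cluster.local; nslookup example.com'
```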

Steps to reproduce:

  • Install K3s in HA mode following the instructions at https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
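For reference, the embedded-etcd HA install from that page boils down to roughly the following (K3S_TOKEN is a placeholder shared secret and master0 stands in for the first server's hostname):

```shell
# First server: initialise the cluster with embedded etcd
curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server --cluster-init

# Additional servers: join the first one
curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server \
    --server https://master0:6443

# Agents (worker nodes):
curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - agent \
    --server https://master0:6443
```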

Expected behavior: the system pods should be running in a known state, and DNS should work, meaning, among other things, that headless services should work and pods should be able to resolve hostnames both inside and outside the cluster.

Actual behavior: the DNS pods are stuck in Unknown status, pods cannot resolve hostnames inside or outside the cluster, and headless services do not work.

Additional context / logs:

kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
hosts /etc/coredns/NodeHosts {
ttl 60
reload 15s
fallthrough
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
import /etc/coredns/custom/*.server

Relevant pod descriptions:

kubectl describe pods --namespace=kube-system
Name:                 coredns-b96499967-ktgtc
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master0/192.168.0.68
Start Time:           Fri, 05 Aug 2022 16:09:38 +0100
Labels:               k8s-app=kube-dns
pod-template-hash=b96499967
Annotations:          <none>
Status:               Running
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-b96499967
Containers:
coredns:
Container ID:  containerd://1a83a59275abdb7b783aa06eb56cb1e5367c1ca196598851c2b7d5154c0a4bb9
Image:         rancher/mirrored-coredns-coredns:1.9.1
Image ID:      docker.io/rancher/mirrored-coredns-coredns@sha256:35e38f3165a19cb18c65d83334c13d61db6b24905f45640aa8c2d2a6f55ebcb0
Ports:         53/UDP, 53/TCP, 9153/TCP
Host Ports:    0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State:          Terminated
Reason:       Unknown
Exit Code:    255
Started:      Fri, 05 Aug 2022 19:19:19 +0100
Finished:     Fri, 05 Aug 2022 19:20:29 +0100
Ready:          False
Restart Count:  8
Limits:
memory:  170Mi
Requests:
cpu:        100m
memory:     70Mi
Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
Environment:  <none>
Mounts:
/etc/coredns from config-volume (ro)
/etc/coredns/custom from custom-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zbbxf (ro)
Conditions:
Type              Status
Initialized       True
Ready             False
ContainersReady   False
PodScheduled      True
Volumes:
config-volume:
Type:      ConfigMap (a volume populated by a ConfigMap)
Name:      coredns
Optional:  false
custom-config-volume:
Type:      ConfigMap (a volume populated by a ConfigMap)
Name:      coredns-custom
Optional:  true
kube-api-access-zbbxf:
Type:                    Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds:  3607
ConfigMapName:           kube-root-ca.crt
ConfigMapOptional:       <nil>
DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              beta.kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type    Reason          Age                    From     Message
----    ------          ----                   ----     -------
Normal  SandboxChanged  41d (x419 over 41d)    kubelet  Pod sandbox changed, it will be killed and re-created.
Normal  SandboxChanged  64m (x11421 over 42h)  kubelet  Pod sandbox changed, it will be killed and re-created.
Normal  SandboxChanged  2m24s (x139 over 32m)  kubelet  Pod sandbox changed, it will be killed and re-created.
Name:                 metrics-server-668d979685-9szb9
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master0/192.168.0.68
Start Time:           Fri, 05 Aug 2022 16:09:38 +0100
Labels:               k8s-app=metrics-server
pod-template-hash=668d979685
Annotations:          <none>
Status:               Running
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/metrics-server-668d979685
Containers:
metrics-server:
Container ID:  containerd://cd02643f7d7bc78ea98abdec20558626cfac39f70e1127b2281342dd00905e44
Image:         rancher/mirrored-metrics-server:v0.5.2
Image ID:      docker.io/rancher/mirrored-metrics-server@sha256:48ecad4fe641a09fa4459f93c7ad29d4916f6b9cf7e934d548f1d8eff96e2f35
Port:          4443/TCP
Host Port:     0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State:          Terminated
Reason:       Unknown
Exit Code:    255
Started:      Fri, 05 Aug 2022 19:19:19 +0100
Finished:     Fri, 05 Aug 2022 19:20:29 +0100
Ready:          False
Restart Count:  8
Requests:
cpu:        100m
memory:     70Mi
Liveness:     http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness:    http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
Environment:  <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-djqgk (ro)
Conditions:
Type              Status
Initialized       True
Ready             False
ContainersReady   False
PodScheduled      True
Volumes:
tmp-dir:
Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:  <unset>
kube-api-access-djqgk:
Type:                    Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds:  3607
ConfigMapName:           kube-root-ca.crt
ConfigMapOptional:       <nil>
DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type    Reason          Age                    From     Message
----    ------          ----                   ----     -------
Normal  SandboxChanged  41d (x418 over 41d)    kubelet  Pod sandbox changed, it will be killed and re-created.
Normal  SandboxChanged  64m (x11427 over 42h)  kubelet  Pod sandbox changed, it will be killed and re-created.
Normal  SandboxChanged  2m27s (x141 over 32m)  kubelet  Pod sandbox changed, it will be killed and re-created.

Name:                 traefik-7cd4fcff68-gfmhm
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master0/192.168.0.68
Start Time:           Fri, 05 Aug 2022 16:10:43 +0100
Labels:               app.kubernetes.io/instance=traefik
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=traefik
helm.sh/chart=traefik-10.19.300
pod-template-hash=7cd4fcff68
Annotations:          prometheus.io/path: /metrics
prometheus.io/port: 9100
prometheus.io/scrape: true
Status:               Running
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/traefik-7cd4fcff68
Containers:
traefik:
Container ID:  containerd://779a1596fb204a7577acda97e9fb3f4c5728cf1655071d8e5faad6a8d407d217
Image:         rancher/mirrored-library-traefik:2.6.2
Image ID:      docker.io/rancher/mirrored-library-traefik@sha256:ad2226527eea71b7591d5e9dcc0bffd0e71b2235420c34f358de6db6d529561f
Ports:         9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
--global.checknewversion
--global.sendanonymoususage
--entrypoints.metrics.address=:9100/tcp
--entrypoints.traefik.address=:9000/tcp
--entrypoints.web.address=:8000/tcp
--entrypoints.websecure.address=:8443/tcp
--api.dashboard=true
--ping=true
--metrics.prometheus=true
--metrics.prometheus.entrypoint=metrics
--providers.kubernetescrd
--providers.kubernetesingress
--providers.kubernetesingress.ingressendpoint.publishedservice=kube-system/traefik
--entrypoints.websecure.http.tls=true
State:          Terminated
Reason:       Unknown
Exit Code:    255
Started:      Fri, 05 Aug 2022 19:19:19 +0100
Finished:     Fri, 05 Aug 2022 19:20:29 +0100
Ready:          False
Restart Count:  8
Liveness:       http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=3
Readiness:      http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=1
Environment:    <none>
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw4qc (ro)
Conditions:
Type              Status
Initialized       True
Ready             False
ContainersReady   False
PodScheduled      True
Volumes:
data:
Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:  <unset>
tmp:
Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:  <unset>
kube-api-access-jw4qc:
Type:                    Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds:  3607
ConfigMapName:           kube-root-ca.crt
ConfigMapOptional:       <nil>
DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type    Reason          Age                    From     Message
----    ------          ----                   ----     -------
Normal  SandboxChanged  41d (x415 over 41d)    kubelet  Pod sandbox changed, it will be killed and re-created.
Normal  SandboxChanged  64m (x11418 over 42h)  kubelet  Pod sandbox changed, it will be killed and re-created.
Normal  SandboxChanged  2m30s (x141 over 32m)  kubelet  Pod sandbox changed, it will be killed and re-created.

The workaround I found, at least for now, is to manually restart all the kube-system deployments, which can be listed with

kubectl get deployments --namespace=kube-system

If any of them are not ready, they can be restarted with

kubectl -n kube-system rollout restart <deployment>

Specifically, the coredns, local-path-provisioner, metrics-server, and traefik deployments all needed restarting.
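The per-deployment restarts above can be rolled into a single pass (a sketch, assuming a POSIX shell with xargs available; not from the original report):

```shell
# Restart every deployment in kube-system in one go, then wait for each
# rollout to complete before declaring the cluster healthy again.
kubectl -n kube-system get deployments -o name \
    | xargs -n1 kubectl -n kube-system rollout restart

kubectl -n kube-system get deployments -o name \
    | xargs -n1 kubectl -n kube-system rollout status
```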
