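Environmental Info:
K3s Version: k3s version v1.24.3+k3s1 (990ba0e8), go version go1.18.1
Node(s) CPU architecture, OS, and Version: Five RPi 4s running headless 64-bit Raspbian, each reporting: Linux 5.15.56-v8+ #1575 SMP PREEMPT Fri Jul 22 20:31:26 BST 2022 aarch64 GNU/Linux
Cluster Configuration: 3 nodes configured as control plane, 2 nodes configured as workers
Describe the bug: The pods coredns-b96499967-ktgtc, local-path-provisioner-7b7dc8d6f5-5cfds, metrics-server-668d979685-9szb9, traefik-7cd4fcff68-gfmhm, and svclb-traefik-aa9f6b38-j27sw are in Unknown status, with 0/1 ready. This means the cluster DNS service is not working, so pods cannot resolve internal or external names.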
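To see at a glance which system pods are affected, something like the following could be used. This is only a sketch: the function name `not_ready_pods` is made up for illustration, and it assumes the default column order of `kubectl get pods` (NAME, READY, STATUS, ...). Pods in the Completed state are skipped, since finished job pods also show 0/1.

```shell
# Sketch: print kube-system pods whose READY column shows fewer ready
# containers than expected (for example 0/1).
# The function name is hypothetical; column order is assumed to be the
# kubectl default: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
    kubectl -n kube-system get pods --no-headers |
        awk '$3 != "Completed" {
            # READY looks like "0/1": ready containers / total containers.
            split($2, r, "/")
            if (r[1] != r[2]) print $1, $2, $3
        }'
}

# Usage: not_ready_pods
```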
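Steps To Reproduce:
- Install K3s in HA mode following the instructions at https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
Expected behavior: The critical pods should be running with a known status. DNS should also work, which means, among other things, that headless services work and pods can resolve hostnames inside and outside the cluster.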
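Actual behavior: The DNS pods are not running and report an Unknown status. Pods cannot resolve hostnames inside or outside the cluster, and headless services do not work.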
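One way to confirm the DNS symptom is to resolve an in-cluster name and an external name from a throwaway pod. The sketch below rests on several assumptions: the pod name `dns-test`, the `busybox:1.28` image, and the function name `check_dns` are illustrative choices, and it relies on `kubectl run` passing back the exit status of `nslookup`, which may not hold in every setup.

```shell
# Sketch: try to resolve each given hostname from a temporary pod.
# Pod name, image, and function name are assumptions for illustration.
check_dns() {
    failed=0
    for name in "$@"; do
        # --rm removes the pod afterwards; --restart=Never runs it once.
        if kubectl run dns-test --rm -i --restart=Never \
            --image=busybox:1.28 -- nslookup "$name" \
            </dev/null >/dev/null 2>&1; then
            echo "OK   $name"
        else
            echo "FAIL $name"
            failed=1
        fi
    done
    # Non-zero if any lookup failed.
    return "$failed"
}

# Usage: check_dns kubernetes.default.svc.cluster.local example.com
```

Note that if pod sandboxes cannot be created at all, the test pod itself may fail to start, which would also show up as FAIL.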
Additional context / logs:
kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    hosts /etc/coredns/NodeHosts {
      ttl 60
      reload 15s
      fallthrough
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
import /etc/coredns/custom/*.server
Description of the relevant pods:
kubectl describe pods --namespace=kube-system
Name: coredns-b96499967-ktgtc
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=kube-dns
pod-template-hash=b96499967
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-b96499967
Containers:
coredns:
Container ID: containerd://1a83a59275abdb7b783aa06eb56cb1e5367c1ca196598851c2b7d5154c0a4bb9
Image: rancher/mirrored-coredns-coredns:1.9.1
Image ID: docker.io/rancher/mirrored-coredns-coredns@sha256:35e38f3165a19cb18c65d83334c13d61db6b24905f45640aa8c2d2a6f55ebcb0
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/etc/coredns/custom from custom-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zbbxf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
custom-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns-custom
Optional: true
kube-api-access-zbbxf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x419 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11421 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m24s (x139 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Name: metrics-server-668d979685-9szb9
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=metrics-server
pod-template-hash=668d979685
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/metrics-server-668d979685
Containers:
metrics-server:
Container ID: containerd://cd02643f7d7bc78ea98abdec20558626cfac39f70e1127b2281342dd00905e44
Image: rancher/mirrored-metrics-server:v0.5.2
Image ID: docker.io/rancher/mirrored-metrics-server@sha256:48ecad4fe641a09fa4459f93c7ad29d4916f6b9cf7e934d548f1d8eff96e2f35
Port: 4443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-djqgk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-djqgk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x418 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11427 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m27s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Name: traefik-7cd4fcff68-gfmhm
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:10:43 +0100
Labels: app.kubernetes.io/instance=traefik
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=traefik
helm.sh/chart=traefik-10.19.300
pod-template-hash=7cd4fcff68
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 9100
prometheus.io/scrape: true
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/traefik-7cd4fcff68
Containers:
traefik:
Container ID: containerd://779a1596fb204a7577acda97e9fb3f4c5728cf1655071d8e5faad6a8d407d217
Image: rancher/mirrored-library-traefik:2.6.2
Image ID: docker.io/rancher/mirrored-library-traefik@sha256:ad2226527eea71b7591d5e9dcc0bffd0e71b2235420c34f358de6db6d529561f
Ports: 9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
--global.checknewversion
--global.sendanonymoususage
--entrypoints.metrics.address=:9100/tcp
--entrypoints.traefik.address=:9000/tcp
--entrypoints.web.address=:8000/tcp
--entrypoints.websecure.address=:8443/tcp
--api.dashboard=true
--ping=true
--metrics.prometheus=true
--metrics.prometheus.entrypoint=metrics
--providers.kubernetescrd
--providers.kubernetesingress
--providers.kubernetesingress.ingressendpoint.publishedservice=kube-system/traefik
--entrypoints.websecure.http.tls=true
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Liveness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=3
Readiness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=1
Environment: <none>
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw4qc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-jw4qc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x415 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11418 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m30s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
The workaround I found, at least for now, is to manually restart all the kube-system deployments, which can be listed with the command
kubectl get deployments --namespace=kube-system
If none of them are ready, they can be restarted with the command
kubectl -n kube-system rollout restart deployment <deployment>
Specifically, the coredns, local-path-provisioner, metrics-server, and traefik deployments all need to be restarted.
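The restart steps above could be scripted along these lines. This is a minimal sketch, assuming the four deployments carry the names implied by the pod names in this report and that kubectl points at the affected cluster. The function name `restart_kube_system` and the 120s timeout are illustrative, not anything K3s provides.

```shell
# Sketch: restart the kube-system deployments named in this report and
# wait for each rollout to finish.
# Deployment names are assumed from the pod names; the function name and
# timeout are hypothetical.
restart_kube_system() {
    for deploy in coredns local-path-provisioner metrics-server traefik; do
        # Trigger a rolling restart of the deployment.
        kubectl -n kube-system rollout restart "deployment/${deploy}" || return 1
        # Wait until the new pods are ready, or give up after the timeout.
        kubectl -n kube-system rollout status "deployment/${deploy}" \
            --timeout=120s || return 1
    done
}

# Usage: restart_kube_system
```

The function stops at the first failure, so a deployment that never becomes ready is easy to spot from the last kubectl message.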