Kubernetes新手,但在过去使用过K3s。只需设置一个K8s集群。我的pod都不能做DNS查找,即使是谷歌,或内部域。
I init'd with:--pod-network-cidr=10.244.0.0/16
。安装了Metal-LB(10.7.7.10-10.7.7.254),节点和master运行ip为10.7.50。X/16和10.7.60。X/16分别。使用默认的Kube-Flannel设置法兰绒:https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
到目前为止,它只是一个主节点和两个节点。
版本:
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:44:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:39:34Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.22.1
故障排除命令:
$ kubectl describe service kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=CoreDNS
Annotations: prometheus.io/port: 9153
prometheus.io/scrape: true
Selector: k8s-app=kube-dns
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.96.0.10
IPs: 10.96.0.10
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: metrics 9153/TCP
TargetPort: 9153/TCP
Endpoints: 10.244.1.20:9153,10.244.2.28:9153
Session Affinity: None
Events: <none>
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-84f8874d6d-jgvwk 1/1 Running 1 (115m ago) 21h 10.244.1.20 k-w-001 <none> <none>
coredns-84f8874d6d-qh2f4 1/1 Running 1 (115m ago) 21h 10.244.2.28 k-w-002 <none> <none>
etcd-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-apiserver-k-m-001 1/1 Running 11 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-controller-manager-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-286dc 1/1 Running 10 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-rbmhx 1/1 Running 6 (114m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-flannel-ds-vjl7l 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-948z8 1/1 Running 8 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-proxy-l7h64 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-pqmsr 1/1 Running 4 (115m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-scheduler-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
metrics-server-6dfddc5fb8-47mnb 0/1 Running 3 (115m ago) 2d20h 10.244.1.21 k-w-001 <none> <none>
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-jgvwk
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-qh2f4
[INFO] plugin/ready: Still waiting on: "kubernetes"
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
每隔几秒进行一次测试:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
这里有更多的测试:
$ kubectl exec -ti busybox -- nslookup google.com
;; connection timed out; no servers could be reached
command terminated with exit code 1
$ kubectl exec -ti busybox -- nslookup google.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8:53
Non-authoritative answer:
Name: google.com
Address: 142.251.33.78
*** Can't find google.com: No answer
$ kubectl exec -ti busybox -- ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=116 time=6.437 ms
$ kubectl exec busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
;; connection timed out; no servers could be reached
command terminated with exit code 1
我还注意到kube-dns服务将应用程序选择器设置为k8s-app=kube-dns
,而coredns具有标签k8s-app=kube-dns
,这是正确的吗?
在kube-system名称空间中运行的pod似乎有两个不同的IP范围。一个使用Node的IP,另一个使用Flannels。
我不确定这里发生了什么,对Kubernetes来说是新的,但看起来DNS pod或服务根本不工作。
编辑:
进一步信息:
$ sudo ufw status
Status: inactive
问题实际上是法兰绒。DNS查询工作正常,直到节点重新启动,然后所有pod查询失败,直到Flannel pod重新启动。
天哪,这是一个兔子洞。
见:https://github.com/flannel-io/flannel/issues/1321