I have several KVM nodes in different networks. All of these nodes have two ifaces (eth0: 10.0.2.15/24, eth1: 10.201.(14|12|11).0/24) and a few manual routes between the DCs.
root@k8s-hv09:~# ip r
default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
10.201.12.0/24 dev eth1 proto kernel scope link src 10.201.12.179
10.201.14.0/24 via 10.201.12.2 dev eth1 proto static
10.201.11.0/24 via 10.201.12.2 dev eth1 proto static
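For reference, the static routes in that table would have been added with something along these lines (the 10.201.12.2 gateway is taken from the output above; per-node values differ):

# manual inter-DC routes via the eth1 gateway
ip route add 10.201.14.0/24 via 10.201.12.2 dev eth1
ip route add 10.201.11.0/24 via 10.201.12.2 dev eth1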
Software on all nodes:
Ubuntu 16.04/18.04
Kubernetes 1.13.2
Kubernetes-cni 0.6.0
docker-ce 18.06.1
Master node (k8s-hv06):
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 10.201.14.176:6443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  external:
    caFile: ""
    certFile: ""
    endpoints:
    - http://10.201.14.176:2379
    - http://10.201.12.180:2379
    - http://10.201.11.171:2379
    keyFile: ""
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.13.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
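The cluster itself was brought up from that file in the usual way (the file name here is a placeholder):

kubeadm init --config /root/kubeadm-config.yaml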
Flannel v0.10.0 is deployed with RBAC and the extra argument --iface=eth1.
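The extra argument sits in the flannel DaemonSet container spec; a minimal sketch of the relevant part, assuming the stock kube-flannel.yml for v0.10.0 (it matches the flanneld command line visible in the container list further below):

containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.10.0-amd64
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=eth1

With one or more master nodes everything works fine: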
root@k8s-hv06:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-86c58d9df4-b4tf9 1/1 Running 2 23h
kube-system coredns-86c58d9df4-h6nq8 1/1 Running 2 23h
kube-system kube-apiserver-k8s-hv06 1/1 Running 3 23h
kube-system kube-controller-manager-k8s-hv06 1/1 Running 5 23h
kube-system kube-flannel-ds-amd64-rsmhj 1/1 Running 0 21h
kube-system kube-proxy-s5n8l 1/1 Running 3 23h
kube-system kube-scheduler-k8s-hv06 1/1 Running 4 23h
But I can't add any worker node to the cluster. For example, I installed Ubuntu 18.04 with docker-ce, kubeadm, and kubelet:
root@k8s-hv09:~# dpkg -l | grep -E 'kube|docker' | awk '{print $1,$2,$3}'
hi docker-ce 18.06.1~ce~3-0~ubuntu
hi kubeadm 1.13.2-00
hi kubectl 1.13.2-00
hi kubelet 1.13.2-00
ii kubernetes-cni 0.6.0-00
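The hi status means the packages are pinned at these versions; presumably something like this was run after installation (an assumption based on the dpkg flags above):

# keep the k8s/docker packages from being upgraded
apt-mark hold docker-ce kubeadm kubectl kubelet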
Now I'm trying to add the worker node (k8s-hv09) to the cluster.
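The join itself is the standard one against the controlPlaneEndpoint from the config above (token and hash are placeholders):

kubeadm join 10.201.14.176:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

On the master the node shows up immediately: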
root@k8s-hv06:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-hv06 Ready master 23h v1.13.2
k8s-hv09 Ready <none> 31s v1.13.2
root@k8s-hv06:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-86c58d9df4-b4tf9 1/1 Running 2 23h
kube-system coredns-86c58d9df4-h6nq8 1/1 Running 2 23h
kube-system kube-apiserver-k8s-hv06 1/1 Running 3 23h
kube-system kube-controller-manager-k8s-hv06 1/1 Running 5 23h
kube-system kube-flannel-ds-amd64-cqw5p 0/1 CrashLoopBackOff 3 113s
kube-system kube-flannel-ds-amd64-rsmhj 1/1 Running 0 22h
kube-system kube-proxy-hbnpq 1/1 Running 0 113s
kube-system kube-proxy-s5n8l 1/1 Running 3 23h
kube-system kube-scheduler-k8s-hv06 1/1 Running 4 23h
The flannel.1 and cni0 interfaces are not created on the worker, and no connection to the master node can be established.
root@k8s-hv09:~# ip a | grep -E '(flannel|cni|cbr|eth|docker)'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether e2:fa:99:0d:3b:05 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether c6:da:44:d9:2e:15 brd ff:ff:ff:ff:ff:ff
inet 10.201.12.179/24 brd 10.201.12.255 scope global eth1
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:30:71:67:92 brd ff:ff:ff:ff:ff:ff
inet 172.172.172.2/24 brd 172.172.172.255 scope global docker0
root@k8s-hv06:~# kubectl logs kube-flannel-ds-amd64-cqw5p -n kube-system -c kube-flannel
I0129 13:02:09.244309 1 main.go:488] Using interface with name eth1 and address 10.201.12.179
I0129 13:02:09.244498 1 main.go:505] Defaulting external address to interface address (10.201.12.179)
E0129 13:02:09.246907 1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-amd64-cqw5p': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-amd64-cqw5p: dial tcp 10.96.0.1:443: getsockopt: connection refused
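The error is a refused connection to the cluster service VIP (10.96.0.1), which kube-proxy is supposed to DNAT to an apiserver backend. Two quick checks from the worker, my own diagnostics rather than anything from the original logs:

# does kube-proxy have a rule translating the service VIP?
iptables -t nat -nL KUBE-SERVICES | grep 10.96.0.1
# is the apiserver reachable directly on its endpoint?
curl -k https://10.201.14.176:6443/healthz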
root@k8s-hv09:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
64a9b21607cb quay.io/coreos/flannel "cp -f /etc/kube-fla…" 23 minutes ago Exited (0) 23 minutes ago k8s_install-cni_kube-flannel-ds-amd64-4k2dt_kube-system_b8f510e3-23c7-11e9-85a5-1a05eef25a13_0
2e0145137449 f0fad859c909 "/opt/bin/flanneld -…" About a minute ago Exited (1) About a minute ago k8s_kube-flannel_kube-flannel-ds-amd64-4k2dt_kube-system_b8f510e3-23c7-11e9-85a5-1a05eef25a13_9
90271ee02f68 k8s.gcr.io/kube-proxy "/usr/local/bin/kube…" 23 minutes ago Up 23 minutes k8s_kube-proxy_kube-proxy-6zgjq_kube-system_b8f50ef6-23c7-11e9-85a5-1a05eef25a13_0
b6345e9d8087 k8s.gcr.io/pause:3.1 "/pause" 23 minutes ago Up 23 minutes k8s_POD_kube-proxy-6zgjq_kube-system_b8f50ef6-23c7-11e9-85a5-1a05eef25a13_0
dca408f8a807 k8s.gcr.io/pause:3.1 "/pause" 23 minutes ago Up 23 minutes k8s_POD_kube-flannel-ds-amd64-4k2dt_kube-system_b8f510e3-23c7-11e9-85a5-1a05eef25a13_0
I can see the command /opt/bin/flanneld --iface=eth1 --ip-masq --kube-subnet-mgr running on the worker node, but it terminates after the k8s_install-cni_kube-flannel-ds-amd64 container stops. The file and directory flannel is supposed to create are missing.
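To verify, one can inspect the standard locations on the worker (assuming the default paths for flannel v0.10.0 and kubernetes-cni):

# /run/flannel/subnet.env is written by flanneld, /etc/cni/net.d by the install-cni container
ls -la /run/flannel/ /etc/cni/net.d/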
I don't understand why. If I add a new master node to the cluster instead, it works fine:
root@k8s-hv06:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-hv01 Ready master 17s v1.13.2
k8s-hv06 Ready master 22m v1.13.2
k8s-hv09 Ready <none> 6m22s v1.13.2
root@k8s-hv06:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-86c58d9df4-b8th2 1/1 Running 0 23m
kube-system coredns-86c58d9df4-hmm8q 1/1 Running 0 23m
kube-system kube-apiserver-k8s-hv01 1/1 Running 0 2m16s
kube-system kube-apiserver-k8s-hv06 1/1 Running 0 23m
kube-system kube-controller-manager-k8s-hv01 1/1 Running 0 2m16s
kube-system kube-controller-manager-k8s-hv06 1/1 Running 0 23m
kube-system kube-flannel-ds-amd64-92kmc 0/1 CrashLoopBackOff 6 8m20s
kube-system kube-flannel-ds-amd64-krdgt 1/1 Running 0 2m16s
kube-system kube-flannel-ds-amd64-lpgkt 1/1 Running 0 10m
kube-system kube-proxy-7ck7f 1/1 Running 0 23m
kube-system kube-proxy-nbkvg 1/1 Running 0 8m20s
kube-system kube-proxy-nvbcw 1/1 Running 0 2m16s
kube-system kube-scheduler-k8s-hv01 1/1 Running 0 2m16s
kube-system kube-scheduler-k8s-hv06 1/1 Running 0 23m
But not a worker node.
UPDATE:
I have no problem connecting to the API server. My problem is the two missing ifaces (cni0, flannel.1): without them there is no pod networking between the master and worker nodes. So, take a spare node and add it to the cluster. If I use kubeadm init with the config file, everything works fine and the flannel ifaces are present. Then kubeadm reset** the same node and kubeadm join it to the same cluster: the network interfaces are missing. But why? In both cases the node obtains its network configuration from the master API in exactly the same way. If there were any error or warning in the logs, I would at least have a clue.
** kubectl delete node <node name> (on master)
kubeadm reset && docker system prune -a && reboot
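One way to compare the two cases is to look at the per-node data flannel registers in the API when running with --kube-subnet-mgr (the command is my own diagnostic, not from the original post):

# flannel stores public-ip, backend type, etc. as node annotations
kubectl get node k8s-hv09 -o jsonpath='{.metadata.annotations}'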
Fixed. The API server was bound to eth0 instead of eth1. It was my mistake; I'm embarrassed.
The extra master nodes worked fine because each one talks to its own local apiserver iface, but that doesn't work for a worker node.
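A quick way to see which address the apiserver actually advertises to the cluster, and therefore where the 10.96.0.1 VIP ends up pointing, is to check the kubernetes endpoints object and the flag in the static pod manifest (my own check, not from the original post):

# the endpoint IP must be the eth1 address (10.201.14.176), not an eth0 one
kubectl get endpoints kubernetes
grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml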
/closed
There is a similar GitHub issue for your case; it was worked around by manually editing /etc/kubernetes/manifests/kube-apiserver.yaml on the master host and changing the liveness probe:
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 443 # was 6443
    scheme: HTTPS
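Since kube-apiserver.yaml is a static pod manifest, the kubelet picks up the edit and restarts the apiserver on its own; a simple way to confirm (command is mine):

# the container should reappear with a fresh creation time
docker ps | grep kube-apiserver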
I hope it helps.