rke --debug up --config cluster.yml
The health check on the etcd host fails with the error:
DEBU[0281] [etcd] Failed to check health for etcd host [x.x.x.x]: failed to get host [x.x.x.x]: Get "https://x.x.x.x:2379/health": remote error: tls: bad certificate
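Checking etcd health:
# The 5th comma-separated field of "etcdctl member list" is each member's client URL;
# the client cert paths are read from the etcd container's ETCDCTL_* environment.
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5"); do
  echo "Validating connection to ${endpoint}/health";
  curl -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/health";
done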
Running it on that master node gives:
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
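If you want to check this output from a script rather than by eye, a minimal sketch is a small POSIX shell function that reads the loop's output on stdin and succeeds only if at least one health response was seen and every one reports "true". The function name all_etcd_healthy is hypothetical, and it assumes compact JSON with each response on its own line, as printed above.

```shell
# Hypothetical helper: read the health-check loop output on stdin.
# Returns 0 only if at least one "health" line was seen and every such
# line reports "health":"true"; returns 1 otherwise.
all_etcd_healthy() {
  # Keep only the JSON health responses; fail if there are none at all.
  responses=$(grep '"health"') || return 1
  # Any health line that is not "true" means at least one member is unhealthy.
  if printf '%s\n' "$responses" | grep -v '"health":"true"' >/dev/null; then
    return 1
  fi
  return 0
}
```

Usage: append "| all_etcd_healthy && echo all members healthy" after the "done" of the loop. Matching on the raw string instead of parsing JSON keeps the sketch dependency-free, at the cost of being sensitive to whitespace changes in the response.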
You can also run the check manually against a single endpoint and see whether it responds correctly:
curl -w "\n" --cacert /etc/kubernetes/ssl/kube-ca.pem --cert /etc/kubernetes/ssl/kube-etcd-x-x-x-x.pem --key /etc/kubernetes/ssl/kube-etcd-x-x-x-x-key.pem https://x.x.x.x:2379/health
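Since the error is "tls: bad certificate", one more quick check (an addition here, not part of the original troubleshooting) is whether the node's etcd certificate actually chains to the kube-ca.pem on disk. A minimal sketch using openssl verify follows; the function name cert_signed_by is hypothetical.

```shell
# Hypothetical helper: succeed if certificate file $2 verifies against CA file $1.
# "openssl verify" exits 0 only when the chain is valid; it also rejects
# certificates that are outside their validity period.
cert_signed_by() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}
```

Usage, with the paths from the curl command above: cert_signed_by /etc/kubernetes/ssl/kube-ca.pem /etc/kubernetes/ssl/kube-etcd-x-x-x-x.pem && echo "etcd cert is signed by kube-ca".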
Comparing the hash of my self-signed CA certificate with the one stored in cluster.rkestate:
# md5sum /etc/kubernetes/ssl/kube-ca.pem
f5b358e771f8ae8495c703d09578eb3b /etc/kubernetes/ssl/kube-ca.pem
# for key in $(cat /home/kube/cluster.rkestate | jq -r '.desiredState.certificatesBundle | keys[]'); do echo $(cat /home/kube/cluster.rkestate | jq -r --arg key $key '.desiredState.certificatesBundle[$key].certificatePEM' | sed '$ d' | md5sum) $key; done | grep kube-ca
f5b358e771f8ae8495c703d09578eb3b - kube-ca
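The sed '$ d' in the loop above presumably strips the extra empty line that jq -r prints after the stored PEM (which already ends in a newline), so the two hashes line up. A sketch of a more direct comparison for a single bundle entry compares the PEM text itself: shell command substitution drops trailing newlines on both sides, so no sed trick is needed. The function name pem_matches is hypothetical.

```shell
# Hypothetical helper: compare PEM file $1 with PEM text read on stdin.
# $(...) strips trailing newlines from both, so an extra final newline
# (e.g. from jq -r) does not cause a false mismatch.
# Returns 0 on match, 1 on mismatch or empty input, 2 if $1 is unreadable.
pem_matches() {
  on_disk=$(cat "$1") || return 2
  from_state=$(cat)
  [ -n "$from_state" ] && [ "$on_disk" = "$from_state" ]
}
```

Usage: jq -r '.desiredState.certificatesBundle["kube-ca"].certificatePEM' /home/kube/cluster.rkestate | pem_matches /etc/kubernetes/ssl/kube-ca.pem && echo "kube-ca matches".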
Versions on my master node:
Debian GNU/Linux 10
rke version v1.3.1
docker version Version: 20.10.8
kubectl v1.21.5
v1.21.5-rancher1-1
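I think my cluster.rkestate is broken. Is there anywhere else where I can check the certificates with the rke tool? At the moment I can't do anything with this production cluster, and I'd like to avoid downtime. I have tested the cluster in different scenarios, and as a last resort I could recreate it from scratch (rke remove && rke up), but maybe I can still fix it...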
Running rke util get-state-file helped me rebuild the broken cluster.rkestate file, after which I was able to add new master nodes and resolve the whole situation.
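A minimal sketch of that recovery step, assuming it is run in the directory that holds cluster.yml and that rke util get-state-file writes cluster.rkestate there (the default file name). Moving the broken file aside first, rather than letting it be replaced, is an assumption of this sketch so the old copy can still be inspected. The function name regen_state_file and the RKE override variable are hypothetical; the override exists only so the function can be exercised without a real cluster.

```shell
# Hypothetical helper: move the current (possibly broken) state file aside
# with a timestamp suffix, then ask RKE to rebuild it from the running cluster.
# RKE defaults to "rke"; it can be overridden, e.g. for testing.
regen_state_file() {
  state=${1:-cluster.rkestate}
  if [ -f "$state" ]; then
    mv "$state" "$state.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  "${RKE:-rke}" util get-state-file
}
```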
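The problem can be solved with the following steps:
- Delete the kube_config_cluster.yml file in the directory where you run the rke up command (because some data is missing from the K8s nodes).
- Delete the cluster.rkestate file.
- Rerun the rke up command.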
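The three steps above can be sketched as a small shell function. Instead of deleting the two files outright, this version renames them with a timestamped .bak suffix (an assumption of this sketch, not part of the original answer) so they can be restored if rke up fails. It assumes the default file names produced by rke up --config cluster.yml in the current directory; the function name reset_and_rerun_rke_up and the RKE override variable are hypothetical.

```shell
# Hypothetical helper: move the generated kubeconfig and state file aside,
# then rerun "rke up" against cluster.yml in the current directory.
# RKE defaults to "rke"; it can be overridden, e.g. for testing.
reset_and_rerun_rke_up() {
  suffix=bak.$(date +%Y%m%d%H%M%S)
  for f in kube_config_cluster.yml cluster.rkestate; do
    if [ -f "$f" ]; then
      mv "$f" "$f.$suffix" || return 1
    fi
  done
  "${RKE:-rke}" up --config cluster.yml
}
```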