metrics-server not working for worker nodes

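I have deployed a k3s cluster on two Raspberry Pi 4s, one acting as the master and the other as a worker, installed with the k3s install script using the following options: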

For the master node:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -

(192.168.1.113 is the master node IP.)

For the agent node:

curl -sfL https://get.k3s.io | \
  K3S_URL=https://192.168.1.113:6443 \
  K3S_TOKEN=<master-token> \
  INSTALL_K3S_EXEC='agent' sh -

Everything seems to work, but kubectl top nodes returns the following:

NAME          CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
k3s-master    137m         3%          1285Mi          33%
k3s-node-01   <unknown>    <unknown>   <unknown>       <unknown>
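To pick out the affected nodes without reading the table by eye, the output can be filtered. This is a minimal bash sketch that assumes the default kubectl top nodes layout shown above (a header row, then NAME and CPU(cores) as the first two columns); the function name nodes_without_metrics is hypothetical.

```shell
# Hypothetical helper: print the names of nodes whose metrics are missing.
# Reads `kubectl top nodes` output on stdin and assumes the default layout:
# a header row, then NAME in column 1 and CPU(cores) in column 2.
nodes_without_metrics() {
  # Skip the header (NR > 1) and print the node name when CPU(cores) is <unknown>.
  awk 'NR > 1 && $2 == "<unknown>" { print $1 }'
}

# Usage against a live cluster:
#   kubectl top nodes | nodes_without_metrics
```

For the output above this prints only k3s-node-01, the worker whose kubelet metrics-server cannot scrape.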

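I also tried deploying the Kubernetes Dashboard as described in the documentation, but it does not work because it cannot reach the metrics server, and I get a timeout error: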

"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"

I also see many errors in the pod logs:

2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.

Logs from the metrics-server pod:

elet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:03:24.767949       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
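The log lines above name the address metrics-server could not reach (dial tcp 192.168.1.106:10250, the kubelet port on k3s-node-01). The following minimal sketch pulls the unique unreachable endpoints out of such logs and probes each one with a plain TCP connect. The helper names are hypothetical, and the probe relies on bash's /dev/tcp redirection plus coreutils timeout, so it assumes bash on a GNU/Linux userland. A probe run from a node's host network only approximates the path taken from inside the metrics-server pod.

```shell
# Hypothetical helper: list unique HOST:PORT pairs from "dial tcp HOST:PORT" errors on stdin.
unreachable_endpoints() {
  grep -oE 'dial tcp [0-9.]+:[0-9]+' | awk '{ print $3 }' | sort -u
}

# Hypothetical helper: attempt a TCP connection to HOST:PORT, giving up after 3 seconds.
# Prints "connected" on success and "failed" otherwise; "failed" lumps together
# refused, timed out and no route, so the original log message still tells them apart.
probe_endpoint() {
  local host="${1%:*}" port="${1##*:}"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "connected"
  else
    echo "failed"
  fi
}

# Usage, e.g. from the master node:
#   kubectl logs -n kube-system deployment/metrics-server \
#     | unreachable_endpoints \
#     | while IFS= read -r ep; do echo "$ep $(probe_endpoint "$ep")"; done
```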

Moving this out of the comments for better visibility.

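After creating a small cluster, I could not reproduce this behavior: metrics-server works fine for both nodes, and kubectl top nodes shows information and metrics for both available nodes (bear in mind that it takes some time before metrics start being collected).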

That leads to troubleshooting why it is not working. Checking the metrics-server logs is the most efficient way to approach this problem:

$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system

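The next steps depend on what the logs show. For example, in the comments above:

First there was no route to host, which points to the network and an inability to resolve the hostname.
Then there was i/o timeout, which means a route exists but the service is not responding. This can be caused by a firewall blocking certain ports or sources, the kubelet not running (it listens on port 10250), or, as happened to the OP, an NTP problem that affected certificates and connections.
The errors may differ in other cases; what matters is finding the error and troubleshooting further based on it.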
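The decision described above can be written down as a small lookup from log message to next step. This is a minimal bash sketch: the function name and the wording of its hints are hypothetical, and it only covers the two messages discussed here, so anything else falls through to a manual inspection hint. The usage line assumes the Deployment is named metrics-server, which matches the pod name metrics-server-58b44df574-2n9dn above and saves looking up the generated pod name.

```shell
# Hypothetical helper: map one metrics-server log line to a troubleshooting hint,
# following the two cases described above.
next_step_for() {
  case "$1" in
    *"no route to host"*)
      echo "network: check routing between nodes and that the node hostname resolves" ;;
    *"i/o timeout"*)
      echo "service: check the firewall on port 10250, that the kubelet is running, and NTP/clock sync" ;;
    *)
      echo "other: inspect the error message manually" ;;
  esac
}

# Usage: print the distinct hints for the current metrics-server errors.
#   kubectl logs -n kube-system deployment/metrics-server \
#     | grep -E 'no route to host|i/o timeout' \
#     | while IFS= read -r line; do next_step_for "$line"; done | sort -u
```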
