1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate



My Kubernetes K3s cluster shows the following error:

Events:
Type     Reason            Age   From               Message
----     ------            ----  ----               -------
Warning  FailedScheduling  17m   default-scheduler  0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
Warning  FailedScheduling  17m   default-scheduler  0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.

To list the taints in the cluster, I ran:

kubectl get nodes -o json | jq '.items[].spec'

Output:

{
  "podCIDR": "10.42.0.0/24",
  "podCIDRs": [
    "10.42.0.0/24"
  ],
  "providerID": "k3s://antonis-dell",
  "taints": [
    {
      "effect": "NoSchedule",
      "key": "node.kubernetes.io/disk-pressure",
      "timeAdded": "2021-12-17T10:54:31Z"
    }
  ]
}
{
  "podCIDR": "10.42.1.0/24",
  "podCIDRs": [
    "10.42.1.0/24"
  ],
  "providerID": "k3s://knodea"
}

When I run kubectl describe node antonis-dell, I get:

Name:               antonis-dell
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=antonis-dell
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
Annotations:        csi.volume.kubernetes.io/nodeid: {"ch.ctrox.csi.s3-driver":"antonis-dell"}
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"f2:d5:6c:6a:85:0a"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.XX
                    k3s.io/hostname: antonis-dell
                    k3s.io/internal-ip: 192.168.1.XX
                    k3s.io/node-args: ["server"]
                    k3s.io/node-config-hash: YANNMDBIL7QEFSZANHGVW3PXY743NWWRVFKBKZ4FXLV5DM4C74WQ====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c","K3S_KUBECONFIG_MODE":"644"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 17 Dec 2021 12:11:39 +0200
Taints:             node.kubernetes.io/disk-pressure:NoSchedule

The node seems to have a disk-pressure taint.

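This command doesn't work: kubectl taint node antonis-dell node.kubernetes.io/disk-pressure:NoSchedule-. And in my opinion, even if it did work, it wouldn't be a good solution, because the taint is assigned by the control plane.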

Also, at the end of the kubectl describe node antonis-dell output, I see:

Events:
Type     Reason               Age                  From     Message
----     ------               ----                 ----     -------
Warning  FreeDiskSpaceFailed  57m                  kubelet  failed to garbage collect required amount of images. Wanted to free 32967806976 bytes, but freed 0 bytes
Warning  FreeDiskSpaceFailed  52m                  kubelet  failed to garbage collect required amount of images. Wanted to free 32500092928 bytes, but freed 0 bytes
Warning  FreeDiskSpaceFailed  47m                  kubelet  failed to garbage collect required amount of images. Wanted to free 32190205952 bytes, but freed 0 bytes
Warning  FreeDiskSpaceFailed  42m                  kubelet  failed to garbage collect required amount of images. Wanted to free 32196628480 bytes, but freed 0 bytes
Warning  FreeDiskSpaceFailed  37m                  kubelet  failed to garbage collect required amount of images. Wanted to free 32190926848 bytes, but freed 0 bytes
Warning  FreeDiskSpaceFailed  2m21s (x7 over 32m)  kubelet  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 30909374464 bytes, but freed 0 bytes

Maybe the disk pressure is related to this? How can I delete unneeded images?

Posting the answer as a community wiki; feel free to edit and expand it.


The node.kubernetes.io/disk-pressure:NoSchedule taint indicates that the node is under disk pressure, as the name suggests.

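The kubelet detects disk pressure based on the imagefs.available, imagefs.inodesFree, nodefs.available and nodefs.inodesFree (Linux only) signals observed on the node. It then compares the observed values with the corresponding thresholds, which can be set on the kubelet, to decide whether the node condition and taint should be added or removed.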

For more details on disk-pressure, see Efficient Node Out-of-Resource Management in Kubernetes, section How Does Kubelet Decide that Resources Are Low?:

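memory.available: a signal that describes the state of node memory. The default eviction threshold for memory is 100 Mi. In other words, the kubelet starts evicting Pods when available memory drops to 100 Mi.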

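nodefs.available: nodefs is the filesystem the kubelet uses for volumes, daemon logs, and so on. By default, the kubelet starts reclaiming node resources if nodefs.available < 10%.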

nodefs.inodesFree: a signal that describes the state of the nodefs inodes. By default, the kubelet starts evicting workloads if nodefs.inodesFree < 5%.

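imagefs.available: imagefs is an optional filesystem that container runtimes use to store container images and container writable layers. By default, the kubelet starts evicting workloads if imagefs.available < 15%.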

imagefs.inodesFree: the state of the imagefs inodes. It has no default eviction threshold.
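As a rough illustration of that "observed value vs. threshold" comparison, here is a minimal POSIX shell sketch. It is not the kubelet's code: disk_pressure and check_path are hypothetical helper names, and df's Available column is only an approximation of the kubelet's nodefs.available / imagefs.available signals.

```shell
#!/bin/sh
# Hypothetical helpers approximating the kubelet's "signal < threshold" check
# for nodefs.available / imagefs.available. Not the kubelet's real logic.

# disk_pressure AVAILABLE CAPACITY THRESHOLD_PERCENT
# Prints "DiskPressure" when AVAILABLE/CAPACITY < THRESHOLD_PERCENT %, else "OK".
disk_pressure() {
  avail=$1
  cap=$2
  threshold=$3
  # Integer-only comparison: avail*100 < threshold*cap  <=>  avail/cap < threshold%
  if [ $((avail * 100)) -lt $((threshold * cap)) ]; then
    echo "DiskPressure"
  else
    echo "OK"
  fi
}

# check_path PATH THRESHOLD_PERCENT
# Looks up total and available 1K blocks of the filesystem that holds PATH.
check_path() {
  path=$1
  threshold=$2
  # -P: POSIX output (one line per filesystem), -k: 1K blocks.
  # Row 2, field 2 = total blocks, field 4 = available blocks.
  set -- $(df -P -k "$path" | awk 'NR == 2 { print $2, $4 }')
  printf '%s: %s\n' "$path" "$(disk_pressure "$2" "$1" "$threshold")"
}
```

For example, check_path /var/lib/rancher/k3s 15 compares the filesystem holding the K3s data directory (assumed to be the default location) against the default imagefs threshold of 15%. For the inode signals, df -i shows the corresponding inode usage.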


What to check

Several things can help, for example:

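  • Prune unused objects such as images (with the Docker CRI); see Prune images.

    The docker image prune command lets you clean up unused images. By default, docker image prune only cleans up dangling images. A dangling image is one that is not tagged and is not referenced by any container.

  • Check whether files or logs on the node are taking up a lot of space.

  • Look for any other reason the disk space is being consumed.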
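On K3s the default container runtime is the bundled containerd, not Docker, so docker image prune only helps if the node was started with --docker. A minimal sketch under that assumption: largest_dirs and prune_images are hypothetical helper names, /var/lib/rancher/k3s is assumed to be the default K3s data directory, and the prune commands need root.

```shell
#!/bin/sh
# largest_dirs DIR [N]: list the N biggest entries directly under DIR (in KiB),
# staying on one filesystem (-x). The first line is DIR's own total.
largest_dirs() {
  du -x -k -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# prune_images: remove images that no container is using.
# Assumption: a K3s node, where crictl is bundled as "k3s crictl".
prune_images() {
  if command -v k3s >/dev/null 2>&1; then
    k3s crictl rmi --prune      # containerd, the K3s default runtime
  elif command -v docker >/dev/null 2>&1; then
    docker image prune -a       # -a: also unused tagged images, not only dangling ones
  else
    echo "no k3s or docker binary found" >&2
    return 1
  fi
}

# Typical session (as root):
#   largest_dirs /var/lib/rancher/k3s
#   largest_dirs /var/log
#   prune_images
#   journalctl --vacuum-size=500M   # shrink the systemd journal
```

Note that pruning only removes images no container references, which may explain why the kubelet's own image garbage collection reported "freed 0 bytes": if the space is used by logs, volumes or in-use images, largest_dirs is the more useful first step.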