My Kubernetes K3s cluster reports the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 17m default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
Warning FailedScheduling 17m default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
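The FailedScheduling message says the pod did not tolerate the disk-pressure taint on one node. To see why, it helps to know how a toleration is matched against a taint. Below is a simplified Python model of the matching rules Kubernetes documents for tolerations: an empty effect matches every effect, an empty key with operator Exists matches every key, and Equal compares values. The names Taint, Toleration, tolerates and untolerated_taints are hypothetical. The sketch covers only the taint half of the message, not the node affinity/selector check that rules out the other node.

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every key
    operator: str = "Equal"  # "Equal" (the default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """True if the toleration matches the taint (simplified model)."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # the value is ignored for Exists
    if tol.operator == "Equal":
        return tol.value == taint.value
    return False  # unknown operator


def untolerated_taints(taints, tolerations):
    """NoSchedule/NoExecute taints that no toleration matches, i.e. the ones that block scheduling."""
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


if __name__ == "__main__":
    disk = Taint(key="node.kubernetes.io/disk-pressure", effect="NoSchedule")
    # With no tolerations the taint is returned, so the pod cannot land on this node.
    print(untolerated_taints([disk], []))
```

With no tolerations, the disk-pressure taint comes back as blocking, which corresponds to the "had taint ... that the pod didn't tolerate" part of the event.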
To list the taints in the cluster, I ran:
kubectl get nodes -o json | jq '.items[].spec'
Output:
{
  "podCIDR": "10.42.0.0/24",
  "podCIDRs": [
    "10.42.0.0/24"
  ],
  "providerID": "k3s://antonis-dell",
  "taints": [
    {
      "effect": "NoSchedule",
      "key": "node.kubernetes.io/disk-pressure",
      "timeAdded": "2021-12-17T10:54:31Z"
    }
  ]
}
{
  "podCIDR": "10.42.1.0/24",
  "podCIDRs": [
    "10.42.1.0/24"
  ],
  "providerID": "k3s://knodea"
}
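The jq filter above prints only each node's spec, so the node name has to be inferred from providerID. The Python sketch below prints the name and the taints together. It assumes the full JSON list that kubectl get nodes -o json prints, in which each item also carries metadata.name. The helper taints_by_node is hypothetical, and the embedded SAMPLE is a trimmed reconstruction of the two nodes above, not real output.

```python
import json


def taints_by_node(nodes_json: str) -> dict:
    """Map each node name to its taints, formatted as "key[=value]:effect".

    Expects the document printed by `kubectl get nodes -o json`: an object
    whose "items" each have metadata.name and an optional spec.taints.
    """
    result = {}
    for item in json.loads(nodes_json).get("items", []):
        entries = []
        for taint in item.get("spec", {}).get("taints", []):
            key = taint["key"]
            if taint.get("value"):
                key += "=" + taint["value"]
            entries.append(f"{key}:{taint['effect']}")
        result[item["metadata"]["name"]] = entries
    return result


# Trimmed, reconstructed sample; in practice pass the kubectl output instead.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "antonis-dell"},
     "spec": {"taints": [{"effect": "NoSchedule",
                          "key": "node.kubernetes.io/disk-pressure"}]}},
    {"metadata": {"name": "knodea"}, "spec": {}},
]})

for node, taints in taints_by_node(SAMPLE).items():
    print(node, ", ".join(taints) or "<none>")
```

For the sample this prints "antonis-dell node.kubernetes.io/disk-pressure:NoSchedule" and "knodea <none>". To use it on a live cluster, feed it the text of kubectl get nodes -o json, for example read from sys.stdin.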
When I run kubectl describe node antonis-dell, I get:
Name: antonis-dell
Roles: control-plane,master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=k3s
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=antonis-dell
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=true
node-role.kubernetes.io/master=true
node.kubernetes.io/instance-type=k3s
Annotations: csi.volume.kubernetes.io/nodeid: {"ch.ctrox.csi.s3-driver":"antonis-dell"}
flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"f2:d5:6c:6a:85:0a"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.1.XX
k3s.io/hostname: antonis-dell
k3s.io/internal-ip: 192.168.1.XX
k3s.io/node-args: ["server"]
k3s.io/node-config-hash: YANNMDBIL7QEFSZANHGVW3PXY743NWWRVFKBKZ4FXLV5DM4C74WQ====
k3s.io/node-env:
{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c","K3S_KUBECONFIG_MODE":"644"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 17 Dec 2021 12:11:39 +0200
Taints: node.kubernetes.io/disk-pressure:NoSchedule
The node appears to have the disk-pressure taint.
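This command does not work: kubectl taint node antonis-dell node.kubernetes.io/disk-pressure:NoSchedule-
In my opinion, even if it did work, it would not be a good solution, because the taint was assigned by the control plane.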
In addition, at the end of the kubectl describe node antonis-dell output, I see:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 57m kubelet failed to garbage collect required amount of images. Wanted to free 32967806976 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 52m kubelet failed to garbage collect required amount of images. Wanted to free 32500092928 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 47m kubelet failed to garbage collect required amount of images. Wanted to free 32190205952 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 42m kubelet failed to garbage collect required amount of images. Wanted to free 32196628480 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 37m kubelet failed to garbage collect required amount of images. Wanted to free 32190926848 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 2m21s (x7 over 32m) kubelet (combined from similar events): failed to garbage collect required amount of images. Wanted to free 30909374464 bytes, but freed 0 bytes
Could the disk pressure be related to this? How can I delete unneeded images?
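The FreeDiskSpaceFailed events come from the kubelet's image garbage collector. Going by the kubelet's documented image GC settings, once disk usage reaches imageGCHighThresholdPercent (default 85), the kubelet tries to delete unused images until usage falls to imageGCLowThresholdPercent (default 80). The Python sketch below models only that arithmetic. The function name is hypothetical, and other rules the real kubelet applies, such as a minimum image age, are not modelled. Because only images that no container uses are deleted, "freed 0 bytes" most likely means nothing was eligible, or that the space is taken up by something other than images.

```python
def image_gc_bytes_to_free(capacity: int, available: int,
                           high_threshold_pct: int = 85,
                           low_threshold_pct: int = 80) -> int:
    """Bytes image GC would try to free on a disk, or 0 if usage is below the high threshold.

    Simplified model: when usage >= high_threshold_pct, aim to bring usage
    back down to low_threshold_pct, i.e. leave (100 - low)% of capacity free.
    """
    usage_pct = 100 - (available * 100 // capacity)
    if usage_pct < high_threshold_pct:
        return 0
    target_free = capacity * (100 - low_threshold_pct) // 100
    return target_free - available


GiB = 2**30
# Hypothetical 100 GiB disk with 10 GiB free: 90% used, so GC wants 10 GiB back.
print(image_gc_bytes_to_free(100 * GiB, 10 * GiB) // GiB)  # 10
```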
Posting the answer as a community wiki; feel free to edit and expand it.
The node.kubernetes.io/disk-pressure:NoSchedule taint indicates that the node is under disk pressure, as its name suggests.
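The kubelet detects disk pressure based on imagefs.available, imagefs.inodesFree, nodefs.available and nodefs.inodesFree (Linux only) observed on a node. The observed values are then compared with the corresponding thresholds, which can be set on the kubelet, to determine whether the node condition and taint should be added or removed.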
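The comparison described above can be sketched in a few lines of Python. This is a simplified model, not the kubelet's code. Observed values and thresholds are both expressed as percent free, and the function names are hypothetical. Any breached disk signal is treated as turning on the DiskPressure condition, which in turn produces the node.kubernetes.io/disk-pressure:NoSchedule taint.

```python
# Disk-related eviction signals, as listed in the text.
DISK_SIGNALS = {"nodefs.available", "nodefs.inodesFree",
                "imagefs.available", "imagefs.inodesFree"}
DISK_PRESSURE_TAINT = "node.kubernetes.io/disk-pressure:NoSchedule"


def breached_signals(observed_free_pct: dict, thresholds_pct: dict) -> list:
    """Signals whose observed free percentage is below the configured threshold."""
    return sorted(
        signal for signal, limit in thresholds_pct.items()
        if signal in observed_free_pct and observed_free_pct[signal] < limit
    )


def disk_pressure_taint(observed_free_pct: dict, thresholds_pct: dict):
    """The taint the node would get, or None if no disk signal is breached."""
    breached = breached_signals(observed_free_pct, thresholds_pct)
    return DISK_PRESSURE_TAINT if DISK_SIGNALS.intersection(breached) else None


# Hypothetical readings: 8% free on the image filesystem is below a 15% threshold.
observed = {"nodefs.available": 12.0, "imagefs.available": 8.0, "nodefs.inodesFree": 40.0}
thresholds = {"nodefs.available": 10, "imagefs.available": 15, "nodefs.inodesFree": 5}
print(breached_signals(observed, thresholds))     # ['imagefs.available']
print(disk_pressure_taint(observed, thresholds))  # node.kubernetes.io/disk-pressure:NoSchedule
```

The real kubelet does not clear a pressure condition immediately. It waits for evictionPressureTransitionPeriod (5 minutes by default), so the taint may remain for a while after space has been freed.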
More details on disk-pressure are available in the How Does Kubelet Decide that Resources Are Low? section of Efficient Node Out-of-Resource Management in Kubernetes:
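memory.available: A signal that describes the state of cluster memory. The default eviction threshold for memory is 100Mi. In other words, the kubelet starts evicting Pods when available memory drops below 100Mi.
nodefs.available: The nodefs is a filesystem used by the kubelet for volumes, daemon logs, etc. By default, the kubelet starts reclaiming node resources if nodefs.available < 10%.
nodefs.inodesFree: A signal that describes the state of the nodefs inodes. By default, the kubelet starts evicting workloads if nodefs.inodesFree < 5%.
imagefs.available: The imagefs is an optional filesystem that the container runtime uses to store container images and container writable layers. By default, the kubelet starts evicting workloads if imagefs.available < 15%.
imagefs.inodesFree: The state of the imagefs inodes. It has no default eviction threshold.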
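The default thresholds quoted above are configured on the kubelet as a comma-separated list, for example memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15% (the format of the --eviction-hard flag). The Python sketch below parses such a list and checks a reading against it. The names parse_eviction_thresholds and is_breached are hypothetical. Only binary suffixes (Ki, Mi, Gi, Ti) are handled, whereas real Kubernetes quantities accept more forms.

```python
import re

# Binary quantity suffixes handled by this sketch (real Kubernetes quantities support more).
_UNITS = {"": 1, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_RULE = re.compile(r"^([A-Za-z.]+)<(\d+(?:\.\d+)?)(%|Ki|Mi|Gi|Ti)?$")


def parse_eviction_thresholds(spec: str) -> dict:
    """Parse "signal<value,..." into {signal: ("percent", p) or ("absolute", n)}."""
    result = {}
    for rule in (r.strip() for r in spec.split(",")):
        if not rule:
            continue
        m = _RULE.match(rule)
        if m is None:
            raise ValueError(f"cannot parse threshold: {rule!r}")
        signal, number, unit = m.group(1), float(m.group(2)), m.group(3) or ""
        if unit == "%":
            result[signal] = ("percent", number)
        else:
            result[signal] = ("absolute", int(number * _UNITS[unit]))
    return result


def is_breached(threshold, available: float, capacity: float) -> bool:
    """True if the available amount is below the threshold (percent of capacity or absolute)."""
    kind, value = threshold
    if kind == "percent":
        return available * 100 < value * capacity
    return available < value


DEFAULTS = "memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%"
thresholds = parse_eviction_thresholds(DEFAULTS)
# Hypothetical image filesystem: 12 GiB free out of 100 GiB (12% < 15%).
print(is_breached(thresholds["imagefs.available"], available=12, capacity=100))  # True
```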
What to check
There are different things that can help, such as:
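- Prune unused objects such as images (when Docker is the CRI); see "prune images" in the Docker documentation.
  The docker image prune command lets you clean up unused images. By default, docker image prune only cleans up dangling images. A dangling image is one that is not tagged and is not referenced by any container.
- Check the files and logs on the node, in case they take up a lot of space.
- Look for any other reason why disk space is being consumed.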
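The difference between the dangling images that docker image prune removes by default and the broader set that docker image prune -a removes (every image not used by any container) can be modelled as a simple filter. This Python sketch only illustrates the selection rule described above. The Image class, the function names and the image IDs are hypothetical, and the actual pruning is done by the docker CLI. K3s uses containerd rather than Docker by default, so on such a node the docker command may not be available at all.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    image_id: str
    tags: list = field(default_factory=list)  # empty list means untagged


def dangling_images(images, used_image_ids):
    """What `docker image prune` targets by default: untagged images no container references."""
    return [img for img in images if not img.tags and img.image_id not in used_image_ids]


def unused_images(images, used_image_ids):
    """What `docker image prune -a` targets: any image no container references."""
    return [img for img in images if img.image_id not in used_image_ids]


# Hypothetical inventory: IDs and tags are made up for illustration.
images = [Image("sha256:aaa", ["nginx:1.21"]), Image("sha256:bbb"),
          Image("sha256:ccc"), Image("sha256:ddd", ["redis:6"])]
used = {"sha256:aaa", "sha256:ccc"}  # referenced by some container
print([i.image_id for i in dangling_images(images, used)])  # ['sha256:bbb']
print([i.image_id for i in unused_images(images, used)])    # ['sha256:bbb', 'sha256:ddd']
```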