kured is not cordoning AKS nodes



We are running a private cluster on version 1.22.6 with Azure CNI enabled, and our node pools use the following image versions.

Name     NodeImageVersion
-------  ---------------------------------------
devpool  AKSUbuntu-1804gen2containerd-2022.02.07
system   AKSUbuntu-1804gen2containerd-2022.02.07

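So we introduced a kured installation to manage node image updates on a regular schedule, and installed the components by following this document: https://anchortagdev.com/schedule-azure-kubernetes-service-aks-cluster-updates-with-kured/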

We confirmed that all of the components were created and that the kured DaemonSet pods are running on every node.

Below is the YAML of the kured DaemonSet, as obtained with kubectl edit:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "6"
  name: kured
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: kured
  template:
    metadata:
      labels:
        name: kured
    spec:
      containers:
      - command:
        - /usr/bin/kured
        - --period=1m
        - --start-time=10am
        - --end-time=1pm
        - --time-zone=Local
        - --ds-name=kured
        - --ds-namespace=kube-system
        - --reboot-days=mon
        env:
        - name: KURED_NODE_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: docker.io/weaveworks/kured:master-f6e4062
        imagePullPolicy: IfNotPresent
        name: kured
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kured
      serviceAccountName: kured
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate

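However, even though we set the period to 1 minute, the nodes still show as "Ready" and schedulable. We expected kured to cordon them automatically because they have an outdated image, and we verified that the /var/run/reboot-required file is present on all of these nodes.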
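
To confirm which nodes are still schedulable, one option is to filter the output of `kubectl get nodes`. This is a minimal sketch, assuming the default column layout where the second column is STATUS and a cordoned node shows `SchedulingDisabled` there. The function name `uncordoned_nodes` is hypothetical.

```shell
#!/bin/sh
# Print the names of nodes whose STATUS column does not contain
# "SchedulingDisabled", i.e. nodes that have not been cordoned.
# Reads `kubectl get nodes --no-headers` style output on stdin.
uncordoned_nodes() {
  awk 'NF >= 2 && $2 !~ /SchedulingDisabled/ { print $1 }'
}

# Against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | uncordoned_nodes
```

If every node is listed, nothing has been cordoned yet, which matches the behaviour described above.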

Below is the output from the kured DaemonSet pods.

time="2022-06-24T09:24:12Z" level=info msg="Kubernetes Reboot Daemon: master-f6e4062"
time="2022-06-24T09:24:12Z" level=info msg="Node ID: aks-devpool-1xxxxxxx-vmssxxxxxxx"
time="2022-06-24T09:24:12Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2022-06-24T09:24:12Z" level=info msg="Reboot Sentinel: /var/run/reboot-required every 1m0s"
time="2022-06-24T09:24:12Z" level=info msg="Blocking Pod Selectors: []"
time="2022-06-24T09:24:12Z" level=info msg="Reboot on: ---Mon--------------- between 10:00 and 13:00 UTC"
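
Worth noting from this log: the window is reported as Monday between 10:00 and 13:00 UTC, so `--time-zone=Local` resolved to UTC inside the container, while the log timestamp (2022-06-24T09:24:12Z) is a Friday before 10:00. The sketch below checks a timestamp against that window. It assumes GNU `date` (for `-d`), treats both window edges as inclusive (an assumption about kured's behaviour, not taken from its source), and uses a hypothetical function name `window_status`.

```shell
#!/bin/sh
# Report whether a UTC timestamp falls inside the reboot window from the
# DaemonSet args: --reboot-days=mon --start-time=10am --end-time=1pm.
# Requires GNU date. Inclusive edge handling is an assumption.
window_status() {
  ts=$1
  dow=$(date -u -d "$ts" +%u)      # 1 = Monday ... 7 = Sunday
  hour=$(date -u -d "$ts" +%-H)    # unpadded, avoids octal parsing of "09"
  min=$(date -u -d "$ts" +%-M)
  now=$(( $hour * 60 + $min ))     # minutes since midnight UTC
  if [ "$dow" -eq 1 ] && [ "$now" -ge 600 ] && [ "$now" -le 780 ]; then
    echo inside
  else
    echo outside
  fi
}

# Timestamp taken from the kured log:
window_status "2022-06-24T09:24:12Z"   # prints "outside"
```

If the check reports "outside", kured would not be expected to cordon or reboot the node at that moment, regardless of the sentinel file.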

There was a similar issue that was resolved by updating the node image with the az CLI, after which the nodes returned to normal. The command used:

az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-image-only

https://learn.microsoft.com/en-us/azure/aks/node-image-upgrade#upgrade-a-specific-node-pool
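
To judge whether a node pool is actually behind, the date suffix of its node image version can be compared with the latest available version (which, as far as I know, `az aks nodepool get-upgrades` reports as `latestNodeImageVersion`). This is a minimal sketch, assuming the version string always ends in `-YYYY.MM.DD` as in the node pool listing in this post. The function names and the "latest" value in the usage lines are hypothetical placeholders, not real query results.

```shell
#!/bin/sh
# Extract the trailing date of a node image version as an integer,
# e.g. AKSUbuntu-1804gen2containerd-2022.02.07 -> 20220207.
image_date() {
  printf '%s\n' "${1##*-}" | tr -d '.'
}

# Succeed (exit 0) when the current image is older than the latest one.
image_outdated() {
  [ "$(image_date "$1")" -lt "$(image_date "$2")" ]
}

# Usage: the second value is a made-up placeholder standing in for
# latestNodeImageVersion, not an actual AKS release.
current="AKSUbuntu-1804gen2containerd-2022.02.07"
latest="AKSUbuntu-1804gen2containerd-2099.01.01"
if image_outdated "$current" "$latest"; then
  echo "node image is outdated"
fi
```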

I am not sure whether something changed on the AKS side after the 1.22 upgrade, but this appears to be related to our cluster upgrade.

Latest update