Unable to enable HugePages on AKS nodes



Hello everyone,

I'm struggling to enable HugePages on an AKS cluster.

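  1. I noticed that I first have to configure a node pool that supports HugePages.
    • The only official Azure documentation about huge pages covers Transparent HugePages (https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration), but I don't know whether that is enough…
  2. Then I know that I also have to configure the pod.
    • I wanted to rely on this (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/), but as 2) is not working…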
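For step 2, the Kubernetes page linked above requests huge pages through the container's resource limits and mounts them through an `emptyDir` volume. A minimal sketch of such a test pod, wrapped in a shell function so it can be piped to kubectl; the function name `hugepage_test_pod`, the pod name, the `busybox` image and the 100Mi sizes are illustrative assumptions. It uses `medium: HugePages` because the size-specific medium (`HugePages-2Mi`) only arrived in Kubernetes releases later than the 1.16.15 used here. As long as the node reports `hugepages-2Mi: 0`, a pod like this stays Pending.

```shell
# Hypothetical helper: prints a minimal Pod manifest that requests 2 MiB huge pages.
# Names, image and sizes are illustrative; adapted from the Kubernetes
# "Manage HugePages" task page for a 1.16-era cluster.
hugepage_test_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-test
spec:
  containers:
  - name: example
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi   # huge page requests default to (and must equal) limits
        memory: 100Mi          # cpu or memory must be requested alongside huge pages
      requests:
        memory: 100Mi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
}

# Usage (applies the manifest to the current kubectl context):
#   hugepage_test_pod | kubectl apply -f -
```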

Despite all that, I still can't get it to work.

If I follow the Microsoft documentation, my node pool is generated as follows:

"kubeletConfig": {
  "allowedUnsafeSysctls": null,
  "cpuCfsQuota": null,
  "cpuCfsQuotaPeriod": null,
  "cpuManagerPolicy": null,
  "failSwapOn": false,
  "imageGcHighThreshold": null,
  "imageGcLowThreshold": null,
  "topologyManagerPolicy": null
},
"linuxOsConfig": {
  "swapFileSizeMb": null,
  "sysctls": {
    "fsAioMaxNr": null,
    "fsFileMax": null,
    "fsInotifyMaxUserWatches": null,
    "fsNrOpen": null,
    "kernelThreadsMax": null,
    "netCoreNetdevMaxBacklog": null,
    "netCoreOptmemMax": null,
    "netCoreRmemMax": null,
    "netCoreSomaxconn": null,
    "netCoreWmemMax": null,
    "netIpv4IpLocalPortRange": "32000 60000",
    "netIpv4NeighDefaultGcThresh1": null,
    "netIpv4NeighDefaultGcThresh2": null,
    "netIpv4NeighDefaultGcThresh3": null,
    "netIpv4TcpFinTimeout": null,
    "netIpv4TcpKeepaliveProbes": null,
    "netIpv4TcpKeepaliveTime": null,
    "netIpv4TcpMaxSynBacklog": null,
    "netIpv4TcpMaxTwBuckets": null,
    "netIpv4TcpRmem": null,
    "netIpv4TcpTwReuse": null,
    "netIpv4TcpWmem": null,
    "netIpv4TcpkeepaliveIntvl": null,
    "netNetfilterNfConntrackBuckets": null,
    "netNetfilterNfConntrackMax": null,
    "vmMaxMapCount": null,
    "vmSwappiness": null,
    "vmVfsCachePressure": null
  },
  "transparentHugePageDefrag": "defer+madvise",
  "transparentHugePageEnabled": "madvise"
}

My node still looks like this:

# kubectl describe nodes aks-deadpoolhp-31863567-vmss000000|grep hugepage
Capacity:
  attachable-volumes-azure-disk:  16
  cpu:                            8
  ephemeral-storage:              129901008Ki
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         32940620Ki
  pods:                           110
Allocatable:
  attachable-volumes-azure-disk:  16
  cpu:                            7820m
  ephemeral-storage:              119716768775
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         28440140Ki
  pods:                           110
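A likely reason for the zeros: the `transparentHugePage*` settings in `linuxOsConfig` only change the Transparent HugePage mode, while the `hugepages-2Mi` / `hugepages-1Gi` values the kubelet reports reflect pre-allocated (hugetlbfs) pages, which THP does not create. A small sketch for checking this on the node itself, e.g. from `kubectl node-shell`; the function name `hugepages_total` is hypothetical, and it simply reads the `HugePages_Total` line of `/proc/meminfo` (or of a saved copy passed as an argument).

```shell
# Hypothetical helper: prints the number of pre-allocated hugetlbfs pages.
# Defaults to /proc/meminfo; pass a file path to parse a saved copy instead.
hugepages_total() {
  awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}

# For comparison, the THP mode that linuxOsConfig changes:
#   cat /sys/kernel/mm/transparent_hugepage/enabled
```

If `hugepages_total` prints 0, the kubelet has no huge pages to advertise, regardless of the THP mode.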

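My Kubernetes version is 1.16.15.

I also read that I should enable a feature gate such as --feature-gates=HugePages=true (https://dev.to/dannypsnl/hugepages-on-kubernetes-5e7p), but I don't know how to do that in AKS… In any case, since my node doesn't show any HugePage availability, I'm not sure it would help at this point.

I even tried re-creating the AKS cluster with --kubeconfig, but nothing changed: I still can't use HugePages…

Once again I need your help; I'm completely lost with this AKS service…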

  • Install kubectl-node-shell on your laptop:
curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
chmod +x ./kubectl-node_shell
sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell
  • Get the node you want to enter:
kubectl get pod <YOUR_POD> -o custom-columns=NODE:.spec.nodeName -n <YOUR_NAMESPACE>
  • If the node is <none>, your pod is Pending (it has not been scheduled yet). In that case, pick any node:
kubectl get nodes
  • Enter your node:
kubectl node-shell <NODE>
  • Configure huge pages:
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
cat  /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  • Restart the kubelet (yes, still inside the node):
systemctl restart kubelet
  • Exit node-shell with C-d (Ctrl + d).
  • Check that HugePages are now available (i.e. the hugepages values must not be 0):
kubectl describe node <NODE>|grep -i -e "capacity" -e "allocatable" -e "huge"
  • Check that your pod is no longer Pending, or run your helm install / kubectl apply now!
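For sizing, `echo 1024 > .../hugepages-2048kB/nr_hugepages` reserves 1024 pages × 2048 kB = 2048 MiB (2 GiB) on NUMA node 0, and a pod whose `hugepages-2Mi` request exceeds what the node then advertises will stay Pending. A minimal sketch of that arithmetic; both function names are hypothetical and assume the 2 MiB page size used above. Also note that values written to sysfs and the hugetlbfs mount are runtime-only, so they are lost when the node reboots or is reimaged, and nodes added later by the scale set will not have them.

```shell
# Hypothetical helper: number of 2 MiB pages to write to nr_hugepages
# in order to reserve at least SIZE_MIB mebibytes (rounds up).
hugepages_needed() {
  size_mib="$1"
  page_mib=2   # hugepages-2048kB
  echo $(( (size_mib + page_mib - 1) / page_mib ))
}

# Hypothetical helper: memory (MiB) reserved by a given count of 2 MiB pages.
hugepages_reserved_mib() {
  echo $(( $1 * 2 ))
}

# Usage:
#   hugepages_needed 2048        # pages for 2 GiB
#   hugepages_reserved_mib 1024  # MiB reserved by "echo 1024"
```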
