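Hello all,
I'm struggling to activate HugePages on an AKS cluster.
- I noticed that I first have to configure a node pool with HugePage support.
- The only official Azure HugePage documentation is about transparent HugePages (https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration), but I don't know if that is sufficient…
- Then I know that I also have to configure the pod.
- I wanted to rely on this (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/), but since point 2) is not working…
Despite everything I've tried, I could not make it work.
If I follow the Microsoft documentation, my node pool is created like this: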
"kubeletConfig": {
"allowedUnsafeSysctls": null,
"cpuCfsQuota": null,
"cpuCfsQuotaPeriod": null,
"cpuManagerPolicy": null,
"failSwapOn": false,
"imageGcHighThreshold": null,
"imageGcLowThreshold": null,
"topologyManagerPolicy": null
},
"linuxOsConfig": {
"swapFileSizeMb": null,
"sysctls": {
"fsAioMaxNr": null,
"fsFileMax": null,
"fsInotifyMaxUserWatches": null,
"fsNrOpen": null,
"kernelThreadsMax": null,
"netCoreNetdevMaxBacklog": null,
"netCoreOptmemMax": null,
"netCoreRmemMax": null,
"netCoreSomaxconn": null,
"netCoreWmemMax": null,
"netIpv4IpLocalPortRange": "32000 60000",
"netIpv4NeighDefaultGcThresh1": null,
"netIpv4NeighDefaultGcThresh2": null,
"netIpv4NeighDefaultGcThresh3": null,
"netIpv4TcpFinTimeout": null,
"netIpv4TcpKeepaliveProbes": null,
"netIpv4TcpKeepaliveTime": null,
"netIpv4TcpMaxSynBacklog": null,
"netIpv4TcpMaxTwBuckets": null,
"netIpv4TcpRmem": null,
"netIpv4TcpTwReuse": null,
"netIpv4TcpWmem": null,
"netIpv4TcpkeepaliveIntvl": null,
"netNetfilterNfConntrackBuckets": null,
"netNetfilterNfConntrackMax": null,
"vmMaxMapCount": null,
"vmSwappiness": null,
"vmVfsCachePressure": null
},
"transparentHugePageDefrag": "defer+madvise",
"transparentHugePageEnabled": "madvise"
My node still looks like this:
# kubectl describe nodes aks-deadpoolhp-31863567-vmss000000|grep hugepage
Capacity:
attachable-volumes-azure-disk: 16
cpu: 8
ephemeral-storage: 129901008Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32940620Ki
pods: 110
Allocatable:
attachable-volumes-azure-disk: 16
cpu: 7820m
ephemeral-storage: 119716768775
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 28440140Ki
pods: 110
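My Kubernetes version is 1.16.15.
I also saw that I should enable a feature gate such as --feature-gates=HugePages=true (https://dev.to/dannypsnl/hugepages-on-kubernetes-5e7p), but I don't know how to do that in AKS… In any case, since my node does not report any HugePage capacity, I'm not sure it would help right now.
I even tried re-creating the AKS cluster with --kubeconfig, but nothing changed: I cannot use HugePages…
Once again I need your help; I'm completely lost with this AKS service…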
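The symptom described above (every hugepages-* row reporting 0) can be checked from a script. What follows is only a minimal sketch: the function name has_hugepages is hypothetical, and it assumes the output of kubectl describe node is piped in on stdin in the same layout as the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): reads `kubectl describe node`
# output on stdin and succeeds only if at least one hugepages-* row
# carries a value other than 0.
has_hugepages() {
  awk '
    /^[[:space:]]*hugepages-[^:]+:/ {
      # $1 is the resource name with its colon, $2 is the quantity column
      if ($2 != "0") found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Usage (requires cluster access):
#   kubectl describe node <NODE> | has_hugepages && echo "huge pages are visible"
```

The function only inspects the text it is given, so it can be tried against a saved copy of the describe output before being pointed at a live cluster.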
- Install kubectl-node-shell on your laptop:
curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
chmod +x ./kubectl-node_shell
sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell
- Find the node you want to get into:
kubectl get pod <YOUR_POD> -o custom-columns=CONTAINER:.spec.nodeName -n <YOUR_NAMESPACE>
- If the node shows as <none>, your pod is in Pending state. Pick any node instead:
kubectl get nodes
- Get into your node:
kubectl node-shell <NODE>
- Configure HugePages:
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
- Restart kubelet (yes, still inside the node):
systemctl restart kubelet
- Exit node-shell with C-d (Ctrl + d)
- Check that HugePages are ON (i.e. the values must not be 0):
kubectl describe node <NODE>|grep -i -e "capacity" -e "allocatable" -e "huge"
- Check that your pod is no longer in Pending state, or launch your helm install/kubectl apply now!
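For the last step, a pod has to request huge pages explicitly before it can use them. The sketch below follows the pattern in the Kubernetes scheduling-hugepages documentation linked in the question; the function name, pod name, image and the 100Mi sizes are placeholders chosen for illustration, not values from this thread.

```shell
#!/bin/sh
# Hypothetical helper: prints a pod manifest that requests 2Mi huge pages.
# Pod name, image and sizes are illustrative assumptions.
hugepage_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example
spec:
  containers:
  - name: example
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi
        memory: 100Mi
      requests:
        memory: 100Mi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
}

# Usage (requires cluster access):
#   hugepage_pod_manifest | kubectl apply -n <YOUR_NAMESPACE> -f -
```

With 1024 pages of 2048 kB configured on the node, 2 GiB of huge pages are reserved in total, so a 100Mi limit like the one above should fit; the pod stays in Pending state if the node advertises fewer huge pages than the pod requests.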