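I have a Python script in a K8s cluster that runs every few minutes as a K8s CronJob. The script checks the nodes in the cluster and terminates any node that has been unhealthy for more than 5 minutes. To connect to AWS, I use Boto3.
requirement.txt: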
boto3==1.16.11
botocore==1.19.11
The permissions are passed as a pod annotation.
Annotations: iam.amazonaws.com/role: arn:aws:iam::123456789:role/k8s-nodes-monitoring-role
The IAM role has the arn:aws:iam::aws:policy/AmazonEC2FullAccess policy attached and a valid trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789:role/nodes.my-domain.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
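The problem I'm facing is that in some cases the script throws a NoCredentialsError('Unable to locate credentials') error. The behavior is inconsistent: in most cases the script successfully terminates the unhealthy node, which I can cross-check against AWS CloudTrail events. In the kube2iam logs I can see that the GET request receives a 200, while the PUT request receives a 403.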
time="2020-12-21T12:50:16Z" level=info msg="GET /latest/meta-data/iam/security-credentials/k8s-nodes-monitoring-role (200) took 47918.000000 ns" req.method=GET req.path=/latest/meta-data/iam/security-credentials/k8s-nodes-monitoring-role req.remote=100.116.203.13 res.duration=47918 res.status=200
time="2020-12-21T12:52:16Z" level=info msg="PUT /latest/api/token (403) took 19352999.000000 ns" req.method=PUT req.path=/latest/api/token req.remote=100.116.203.14 res.duration=1.9352999e+07 res.status=40
Any help or ideas on how to debug this would be much appreciated.
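One way to narrow down an intermittent failure like this might be to retry the AWS call when credentials cannot be located and to log every failed attempt, so the timestamps can be matched against the kube2iam log. The following is only a minimal sketch under assumptions: `retry_on`, `terminate_instance`, the logger name, and the retry counts are hypothetical and not taken from the original script. The boto3 calls used (`boto3.session.Session`, `client("ec2")`, `terminate_instances`) and `botocore.exceptions.NoCredentialsError` are real.

```python
import logging
import time

log = logging.getLogger("node-monitor")  # hypothetical logger name


def retry_on(exc_types, fn, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call fn(); retry when it raises one of exc_types, re-raising on the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exc_types as exc:
            # Log each failure so it can be correlated with the kube2iam log.
            log.warning("attempt %d/%d failed: %r", attempt, attempts, exc)
            if attempt == attempts:
                raise
            sleep(delay_seconds)


def terminate_instance(instance_id, region_name):
    """Terminate one EC2 instance, retrying only when credentials are missing."""
    # Imported here so the retry helper above stays usable without boto3.
    import boto3
    from botocore.exceptions import NoCredentialsError

    def call():
        # A fresh session per attempt, so credential lookup is attempted again.
        session = boto3.session.Session(region_name=region_name)
        ec2 = session.client("ec2")
        return ec2.terminate_instances(InstanceIds=[instance_id])

    return retry_on((NoCredentialsError,), call)
```

If the retries usually succeed on the second attempt, that would point toward a timing problem when fetching credentials rather than a permissions problem. botocore also reads the `AWS_METADATA_SERVICE_TIMEOUT` and `AWS_METADATA_SERVICE_NUM_ATTEMPTS` environment variables, which may be worth raising in the CronJob spec if the metadata endpoint is slow to respond.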
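I don't know the details of kube2iam, but maybe you should switch to the AWS-native approach called IRSA (IAM Roles for Service Accounts). You can find all the necessary information in this blog post: https://aws.amazon.com/de/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/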
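With IRSA, the pod receives the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables, and boto3 picks them up without going through the instance metadata endpoint. As a rough sketch, a check like the one below could be logged at script startup to see which source boto3 is likely to use. The function name `guess_credential_source` and the returned labels are hypothetical, and the check is deliberately simplified: it ignores shared config files, container credentials, and the other providers in the real lookup chain.

```python
import os


def guess_credential_source(env=None):
    """Return a rough label for where boto3 would likely find credentials."""
    env = os.environ if env is None else env
    # Static keys in the environment are checked before the other sources here.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment-keys"
    # IRSA injects these two variables into the pod.
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return "web-identity"
    # Otherwise the metadata endpoint (intercepted by kube2iam) is the likely source.
    return "instance-metadata"


if __name__ == "__main__":
    print(guess_credential_source())
```

In a pod that still relies on the kube2iam annotation, this would be expected to report `instance-metadata`; after a switch to IRSA it should report `web-identity`.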