Question
I'm trying to troubleshoot the following message, which I get from running kubectl logs external-dns-xxxxxxxxxx-xxxxx:

time="<timestamp>" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: <uuid>"
My questions
I'm trying to figure out...
- Where is this message generated from? I can't tell whether it comes from my service, serviceaccount, clusterrole, clusterrolebinding, pod, or something else. Any clarification, or a link to a useful explanation, would be appreciated. (My current guess is the pod, based on the k8s docs, but I'm still not sure, and I'm not sure how to trace it to confirm.)
- Why isn't my external-dns assuming the IAM permissions I explicitly specified? Any explanation of the flow by which the external-dns pod attempts its role assumption, or carries out its tasks, would be appreciated.
My goal
I'm fairly new to K8s and am trying to deploy an EKS cluster with external-dns for automated management of Route53 records.
What I've tried so far
- I've messed around with widening the IAM permissions, opening them up as broadly as I could.
- I've explicitly added the eks.amazonaws.com/role-arn annotation to all the resources that define one.
- I've tried moving the external-dns deployment from the kube-system namespace to default, as suggested in a GitHub issue reporting the same error message.
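Beyond the steps above, one way to narrow down where the credentials break is to inspect the environment the EKS pod-identity webhook injects and to try, by hand, the same STS call the AWS SDK makes. A sketch, assuming the deployment name used in this post and that the aws CLI is available inside the container image (it may not be in the Bitnami image, in which case a debug container with the CLI would be needed):

```shell
# Confirm the IRSA webhook injected the role ARN and the projected token file
kubectl exec deploy/external-dns -- env | grep '^AWS_'

# Attempt the same call the SDK makes; a 403 here reproduces the pod's error
# (the aws CLI reads the token contents via the file:// prefix)
kubectl exec deploy/external-dns -- sh -c \
  'aws sts assume-role-with-web-identity \
     --role-arn "$AWS_ROLE_ARN" \
     --role-session-name manual-test \
     --web-identity-token "file://$AWS_WEB_IDENTITY_TOKEN_FILE"'
```

If the manual call succeeds but external-dns still fails, the problem is likely in how the controller is configured rather than in the IAM trust relationship.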
Deployment details
I use Terraform to deploy most of this: my EKS cluster, node group, OIDC provider, and Helm release.
For now, I've chosen to share only the results of the deployment rather than the configuration, to keep the size of this question down. If you'd like to see the config files, just ask and I'll share everything I have.
kubectl describe output
kubectl describe service external-dns
Name: external-dns
Namespace: default
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Selector: app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.233.113
IPs: 172.20.233.113
Port: http 7979/TCP
TargetPort: http/TCP
Endpoints: 10.12.13.93:7979
Session Affinity: None
Events: <none>
kubectl describe serviceaccount external-dns
Name: external-dns
Namespace: default
Labels: app.kubernetes.io/managed-by=Helm
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Image pull secrets: <none>
Mountable secrets: external-dns-token-twgpb
Tokens: external-dns-token-twgpb
Events: <none>
kubectl describe clusterrole external-dns
Name: external-dns
Labels: <none>
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
endpoints [] [] [get watch list]
nodes [] [] [get watch list]
pods [] [] [get watch list]
services [] [] [get watch list]
ingresses.extensions [] [] [get watch list]
gateways.networking.istio.io [] [] [get watch list]
ingresses.networking.k8s.io [] [] [get watch list]
kubectl describe clusterrolebindings.rbac.authorization.k8s.io external-dns
Name: external-dns
Labels: <none>
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
Role:
Kind: ClusterRole
Name: external-dns
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount external-dns default
kubectl describe ingress -n kube-system
Name: aws-lb-ctrlr
Labels: <none>
Namespace: kube-system
Address:
Ingress Class: <none>
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
*
/* aws-load-balancer-controller:80 (<error: endpoints "aws-load-balancer-controller" not found>)
Annotations: alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
alb.ingress.kubernetes.io/listen-ports: [{'HTTP': 80}]
alb.ingress.kubernetes.io/scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: <my-domain.tld>
kubernetes.io/ingress.class: alb
Events: <none>
kubectl describe pod
Name: external-dns-xxxxxxxxxx-xxxxx
Namespace: default
Priority: 0
Service Account: external-dns
Node: ip-10-12-13-107.ec2.internal/10.12.13.107
Start Time: Tue, 20 Sep 2022 10:48:06 -0400
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
pod-template-hash=xxxxxxxxxx
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.12.13.93
IPs:
IP: 10.12.13.93
Controlled By: ReplicaSet/external-dns-xxxxxxxxxx
Containers:
external-dns:
Container ID: docker://5b49f49f7b9c0be8cb00835f117eedccaff3d5bb4ebfecb4bc6af771d2b3d336
Image: docker.io/bitnami/external-dns:0.12.2-debian-11-r14
Image ID: docker-pullable://bitnami/external-dns@sha256:195dec0f60c9137952ea0604623c7eb001ece4142916bdfb0cc79f5d9cdc4b62
Port: 7979/TCP
Host Port: 0/TCP
Args:
--metrics-address=:7979
--log-level=debug
--log-format=text
--domain-filter=<my-domain.tld>
--policy=sync
--provider=aws
--registry=txt
--interval=1m
--txt-owner-id=<hosted-zone-id>
--source=service
--source=ingress
--aws-api-retries=3
--aws-zone-type=public
--aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
--aws-batch-change-size=1000
State: Running
Started: Tue, 20 Sep 2022 10:48:13 -0400
Ready: True
Restart Count: 0
Liveness: http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
Readiness: http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
AWS_DEFAULT_REGION: us-east-1
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_ROLE_ARN: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d82r7 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
kube-api-access-d82r7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m44s default-scheduler Successfully assigned default/external-dns-xxxxxxxxxx-xxxxx to ip-10-12-13-107.ec2.internal
Normal Pulling 3m43s kubelet Pulling image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14"
Normal Pulled 3m40s kubelet Successfully pulled image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14" in 3.588418583s
Normal Created 3m38s kubelet Created container external-dns
Normal Started 3m37s kubelet Started container external-dns
kubectl describe deployments.apps
Name: external-dns
Namespace: default
CreationTimestamp: Tue, 20 Sep 2022 10:48:06 -0400
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Annotations: deployment.kubernetes.io/revision: 1
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Selector: app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Service Account: external-dns
Containers:
external-dns:
Image: docker.io/bitnami/external-dns:0.12.2-debian-11-r14
Port: 7979/TCP
Host Port: 0/TCP
Args:
--metrics-address=:7979
--log-level=debug
--log-format=text
--domain-filter=<my-domain.tld>
--policy=sync
--provider=aws
--registry=txt
--interval=1m
--txt-owner-id=<hosted-zone-id>
--source=service
--source=ingress
--aws-api-retries=3
--aws-zone-type=public
--aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
--aws-batch-change-size=1000
Liveness: http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
Readiness: http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
AWS_DEFAULT_REGION: us-east-1
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: external-dns-xxxxxxxxxx (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 9m30s deployment-controller Scaled up replica set external-dns-xxxxxxxxxx to 1
AWS IAM (AllowExternalDNSUpdates)
IAM role (trust relationship)
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<userid>:oidc-provider/oidc.eks.region-code.amazonaws.com/id/<oidc-id>"
},
"Action": "sts:AssumeRoleWithWebIdentity"
}
]
}
IAM policy (permissions)
{
"Statement": [
{
"Action": "route53:ChangeResourceRecordSets",
"Effect": "Allow",
"Resource": "arn:aws:route53:::hostedzone/*",
"Sid": ""
},
{
"Action": [
"route53:ListResourceRecordSets",
"route53:ListHostedZones"
],
"Effect": "Allow",
"Resource": "*",
"Sid": ""
}
],
"Version": "2012-10-17"
}
Answer
So there were basically two things:
- (From @Jordam, in the comments) The trust relationship was incorrect; I edited the post to fix it and re-ran my provisioning. My problem then became:
records retrieval failed: failed to list hosted zones: AccessDenied: User: arn:aws:sts::<userid>:assumed-role/AllowExternalDNSUpdates/1663776911448118272 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::<userid>:role/AllowExternalDNSUpdates\n\tstatus
- Because of that additional error, I had to go back, fix my Terraform Helm configuration, and remove the "assume role" setting. Basically, if you hit that second error (an assumed role trying to assume the same role), you are just assuming the role twice.
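For reference on the first fix: an IRSA trust policy normally pairs the federated principal with a Condition block that pins the role to a specific service account, which the trust relationship shown in the question lacks. A minimal sketch of that shape, using the placeholder OIDC host, account ID, namespace, and service-account name from this post (not necessarily the exact config that was deployed):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<userid>:oidc-provider/oidc.eks.region-code.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.region-code.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:default:external-dns",
          "oidc.eks.region-code.amazonaws.com/id/<oidc-id>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```

Without the Condition, the sts:AssumeRoleWithWebIdentity call can be rejected or, worse, left assumable by any service account behind the same OIDC provider.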