Troubleshooting EKS external-dns IAM



Problem

I am trying to troubleshoot the following message

time="<timestamp>" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: <uuid>"

which I get from running kubectl logs external-dns-xxxxxxxxxx-xxxxx

My questions

I am trying to figure out...

  1. Where is this message generated? I can't tell whether it comes from my service, serviceaccount, clusterrole, clusterrolebinding, pod, or something else. Any clarification, or a link to a helpful explanation, would be appreciated. (My current guess, based on the k8s docs, is that it comes from the pod, but I'm still not sure, and I'm not sure how to trace it to confirm.)
  2. Why are the IAM permissions I explicitly specified not being assumed by my external-dns? Any explanation of the flow by which my external-dns pod attempts its role assumption, or performs its tasks, would be appreciated.
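Background I pieced together while debugging question 2: on EKS, the pod identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into the pod (both visible in the `kubectl describe pod` output further down), and the AWS SDK inside external-dns exchanges that projected token for temporary credentials via sts:AssumeRoleWithWebIdentity. One way to see what identity the projected token actually claims is to decode its payload (decode only, no signature verification; the token built below is fabricated placeholder data, not a real EKS token):

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode (without verifying) the payload segment of a JWT.

    The projected token at AWS_WEB_IDENTITY_TOKEN_FILE is a standard
    JWT: header.payload.signature, each segment base64url-encoded.
    """
    payload_b64 = token.split(".")[1]
    # base64url commonly omits padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Fabricated example token with placeholder claims, mimicking the
# shape of an IRSA service-account token:
claims = {
    "iss": "https://oidc.eks.region-code.amazonaws.com/id/<oidc-id>",
    "sub": "system:serviceaccount:default:external-dns",
    "aud": ["sts.amazonaws.com"],
}
header = base64.urlsafe_b64encode(b'{"alg":"RS256"}').rstrip(b"=").decode()
body = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = f"{header}.{body}.signature"

print(jwt_payload(token)["sub"])  # system:serviceaccount:default:external-dns
```

The `sub` claim (`system:serviceaccount:<namespace>:<serviceaccount-name>`) is what the IAM role's trust policy matches against, so a mismatch there is one plausible source of the 403.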

My goal

I'm fairly new to K8s and am trying to deploy an EKS cluster with external-dns for automated management of Route53 records.

What I have tried so far

  1. I messed around with broadening the IAM permissions and opened them up as widely as I could
  2. I explicitly added the eks.amazonaws.com/role-arn annotation to every resource that defines one
  3. I tried moving the external-dns deployment from kube-system to the default namespace, as suggested on a GitHub issue with the same error message

Deployment details

I use Terraform to deploy most of my EKS cluster: node groups, OIDC, and Helm.
For now I've chosen to share only the results of the deployment rather than the configuration, to keep the size of this question down. If you want to see the config files, just ask and I'll share everything I have.

kubectl describe output

kubectl describe service external-dns

Name:              external-dns
Namespace:         default
Labels:            app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Annotations:       eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Selector:          app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.20.233.113
IPs:               172.20.233.113
Port:              http  7979/TCP
TargetPort:        http/TCP
Endpoints:         10.12.13.93:7979
Session Affinity:  None
Events:            <none>

kubectl describe serviceaccount external-dns

Name:                external-dns
Namespace:           default
Labels:              app.kubernetes.io/managed-by=Helm
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Image pull secrets:  <none>
Mountable secrets:   external-dns-token-twgpb
Tokens:              external-dns-token-twgpb
Events:              <none>

kubectl describe clusterrole external-dns

Name:         external-dns
Labels:       <none>
Annotations:  eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
PolicyRule:
Resources                     Non-Resource URLs  Resource Names  Verbs
---------                     -----------------  --------------  -----
endpoints                     []                 []              [get watch list]
nodes                         []                 []              [get watch list]
pods                          []                 []              [get watch list]
services                      []                 []              [get watch list]
ingresses.extensions          []                 []              [get watch list]
gateways.networking.istio.io  []                 []              [get watch list]
ingresses.networking.k8s.io   []                 []              [get watch list]

kubectl describe clusterrolebindings.rbac.authorization.k8s.io external-dns

Name:         external-dns
Labels:       <none>
Annotations:  eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
Role:
Kind:  ClusterRole
Name:  external-dns
Subjects:
Kind            Name          Namespace
----            ----          ---------
ServiceAccount  external-dns  default

kubectl describe ingress -n kube-system

Name:             aws-lb-ctrlr
Labels:           <none>
Namespace:        kube-system
Address:          
Ingress Class:    <none>
Default backend:  <default>
Rules:
Host        Path  Backends
----        ----  --------
*           
/*   aws-load-balancer-controller:80 (<error: endpoints "aws-load-balancer-controller" not found>)
Annotations:  alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
alb.ingress.kubernetes.io/listen-ports: [{'HTTP': 80}]
alb.ingress.kubernetes.io/scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: <my-domain.tld>
kubernetes.io/ingress.class: alb
Events:       <none>

kubectl describe pod

Name:             external-dns-xxxxxxxxxx-xxxxx
Namespace:        default
Priority:         0
Service Account:  external-dns
Node:             ip-10-12-13-107.ec2.internal/10.12.13.107
Start Time:       Tue, 20 Sep 2022 10:48:06 -0400
Labels:           app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
pod-template-hash=xxxxxxxxxx
Annotations:      kubernetes.io/psp: eks.privileged
Status:           Running
IP:               10.12.13.93
IPs:
IP:           10.12.13.93
Controlled By:  ReplicaSet/external-dns-xxxxxxxxxx
Containers:
external-dns:
Container ID:  docker://5b49f49f7b9c0be8cb00835f117eedccaff3d5bb4ebfecb4bc6af771d2b3d336
Image:         docker.io/bitnami/external-dns:0.12.2-debian-11-r14
Image ID:      docker-pullable://bitnami/external-dns@sha256:195dec0f60c9137952ea0604623c7eb001ece4142916bdfb0cc79f5d9cdc4b62
Port:          7979/TCP
Host Port:     0/TCP
Args:
--metrics-address=:7979
--log-level=debug
--log-format=text
--domain-filter=<my-domain.tld>
--policy=sync
--provider=aws
--registry=txt
--interval=1m
--txt-owner-id=<hosted-zone-id>
--source=service
--source=ingress
--aws-api-retries=3
--aws-zone-type=public
--aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
--aws-batch-change-size=1000
State:          Running
Started:      Tue, 20 Sep 2022 10:48:13 -0400
Ready:          True
Restart Count:  0
Liveness:       http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
Readiness:      http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
AWS_DEFAULT_REGION:           us-east-1
AWS_STS_REGIONAL_ENDPOINTS:   regional
AWS_ROLE_ARN:                 arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d82r7 (ro)
Conditions:
Type              Status
Initialized       True 
Ready             True 
ContainersReady   True 
PodScheduled      True 
Volumes:
aws-iam-token:
Type:                    Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds:  86400
kube-api-access-d82r7:
Type:                    Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds:  3607
ConfigMapName:           kube-root-ca.crt
ConfigMapOptional:       <nil>
DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type    Reason     Age    From               Message
----    ------     ----   ----               -------
Normal  Scheduled  3m44s  default-scheduler  Successfully assigned default/external-dns-xxxxxxxxxx-xxxxx to ip-10-12-13-107.ec2.internal
Normal  Pulling    3m43s  kubelet            Pulling image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14"
Normal  Pulled     3m40s  kubelet            Successfully pulled image "docker.io/bitnami/external-dns:0.12.2-debian-11-r14" in 3.588418583s
Normal  Created    3m38s  kubelet            Created container external-dns
Normal  Started    3m37s  kubelet            Started container external-dns

kubectl describe deployments.apps

Name:                   external-dns
Namespace:              default
CreationTimestamp:      Tue, 20 Sep 2022 10:48:06 -0400
Labels:                 app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Annotations:            deployment.kubernetes.io/revision: 1
meta.helm.sh/release-name: external-dns
meta.helm.sh/release-namespace: default
Selector:               app.kubernetes.io/instance=external-dns,app.kubernetes.io/name=external-dns
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
Labels:           app.kubernetes.io/instance=external-dns
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=external-dns
helm.sh/chart=external-dns-6.9.0
Service Account:  external-dns
Containers:
external-dns:
Image:      docker.io/bitnami/external-dns:0.12.2-debian-11-r14
Port:       7979/TCP
Host Port:  0/TCP
Args:
--metrics-address=:7979
--log-level=debug
--log-format=text
--domain-filter=<my-domain.tld>
--policy=sync
--provider=aws
--registry=txt
--interval=1m
--txt-owner-id=<hosted-zone-id>
--source=service
--source=ingress
--aws-api-retries=3
--aws-zone-type=public
--aws-assume-role=arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
--aws-batch-change-size=1000
Liveness:   http-get http://:http/healthz delay=10s timeout=5s period=10s #success=1 #failure=2
Readiness:  http-get http://:http/healthz delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
AWS_DEFAULT_REGION:  us-east-1
Mounts:                <none>
Volumes:                 <none>
Conditions:
Type           Status  Reason
----           ------  ------
Available      True    MinimumReplicasAvailable
Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   external-dns-xxxxxxxxxx (1/1 replicas created)
Events:
Type    Reason             Age    From                   Message
----    ------             ----   ----                   -------
Normal  ScalingReplicaSet  9m30s  deployment-controller  Scaled up replica set external-dns-xxxxxxxxxx to 1

AWS IAM (AllowExternalDNSUpdates)

IAM role (trust relationship)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<userid>:oidc-provider/oidc.eks.region-code.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity"
    }
  ]
}
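Worth noting: an IRSA trust policy normally also carries a `Condition` block that pins the federated token to one specific service account; without it, any service account in the cluster whose token comes from that OIDC provider could assume the role. A sketch of what that typically looks like (the condition keys must use the same issuer host as the `Federated` ARN above; all values here are placeholders from my setup):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<userid>:oidc-provider/oidc.eks.region-code.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.region-code.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:default:external-dns",
          "oidc.eks.region-code.amazonaws.com/id/<oidc-id>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```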

IAM policy (permissions)

{
  "Statement": [
    {
      "Action": "route53:ChangeResourceRecordSets",
      "Effect": "Allow",
      "Resource": "arn:aws:route53:::hostedzone/*",
      "Sid": ""
    },
    {
      "Action": [
        "route53:ListResourceRecordSets",
        "route53:ListHostedZones"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": ""
    }
  ],
  "Version": "2012-10-17"
}

Answer

So basically there were two things

  1. (Courtesy of @Jordam, in the comments) The trust relationship was incorrect. I edited the post to fix it and re-ran my configuration. My problem then became records retrieval failed: failed to list hosted zones: AccessDenied: User: arn:aws:sts::<userid>:assumed-role/AllowExternalDNSUpdates/1663776911448118272 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::<userid>:role/AllowExternalDNSUpdates\n\tstatus
  2. Because of that additional error, I had to go back into my Terraform Helm configuration and remove the "assume role" setting. Basically, if you hit that second error ("the assumed role is trying to assume the role"), you are just assuming the role twice.
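For anyone hitting the same "role assuming itself" error: in my setup the `--aws-assume-role` container argument was driven by a Helm value passed through Terraform's helm_release. A sketch of the values that worked for me, for the Bitnami chart version shown above (6.9.0) — treat the exact key names as assumptions for other chart versions. The key point is that the role ARN lives only on the service account annotation:

```yaml
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<user-id>:role/AllowExternalDNSUpdates
aws:
  region: us-east-1
  # assumeRoleArn deliberately left unset -- setting it here made the
  # already-assumed IRSA role try to assume itself, producing error #1's
  # follow-up "not authorized to perform: sts:AssumeRole" message.
```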