Docker compose job in an ECS context fails to fetch the AWS secret needed to pull an image from DockerHub



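I am trying to create and start a small ECS Fargate cluster with just 2 containers using docker compose. One of the container images lives in my private repository on DockerHub. However, the command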

docker compose --file path-to-docker-compose-yml-file up

keeps failing when run in an ECS context, with the error message:

QuarkustodoService TaskFailedToStart: ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to get registry auth from asm: service call has been retried 1 time(s): failed to fetch secret arn:aws:secretsmanager:us-east-2:1071311304...

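However, the ecsTaskExecutionRole of my AWS user ID does have the policies needed to fetch the secret and to decrypt the DockerHub credentials stored in AWS Secrets Manager and KMS. I am using my DockerHub user ID and an access token as credentials, and I have verified that they work.

Can anyone help, or does anyone have an idea how to debug this?

The complete command-line output is: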

docker compose --file path-to-docker-compose.yml up
[+] Running 17/17
- quarkus-todo                      DeleteComplete                                                                                    205.2s
- Cluster                           DeleteComplete                                                                                    154.2s
- Quarkustodo8080TargetGroup        DeleteComplete                                                                                    155.5s
- CloudMap                          DeleteComplete                                                                                    200.2s
- LogGroup                          DeleteComplete                                                                                    157.5s
- DbTaskExecutionRole               DeleteComplete                                                                                    147.6s
- QuarkustodoTaskExecutionRole      DeleteComplete                                                                                    157.5s
- DefaultNetwork                    DeleteComplete                                                                                    154.2s
- Quarkustodo8080Listener           DeleteComplete                                                                                    152.3s
- DefaultNetworkIngress             DeleteComplete                                                                                     82.4s
- Default8080Ingress                DeleteComplete                                                                                     81.2s
- DbTaskDefinition                  DeleteComplete                                                                                    127.4s
- QuarkustodoTaskDefinition         DeleteComplete                                                                                    137.2s
- QuarkustodoServiceDiscoveryEntry  DeleteComplete                                                                                    106.1s
- DbServiceDiscoveryEntry           DeleteComplete                                                                                     96.4s
- DbService                         DeleteComplete                                                                                     90.2s
- QuarkustodoService                DeleteComplete                                                                                    100.3s
QuarkustodoService TaskFailedToStart: ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to get registry auth from asm: service call has been retried 1 time(s): failed to fetch secret arn:aws:secretsmanager:us-east-2:1071311304...

I am using:

Docker version 20.10.6 on Win10
aws --version
aws-cli/2.2.9 Python/3.8.8 Windows/10 exe/AMD64 prompt/off
ecs-cli --version
ecs-cli version 1.21.0 (bb0b8f0)

This is my docker-compose.yml file:

version: "3.8"
x-aws-vpc: "vpc-07cdb7bacc9b8010a"
x-aws-loadbalancer: "exter-Publi-FY7S28M1QL7L"
services:
  quarkus-todo:
    image: bergemannf/mytoolchain:quarkus-todo-ce
    x-aws-pull_credentials: arn:aws:secretsmanager:us-east-2:1111111111111:secret:DockerHubAPIToken-d9RKLn
    ports:
      - target: 8080
        x-aws-protocol: http
  db:
    image: postgres
    environment:
      POSTGRES_USER: "<some user-id>"
      POSTGRES_PASSWORD: "<some pw>"
      POSTGRES_DB: "<some db>"

This is my ecs-param.yml file:

version: 1
task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: arn:aws:iam::107131130437:role/ECSTaskRole
  task_execution_role: arn:aws:iam::1111111111:role/ecsTaskExecutionRole
  task_size:
    cpu_limit: 256
    mem_limit: 512
  pid_mode: task
  ipc_mode: task
  services:
    quarkus-todo:
      essential: true
      depends_on:
        - container_name: db
          condition: START
      init_process_enabled: false
      healthcheck:
        test: ["CMD", "curl -f http://localhost"]
        interval: 10
        timeout: 5
        retries: 3
        start_period: 180
      secrets:
        - value_from: arn:aws:secretsmanager:us-east-2:107131130437:secret:DockerHubAPIToken-d9RKLn
#          name: dev/DockerHubAccessToken
    db:
      essential: false
  efs_volumes:
    - name: postgres-db-efs
      filesystem_id: fs-5473872f
      root_directory: /
      access_point: fsap-11111111
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - subnet-0af2d8c8faa7f6b9f
        - subnet-039c3a3061848c2a9
      security_groups:
        - sg-0d52c217fa0f25cfb
      assign_public_ip: ENABLED

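First off, you mention ecs-cli and an ecs-param.yml file, but the compose/ECS integration does not use either of them (for the record, the ECS CLI has evolved into AWS Copilot and dropped compose compatibility in favor of the new docker compose implementation that you are using).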

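That said, your compose file actually looks fine. You only need to set the x-aws-pull_credentials parameter, and Docker will generate a CFN template with all the required wiring. For example, this is the compose file I am using: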

version: "3.8"
services:
  myweb:
    image: mreferre/nginx-custom-site
    x-aws-pull_credentials: arn:aws:secretsmanager:us-west-2:111111111111:secret:dockerhubAccessToken-xyzwh
    ports:
      - "80:80"

And it works just fine:

$ docker compose up      
[+] Running 14/14
⠿ downloads                   CreateComplete                                                                                                                                                                             190.3s
⠿ Cluster                     CreateComplete                                                                                                                                                                               6.0s
⠿ CloudMap                    CreateComplete                                                                                                                                                                              47.0s
⠿ DefaultNetwork              CreateComplete                                                                                                                                                                               5.0s
⠿ MywebTaskExecutionRole      CreateComplete                                                                                                                                                                              20.1s
⠿ MywebTCP80TargetGroup       CreateComplete                                                                                                                                                                               1.0s
⠿ LogGroup                    CreateComplete                                                                                                                                                                               2.0s
⠿ LoadBalancer                CreateComplete                                                                                                                                                                              92.0s
⠿ DefaultNetworkIngress       CreateComplete                                                                                                                                                                               1.0s
⠿ Default80Ingress            CreateComplete                                                                                                                                                                               1.0s
⠿ MywebTaskDefinition         CreateComplete                                                                                                                                                                               3.0s
⠿ MywebServiceDiscoveryEntry  CreateComplete                                                                                                                                                                               2.0s
⠿ MywebTCP80Listener          CreateComplete                                                                                                                                                                               2.0s
⠿ MywebService                CreateComplete                                                                                                                                                                              76.6s
$
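
As a rough illustration of the wiring mentioned above: for private registry authentication, ECS expects the container definition to reference the secret through repositoryCredentials.credentialsParameter, and the task execution role needs secretsmanager:GetSecretValue on that secret (plus kms:Decrypt on the key if the secret is encrypted with a customer managed KMS key). The Python sketch below builds those two fragments as plain dictionaries. It is not the template that Docker generates, and the function names are hypothetical, chosen only for this example.

```python
import json
from typing import Optional


def execution_role_policy(secret_arn: str, kms_key_arn: Optional[str] = None) -> dict:
    """Build an IAM policy document that lets a task execution role read one secret.

    kms_key_arn is only needed when the secret uses a customer managed KMS key.
    """
    actions = ["secretsmanager:GetSecretValue"]
    resources = [secret_arn]
    if kms_key_arn is not None:
        actions.append("kms:Decrypt")
        resources.append(kms_key_arn)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resources}
        ],
    }


def container_definition(name: str, image: str, secret_arn: str) -> dict:
    """Build the part of an ECS container definition that points at the pull secret."""
    return {
        "name": name,
        "image": image,
        "repositoryCredentials": {"credentialsParameter": secret_arn},
    }


if __name__ == "__main__":
    # ARN and image copied from the compose file in this answer.
    arn = "arn:aws:secretsmanager:us-west-2:111111111111:secret:dockerhubAccessToken-xyzwh"
    print(json.dumps(execution_role_policy(arn), indent=2))
    print(json.dumps(container_definition("myweb", "mreferre/nginx-custom-site", arn), indent=2))
```

Note that the output in the question lists a QuarkustodoTaskExecutionRole resource, which suggests the integration creates its own execution role per service, so it may be that role, rather than a hand-made ecsTaskExecutionRole, whose permissions matter here.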

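[Note that the image I am using is not private, but the whole process of pulling the image and logging in to DH completes without errors]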

The only thing I can think of is that your network setup might not allow you to reach the Secrets Manager endpoint?
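
One way to test that idea is a plain TCP connection attempt against the regional Secrets Manager endpoint, which follows the pattern secretsmanager.<region>.amazonaws.com on port 443. The sketch below is a minimal Python check using only the standard library. The function names are hypothetical, and the result only describes the network of the machine that runs it, so to say anything about the task it would have to run on a host in the same subnets and security groups as the Fargate task.

```python
import socket


def secretsmanager_endpoint(region: str) -> str:
    """Return the regional Secrets Manager hostname for the given AWS region."""
    return f"secretsmanager.{region}.amazonaws.com"


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # us-east-2 is the region that appears in the ARN of the error message.
    host = secretsmanager_endpoint("us-east-2")
    print(host, "reachable" if can_reach(host) else "NOT reachable")
```

A Fargate task generally needs some route to that endpoint, for example a public IP, a NAT gateway, or a VPC endpoint for Secrets Manager, so a failed check from inside the task's subnets would point at the network rather than at IAM.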

I was able to solve the problem by creating a new ECS secret with the following command:

docker ecs secret create <yourTokenName> --username <yourUserName>  --password <yourPassword>

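The command then prints an AWS secret ARN that refers to the newly created secret. I put this ARN in the designated place in the compose file, after which the docker compose up command worked and the ECS cluster came up successfully. Really cool.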
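
Before pasting an ARN into the compose file, it can help to check that it really is a Secrets Manager secret ARN and that its region and account are the ones you expect. Secret ARNs have the form arn:<partition>:secretsmanager:<region>:<account-id>:secret:<name>-<suffix>. The following is a small Python sketch of such a check; SecretArn and parse_secret_arn are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class SecretArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    secret_id: str  # secret name including the suffix appended by Secrets Manager


def parse_secret_arn(arn: str) -> SecretArn:
    """Split a Secrets Manager secret ARN into its fields, or raise ValueError."""
    parts = arn.split(":", 6)
    if (
        len(parts) != 7
        or parts[0] != "arn"
        or parts[2] != "secretsmanager"
        or parts[5] != "secret"
    ):
        raise ValueError(f"not a Secrets Manager secret ARN: {arn!r}")
    return SecretArn(
        partition=parts[1],
        region=parts[3],
        account_id=parts[4],
        secret_id=parts[6],
    )


if __name__ == "__main__":
    # Example ARN copied from the compose file in the first answer.
    parsed = parse_secret_arn(
        "arn:aws:secretsmanager:us-west-2:111111111111:secret:dockerhubAccessToken-xyzwh"
    )
    print(parsed.region, parsed.account_id, parsed.secret_id)
```

Comparing parsed.region and parsed.account_id with the region and account of the ECS context is a quick way to catch a copy-and-paste mistake in x-aws-pull_credentials.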
