AWS Fargate - 实例启动时运行状况检查失败



我与某些帖子的问题类似,但据我所知,没有一个具体问题相关。我将在这篇文章的后面发布我的堆栈。

我有:

ALB----->Listener->target group->Fargate service->task definition
80/http           ->8080/http                   -> 8080/http

问题是我的运行状况检查失败。当 Fargate 任务启动一个实例时,我可以使用运行状况检查 URL 转到该实例,并得到 200 响应。 但是,任何通过负载均衡器的尝试都会导致网关超时。

$ curl -fv http://172.31.47.18:8080/healthz
*   Trying 172.31.47.18...
* TCP_NODELAY set
* Connected to 172.31.47.18 (172.31.47.18) port 8080 (#0)
> GET /healthz HTTP/1.1
> Host: 172.31.47.18:8080
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Transfer-Encoding: chunked
< Date: Sun, 24 Nov 2019 15:33:39 GMT
< Server: Warp/3.2.27
< 
* Connection #0 to host 172.31.47.18 left intact
OK

但是,运行状况检查永远不会在 LB 上传递。

  1. 现在用于每件事的安全组都是开放的。我想消除这个问题。
  2. Fargate 节点是为公共 IP 设置的。

在过去的几天里,这一直让我发疯。我建立了一个 EC2 支持的 ECS,一切都在 EC2 上运行。 我应该指出,整个堆栈在 Fargate 中构建得很好,除了没有从负载均衡器或其他东西获得任何流量。

服务事件中的错误显示

service test-graph (port 8080) is unhealthy in target-group tg--test-graph due to (reason Request timed out).

希望有人有想法。

TaskDef0:
Type: AWS::ECS::TaskDefinition
DependsOn: Cluster0
Properties:
ExecutionRoleArn: arn:aws:iam::xxxxx:role/ECS_Hasura_Execution_Role
TaskRoleArn: arn:aws:iam::xxxxx:role/ecsTaskExecutionRole
Family: !Ref 'ServiceName'
Cpu: !FindInMap
- ContainerSizeMap
- !Ref ContainerSize
- Cpu
Memory: !FindInMap
- ContainerSizeMap
- !Ref ContainerSize
- Memory
NetworkMode: awsvpc
RequiresCompatibilities:
- FARGATE
ContainerDefinitions:
- Name: !Ref 'ServiceName'
Cpu: !FindInMap
- ContainerSizeMap
- !Ref ContainerSize
- Cpu
Memory: !FindInMap
- ContainerSizeMap
- !Ref ContainerSize
- Memory
Image: !FindInMap
- ServiceMap
- !Ref ServiceProvider
- ImageUrl
PortMappings:
- 
ContainerPort: !Ref 'ContainerPort'
HostPort: !Ref ContainerPort
Protocol: tcp
ALB0:
Type: AWS::ElasticLoadBalancingV2::LoadBalancer
DependsOn: TaskDef0
Properties: 
Name: !Join
- '-'
- - lb-
- !Ref ServiceName
Scheme: internet-facing
IpAddressType: ipv4
LoadBalancerAttributes: 
- Key: deletion_protection.enabled
Value: false
- Key: idle_timeout.timeout_seconds
Value: 60
- Key: routing.http.drop_invalid_header_fields.enabled
Value: false
- Key: routing.http2.enabled
Value: true
SecurityGroups: 
- sg-xxxxxx # allow HTTP/HTTPS to the load balancer
Subnets: 
- subnet-111111
- subnet-222222
- subnet-333333
Type: application
targetGroup0:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
DependsOn: ALB0
Properties: 
Name: !Join
- '-'
- - tg-
- !Ref ServiceName
Port: !Ref TargetGroupPort
Protocol: !Ref TargetGroupProtocol    
TargetType: ip
VpcId: !FindInMap
- ServiceMap
- !Ref ServiceProvider
- VpcId
# all other paraneters can be changed without interruption
HealthCheckPort: traffic-port
HealthCheckEnabled: !FindInMap
- LBTGMap
- Parameters
- HealthCheckEnabled
HealthCheckIntervalSeconds: !FindInMap
- LBTGMap
- Parameters
- HealthCheckIntervalSeconds
HealthCheckPath: !FindInMap
- ServiceMap
- !Ref ServiceProvider
- HealthCheckPath
HealthCheckProtocol: !FindInMap
- ServiceMap
- !Ref ServiceProvider
- HealthCheckProtocol
HealthCheckTimeoutSeconds: !FindInMap
- LBTGMap
- Parameters
- HealthCheckTimeoutSeconds
HealthyThresholdCount: !FindInMap
- LBTGMap
- Parameters
- HealthyThresholdCount
UnhealthyThresholdCount: !FindInMap
- LBTGMap
- Parameters
- UnhealthyThresholdCount
Matcher: 
HttpCode: !FindInMap
- ServiceMap
- !Ref ServiceProvider
- HealthCheckSuccessCode
TargetGroupAttributes: 
- Key: deregistration_delay.timeout_seconds
Value: !FindInMap
- LBTGMap
- Parameters
- DeregistrationDelay
- Key: slow_start.duration_seconds
Value: !FindInMap
- LBTGMap
- Parameters
- SlowStart
- Key: stickiness.enabled
Value: !FindInMap
- LBTGMap
- Parameters
- Stickiness
Listener0:
# This is the fixed response test listener
Type: AWS::ElasticLoadBalancingV2::Listener
DependsOn: ALB0
Properties:   
DefaultActions: 
- Type: fixed-response      
FixedResponseConfig: 
ContentType: text/html
MessageBody: <h1>Working</h1><p>The load balancer test listener is operational</p>
StatusCode: 200
LoadBalancerArn: !Ref ALB0
Port: 9000
Protocol: HTTP
Listener1:
# This is the port 80 listener
Type: AWS::ElasticLoadBalancingV2::Listener
DependsOn: ALB0
Properties:   
DefaultActions: 
- Type: forward
TargetGroupArn: !Ref targetGroup0
LoadBalancerArn: !Ref ALB0
Port: 80
Protocol: HTTP
Listener2:
# This is the port 8080 listener
Type: AWS::ElasticLoadBalancingV2::Listener
DependsOn: ALB0
Properties:   
DefaultActions: 
- Type: forward
TargetGroupArn: !Ref targetGroup0
LoadBalancerArn: !Ref ALB0
Port: 8080
Protocol: HTTP
Listener3 :
# This is the port 443 listener
Type: AWS::ElasticLoadBalancingV2::Listener
DependsOn: ALB0
Properties:   
Certificates:
- CertificateArn: !FindInMap
- CertificateMap
- !Ref AWS::Region
- CertifcateArn  
DefaultActions: 
- Type: forward
TargetGroupArn: !Ref targetGroup0
LoadBalancerArn: !Ref ALB0
Port: 443
Protocol: HTTPS
Service0:
Type: AWS::ECS::Service
DependsOn: Listener2
Properties:
ServiceName: !Ref 'ServiceName'
Cluster: !Ref Cluster0
LaunchType: FARGATE
DeploymentConfiguration:
MaximumPercent: !FindInMap
- ECSServiceMap
- Parameters
- MaximumPercent
MinimumHealthyPercent: !FindInMap
- ECSServiceMap
- Parameters
- MinimumHealthyPercent
DesiredCount: !Ref 'DesiredTaskCount'
NetworkConfiguration:
AwsvpcConfiguration:
AssignPublicIp: ENABLED
SecurityGroups: # this is allow all ports and IPs
- !FindInMap
- SecurityGroupMap
- !Ref AWS::Region
- sg0
Subnets:
- !FindInMap
- SubnetMap
- !Ref AWS::Region
- subnet0
- !FindInMap
- SubnetMap
- !Ref AWS::Region
- subnet1
- !FindInMap
- SubnetMap
- !Ref AWS::Region
- subnet2
TaskDefinition: !Ref 'TaskDef0'
LoadBalancers:
- ContainerName: !Ref 'ServiceName'
ContainerPort: !Ref 'ContainerPort'
TargetGroupArn: !Ref 'targetGroup0'
Tags: 
- Key: Application
Value: !Ref "Application"
- Key: Customer
Value: !Ref "Customer"
- Key: Role
Value: !Ref "Role"
- Key: InternetAccessible
Value: !Ref "InternetAccessible"
- Key: CreationDate
Value: !Ref "CreationDate"
- Key: CreatedBy
Value: !Ref "CreatedBy"
Mappings:
ServiceMap:
GraphQL-Ohio: 
ImageUrl: xxxxx.dkr.ecr.us-east-2.amazonaws.com/hasura/graphql-engine
HealthCheckPath: /healthz
HealthCheckSuccessCode: 200
HealthCheckProtocol: HTTP
VpcId: vpc-xxxxx
LBTGMap:
Parameters:
HealthCheckEnabled: True
HealthCheckIntervalSeconds: 30
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 5
UnhealthyThresholdCount: 2
DeregistrationDelay: 300
SlowStart: 0
Stickiness: false
SubnetMap: # There is technical debt here to keep this up to date as subnets change
us-east-2:
subnet0: subnet-111111
subnet1: subnet-222222
subnet2: subnet-333333
SecurityGroupMap: 
us-east-2: 
sg0: sg-xxxxx

好的 - 我想通了。 我已将运行状况检查端口设置为流量端口。 字符串文字"流量端口",而不是实际端口号。 咄。

相关内容

最新更新