Add a "cooldown" or "pausetime" period between the termination of 2 or more instances when decreasing desired capacity



(Apologies in advance, as I am new to AWS.)

I am using a CloudFormation stack to manage my ECS cluster.

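Suppose we have an ASG with a desired capacity of 5 EC2 instances (minSize: 1, maxSize: 7), and I manually change the desired capacity from 5 to 2. The number of instances is reduced through a change set on the cluster, and all the surplus instances are shut down at the same time. This leaves no time to reschedule their containers onto the remaining instances. So going from 5 instances to 2 shuts down all 3 instances at once. If, unluckily, all containers of one type are running on those 3 machines, those containers no longer exist and the service goes down.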

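Is it possible to set a "cooldown" between each termination? Using a scaling policy apparently does not help, because we do not want to scale on a metric: none of the available metrics is useful in my case.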

Please find some logs below:

2021-01-15 15:45:52 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Rolling update initiated. Terminating 3 obsolete instance(s) in batches of 1, while keeping at least 1 instance(s) in service. Waiting on resource signals with a timeout of PT5M when new instances are added to the autoscaling group.
2021-01-15 15:45:52 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Temporarily setting autoscaling group MinSize and DesiredCapacity to 3.
2021-01-15 15:45:54 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 15:47:40 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 15:47:40 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Successfully terminated instance(s) [i-X] (Progress 33%).
2021-01-15 15:52:42 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 15:53:59 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 15:53:59 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Successfully terminated instance(s) [i-X] (Progress 67%).
2021-01-15 15:59:02 UTC+0100    dev-cluster UPDATE_ROLLBACK_IN_PROGRESS The following resource(s) failed to update: [autoScalingGroup].
2021-01-15 15:59:17 UTC+0100    securityGroup   UPDATE_IN_PROGRESS  -
2021-01-15 15:59:32 UTC+0100    securityGroup   UPDATE_COMPLETE -
2021-01-15 15:59:33 UTC+0100    launchConfiguration UPDATE_COMPLETE -
2021-01-15 15:59:34 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  -
2021-01-15 15:59:37 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Rolling update initiated. Terminating 2 obsolete instance(s) in batches of 1, while keeping at least 1 instance(s) in service. Waiting on resource signals with a timeout of PT5M when new instances are added to the autoscaling group.
2021-01-15 15:59:37 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Temporarily setting autoscaling group MinSize and DesiredCapacity to 3.
2021-01-15 15:59:38 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 16:01:25 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 16:01:25 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Successfully terminated instance(s) [i-X] (Progress 50%).
2021-01-15 16:01:46 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Received SUCCESS signal with UniqueId i-X
2021-01-15 16:01:47 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Terminating instance(s) [i-X]; replacing with 1 new instance(s).
2021-01-15 16:03:34 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT5M.
2021-01-15 16:03:34 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Received SUCCESS signal with UniqueId i-X
2021-01-15 16:03:34 UTC+0100    autoScalingGroup    UPDATE_IN_PROGRESS  Successfully terminated instance(s) [i-X] (Progress 100%).
2021-01-15 16:03:37 UTC+0100    autoScalingGroup    UPDATE_COMPLETE -
2021-01-15 16:03:37 UTC+0100    dev-cluster UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS    -
2021-01-15 16:03:38 UTC+0100    launchConfiguration DELETE_IN_PROGRESS  -
2021-01-15 16:03:39 UTC+0100    dev-cluster UPDATE_ROLLBACK_COMPLETE    -
2021-01-15 16:03:39 UTC+0100    launchConfiguration DELETE_COMPLETE -

Thanks in advance for your help!

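To answer your direct question: there is no feature that forces the ASG to remove only x instances at a time when the desired capacity is lowered.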

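If you don't already have one, you should add a lifecycle hook to the ASG that triggers a script telling ECS to drain the containers from the instance (I assume from context that you are using ECS). In that case, however, you would still need to lower the desired capacity manually by 1 each time. https://aws.amazon.com/blogs/compute/how-to-automate-container-instance-draining-in-amazon-ecs/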
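
The two steps above (drain the instance, then lower the desired capacity by 1 at a time) could be scripted roughly as follows. This is only a sketch under assumptions: the clients are expected to be boto3 `ecs` and `autoscaling` clients passed in by the caller, and the function names, the `pause_seconds` default, and the injected `sleep` are hypothetical choices made for illustration. It does not check that draining has finished before the next step.

```python
import time
from typing import Callable, List


def drain_container_instance(ecs_client, cluster: str, container_instance_arn: str) -> None:
    """Put one container instance into DRAINING so ECS moves its tasks elsewhere."""
    ecs_client.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )


def step_down_desired_capacity(
    autoscaling_client,
    group_name: str,
    target: int,
    pause_seconds: float = 300.0,  # assumed pause, tune to your task start-up time
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Lower DesiredCapacity by 1 per step, pausing between steps.

    Returns the list of desired-capacity values that were applied.
    """
    groups = autoscaling_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"]
    current = groups[0]["DesiredCapacity"]

    applied = []
    while current > target:
        current -= 1
        autoscaling_client.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=current,
            HonorCooldown=False,
        )
        applied.append(current)
        if current > target:
            # Give ECS time to reschedule tasks before the next termination.
            sleep(pause_seconds)
    return applied
```

With real clients this might be called as `step_down_desired_capacity(boto3.client("autoscaling"), "my-asg", target=2)`, where `my-asg` is a placeholder group name. A fixed pause is the simplest choice; polling the ECS service until its running task count recovers would be more robust.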

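If you are lowering the desired capacity in CloudFormation, you can attach an UpdatePolicy to the group that tells CFN to perform a rolling update and replace instances in batches of 1 at a time: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html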
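
For reference, the shape of such an UpdatePolicy attribute can be sketched as a small Python helper that emits the JSON fragment. The helper name is hypothetical; the default values simply mirror what the logs in the question report (batches of 1, at least 1 instance in service, a PT5M signal timeout), and are not a recommendation.

```python
import json


def rolling_update_policy(
    max_batch_size: int = 1,
    min_instances_in_service: int = 1,
    pause_time: str = "PT5M",
    wait_on_resource_signals: bool = True,
) -> dict:
    """Build the UpdatePolicy attribute for an AWS::AutoScaling::AutoScalingGroup."""
    return {
        "UpdatePolicy": {
            "AutoScalingRollingUpdate": {
                # Replace at most this many instances per batch.
                "MaxBatchSize": max_batch_size,
                # Keep at least this many instances running during the update.
                "MinInstancesInService": min_instances_in_service,
                # ISO 8601 duration: pause between batches, or the signal
                # timeout when WaitOnResourceSignals is true.
                "PauseTime": pause_time,
                "WaitOnResourceSignals": wait_on_resource_signals,
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be merged into the resource definition.
    print(json.dumps(rolling_update_policy(), indent=2))
```

The resulting `UpdatePolicy` key sits next to `Type` and `Properties` on the Auto Scaling group resource in the template, not inside `Properties`.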

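If you are using ECS, it is usually a good idea to set up 2 target tracking scaling policies: 1 for CPUReservation and 1 for MemoryReservation. If you want to force the ASG to never scale by more than 1 instance at a time, you can also manually create step scaling policies based on these metrics, but creating 4 CloudWatch alarms in CFN would be a pain.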
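
As a rough illustration of the two target tracking policies, the helper below builds the keyword arguments that could be passed to the boto3 Auto Scaling client's `put_scaling_policy`. The helper name, the policy naming scheme, and the 75% target are assumptions made for this sketch; the group and cluster names are placeholders.

```python
def reservation_policy_kwargs(
    group_name: str,
    cluster_name: str,
    metric_name: str,
    target_percent: float = 75.0,  # assumed target, not taken from the thread
) -> dict:
    """Build put_scaling_policy arguments for one ECS reservation metric."""
    if metric_name not in ("CPUReservation", "MemoryReservation"):
        raise ValueError(f"unsupported metric: {metric_name}")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{cluster_name}-{metric_name}-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Cluster-level reservation metrics are published under AWS/ECS.
            "CustomizedMetricSpecification": {
                "MetricName": metric_name,
                "Namespace": "AWS/ECS",
                "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
                "Statistic": "Average",
            },
            "TargetValue": float(target_percent),
        },
    }


# One policy per metric, as suggested above.
policies = [
    reservation_policy_kwargs("my-asg", "dev-cluster", metric)
    for metric in ("CPUReservation", "MemoryReservation")
]
```

Each entry could then be applied with something like `boto3.client("autoscaling").put_scaling_policy(**kwargs)`.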

Another option is to use a Capacity Provider in ECS, which enables instance protection on any instance that has tasks running on it.
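
A minimal sketch of what the Capacity Provider setup might look like, expressed as the arguments for the boto3 ECS client's `create_capacity_provider`. The helper name, the provider name, and the `target_capacity` default are hypothetical. As far as I know, managed termination protection also requires managed scaling to be enabled and the ASG to protect new instances from scale-in, so treat this as a starting point to verify against the ECS documentation.

```python
def capacity_provider_kwargs(
    name: str,
    auto_scaling_group_arn: str,
    target_capacity: int = 100,  # assumed utilisation target in percent
) -> dict:
    """Build create_capacity_provider arguments with termination protection on."""
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": auto_scaling_group_arn,
            # Let ECS manage the ASG's desired capacity.
            "managedScaling": {
                "status": "ENABLED",
                "targetCapacity": target_capacity,
            },
            # Protect instances that still run tasks from scale-in termination.
            "managedTerminationProtection": "ENABLED",
        },
    }
```

It could be used as `boto3.client("ecs").create_capacity_provider(**capacity_provider_kwargs("dev-cp", asg_arn))`, where `dev-cp` and `asg_arn` are placeholders for your own provider name and the ARN of the existing group.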
