CloudWatch alarm on error percentage for API Gateway



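I'm trying to set up an alarm in CloudWatch using Terraform. The alarm needs to check whether more than 5% of requests to the gateway return 5xx errors over 2 consecutive 1-minute periods.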

I tried the following code, but it doesn't work:

resource "aws_cloudwatch_metric_alarm" "gateway_error_rate" {
  alarm_name          = "gateway-errors"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  alarm_description   = "Gateway error rate has exceeded 5%"
  treat_missing_data  = "notBreaching"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = 60
  evaluation_periods  = 2
  threshold           = 5
  statistic           = "Average"
  unit                = "Percent"
  dimensions = {
    ApiName = "my-api"
    Stage   = "dev"
  }
}

The alarm deploys, but it never shows any data. While testing, I noticed that this alarm apparently does not accept the unit "Percent".

Does anyone have an example of how to configure this kind of alarm in Terraform or CloudFormation?

Based on the information Marcin provided in the comments, I found this in the AWS documentation:

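The Average statistic represents the 5XXError error rate, namely, the total count of the 5XXError errors divided by the total number of requests during the period. The denominator corresponds to the Count metric (below).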

The alarm I configured in Terraform looks like this:

resource "aws_cloudwatch_metric_alarm" "gateway_error_rate" {
  alarm_name          = "gateway-errors"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  alarm_description   = "Gateway error rate has exceeded 5%"
  treat_missing_data  = "notBreaching"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = 60
  evaluation_periods  = 2
  # Average of 5XXError is a ratio (errors / requests), so 0.05 means 5%.
  threshold           = 0.05
  statistic           = "Average"
  # The metric is published with unit "Count"; "Percent" matches no datapoints.
  unit                = "Count"
  dimensions = {
    ApiName = "my-api"
    Stage   = "dev"
  }
}
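
If you would rather state the threshold as a percentage, one option is a metric math alarm that divides the Sum of 5XXError by the Sum of Count. The following is only a sketch under assumptions: the resource name, alarm name, and query ids (`error_percent`, `errors`, `requests`) are made up for illustration, and the dimension values are copied from the alarm above. I have not run it against a real API.

```hcl
# Sketch only: names and ids below are placeholders, not taken from a real setup.
resource "aws_cloudwatch_metric_alarm" "gateway_error_percent" {
  alarm_name          = "gateway-error-percent"
  alarm_description   = "Gateway 5XX error rate has reached 5%"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 2
  threshold           = 5 # percent, because the expression multiplies by 100
  treat_missing_data  = "notBreaching"

  # The only query with return_data = true is the one the alarm evaluates.
  metric_query {
    id          = "error_percent"
    expression  = "100 * errors / requests"
    label       = "5XX error rate (%)"
    return_data = true
  }

  # Numerator: number of 5XX responses per minute.
  metric_query {
    id = "errors"
    metric {
      metric_name = "5XXError"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Sum"
      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
  }

  # Denominator: total number of requests per minute.
  metric_query {
    id = "requests"
    metric {
      metric_name = "Count"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Sum"
      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
  }
}
```

With `metric_query` blocks, the top-level `metric_name`, `namespace`, `period`, and `statistic` arguments are left out. Minutes with no requests should produce no datapoint for the expression, which `treat_missing_data = "notBreaching"` treats as healthy.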

I use this in CloudFormation and it works fine. I use the Sum statistic instead of "Percent":

ApiGateway5XXErrorAlarm:
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    AlarmDescription: 'Api Gateway server-side errors captured'
    Namespace: 'AWS/ApiGateway'
    MetricName: 5XXError
    Dimensions:
      - Name: ApiName
        Value: !Ref ApiGateway
      - Name: Stage
        Value: dev
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: GreaterThanOrEqualToThreshold
    AlarmActions:
      - !Ref Alerts
    TreatMissingData: notBreaching
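
Since the question asks about Terraform, here is a rough Terraform counterpart of the Sum-based alarm above. It is a sketch under assumptions: I am guessing that `Alerts` is an SNS topic, so `aws_sns_topic.alerts` is a hypothetical stand-in for it. The API name `my-api` is borrowed from the question rather than from this template, and the resource and alarm names are invented.

```hcl
# Hypothetical SNS topic standing in for the `Alerts` resource in the template above.
resource "aws_sns_topic" "alerts" {
  name = "gateway-alerts"
}

# Fires as soon as a single 5XX response is recorded within one minute.
resource "aws_cloudwatch_metric_alarm" "gateway_5xx_count" {
  alarm_name          = "gateway-5xx-count"
  alarm_description   = "Api Gateway server-side errors captured"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  namespace           = "AWS/ApiGateway"
  metric_name         = "5XXError"
  statistic           = "Sum" # absolute error count, not a rate
  period              = 60
  evaluation_periods  = 1
  threshold           = 1
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "my-api" # assumed value, taken from the question
    Stage   = "dev"
  }
}
```

Note that this alarms on the number of errors rather than on a percentage, so on a busy API a threshold of 1 may be noisier than a rate-based alarm.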
