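I am trying to set up an alarm in CloudWatch using Terraform. The alarm needs to check whether more than 5% of requests to the gateway return 5XX errors over 2 consecutive 1-minute periods.
I tried the following code, but it does not work: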
    resource "aws_cloudwatch_metric_alarm" "gateway_error_rate" {
      alarm_name          = "gateway-errors"
      comparison_operator = "GreaterThanOrEqualToThreshold"
      alarm_description   = "Gateway error rate has exceeded 5%"
      treat_missing_data  = "notBreaching"
      metric_name         = "5XXError"
      namespace           = "AWS/ApiGateway"
      period              = 60
      evaluation_periods  = 2
      threshold           = 5
      statistic           = "Average"
      unit                = "Percent"
      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
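The alarm deploys, but it never shows any data. While testing, I noticed that the alarm apparently does not accept the unit "Percent".
Does anyone have an example of how to configure this kind of alarm in Terraform or CloudFormation?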
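Based on the information in Marcin's comment, I found this in the AWS documentation:
The Average statistic represents the 5XXError error rate, namely, the total count of the 5XXError errors divided by the total number of requests during the period. The denominator corresponds to the Count metric (below).
The alarm I configured in Terraform looks like this: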
    resource "aws_cloudwatch_metric_alarm" "gateway_error_rate" {
      alarm_name          = "gateway-errors"
      comparison_operator = "GreaterThanOrEqualToThreshold"
      alarm_description   = "Gateway error rate has exceeded 5%"
      treat_missing_data  = "notBreaching"
      metric_name         = "5XXError"
      namespace           = "AWS/ApiGateway"
      period              = 60
      evaluation_periods  = 2
      threshold           = 0.05
      statistic           = "Average"
      unit                = "Count"
      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
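If you would rather express the threshold in percent, one option is CloudWatch metric math: divide the 5XXError sum by the Count sum and multiply by 100. The following is only a sketch under that assumption and is not part of the configuration above. The resource name, the alarm name, and the `my-api` / `dev` dimension values are placeholders.

```hcl
# Sketch: alarm on the 5XX error rate as a percentage using metric math.
# Resource name, alarm name, and dimension values are placeholders.
resource "aws_cloudwatch_metric_alarm" "gateway_error_rate_percent" {
  alarm_name          = "gateway-errors-percent"
  alarm_description   = "Gateway 5XX error rate has exceeded 5%"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 2
  threshold           = 5 # percent, because the expression multiplies by 100
  treat_missing_data  = "notBreaching"

  # e1 is the only query with return_data = true, so the alarm evaluates it.
  metric_query {
    id          = "e1"
    expression  = "(m1 / m2) * 100"
    label       = "5XX error rate (%)"
    return_data = true
  }

  # m1: number of 5XX responses per period.
  metric_query {
    id = "m1"
    metric {
      metric_name = "5XXError"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Sum"
      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
  }

  # m2: total number of requests per period.
  metric_query {
    id = "m2"
    metric {
      metric_name = "Count"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Sum"
      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
  }
}
```

When `metric_query` blocks are used, the top-level `metric_name`, `namespace`, `period`, and `statistic` arguments are left out, since each query carries its own.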
I use this in CloudFormation and it works fine. I use the Sum statistic instead of "Percent":
    ApiGateway5XXErrorAlarm:
      Type: 'AWS::CloudWatch::Alarm'
      Properties:
        AlarmDescription: 'Api Gateway server-side errors captured'
        Namespace: 'AWS/ApiGateway'
        MetricName: 5XXError
        Dimensions:
          - Name: ApiName
            Value: !Ref ApiGateway
          - Name: Stage
            Value: dev
        Statistic: Sum
        Period: 60
        EvaluationPeriods: 1
        Threshold: 1
        ComparisonOperator: GreaterThanOrEqualToThreshold
        AlarmActions:
          - !Ref Alerts
        TreatMissingData: notBreaching
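Since the question also asked about Terraform, here is a rough equivalent of the template above. It is a sketch, not a tested translation: it assumes `Alerts` is an SNS topic, and the resource names, the topic name, and the `my-api` value for `ApiName` are placeholders.

```hcl
# Hypothetical SNS topic standing in for the `Alerts` resource
# referenced by the CloudFormation template.
resource "aws_sns_topic" "alerts" {
  name = "gateway-alerts"
}

# Fires as soon as at least one 5XX response is recorded in a 1-minute period.
resource "aws_cloudwatch_metric_alarm" "gateway_5xx_count" {
  alarm_name          = "gateway-5xx-errors"
  alarm_description   = "Api Gateway server-side errors captured"
  namespace           = "AWS/ApiGateway"
  metric_name         = "5XXError"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ApiName = "my-api" # placeholder for the API name
    Stage   = "dev"
  }
}
```

Note that this alarms on an absolute error count rather than an error rate, so it will trigger on a single failed request regardless of traffic volume.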