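I have the following query, which gives me the error rate for each endpoint and method. Is there any way to create an alert in Alertmanager (ideally with the endpoint and method names in the alert body) that fires if any of these values goes above a certain threshold, e.g. 10%?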
sum by (endpoint, method) (
  http_requests_received_total{code=~"5.."} /
  http_requests_received_total
)
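The alert itself is defined as a Prometheus alerting rule; Alertmanager then receives and routes the alert that the rule produces. You can take inspiration from the following PrometheusRule snippet: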
- alert: "APIErrorRateIsHigh"
  annotations:
    summary: "Error rate is high"
    description: "Error rate is higher than 10% on {{ $labels.method }} - {{ $labels.endpoint }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
  expr: |-
    (
      sum by (endpoint, method) (rate(http_requests_received_total{code=~"5.."}[5m]))
      /
      sum by (endpoint, method) (rate(http_requests_received_total[5m]))
    ) > 0.1
  for: 0m
  labels:
    severity: high
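To sanity-check what the expression computes, here is a simplified Python model of it (a sketch, not how Prometheus evaluates rules). It assumes each series is represented by its counter value at the start and end of the 5-minute window and ignores the extrapolation and counter-reset handling that `rate()` performs. The helper names `error_ratios` and `firing` and the sample data are hypothetical. The point it illustrates is that both numerator and denominator are summed by `(endpoint, method)` before dividing, so the `code` label no longer takes part in vector matching.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical series: (labels, counter value at window start, counter value at window end).
SAMPLES = [
    ({"endpoint": "/orders", "method": "GET", "code": "200"}, 0.0, 900.0),
    ({"endpoint": "/orders", "method": "GET", "code": "500"}, 0.0, 150.0),
    ({"endpoint": "/users", "method": "POST", "code": "200"}, 0.0, 990.0),
    ({"endpoint": "/users", "method": "POST", "code": "503"}, 0.0, 10.0),
]


def error_ratios(samples: list[tuple[dict[str, str], float, float]],
                 window_seconds: float = 300.0) -> dict[tuple[str, str], float]:
    """Approximate sum by (endpoint, method) (rate(5xx)) / sum by (endpoint, method) (rate(all))."""
    errors: dict[tuple[str, str], float] = defaultdict(float)
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for labels, start, end in samples:
        key = (labels["endpoint"], labels["method"])
        per_second = (end - start) / window_seconds  # simple rate, no extrapolation
        totals[key] += per_second
        # PromQL regex matchers are fully anchored, so code=~"5.." means fullmatch.
        if re.fullmatch(r"5..", labels["code"]):
            errors[key] += per_second
    # Groups with zero traffic are dropped (PromQL would produce NaN, which never exceeds the threshold).
    return {k: errors[k] / totals[k] for k in totals if totals[k] > 0}


def firing(samples: list[tuple[dict[str, str], float, float]],
           threshold: float = 0.1,
           window_seconds: float = 300.0) -> dict[tuple[str, str], float]:
    """Keep only the (endpoint, method) groups whose error ratio is above the threshold."""
    return {k: v for k, v in error_ratios(samples, window_seconds).items() if v > threshold}


if __name__ == "__main__":
    print(firing(SAMPLES))                    # only /orders GET exceeds 10%
    print(firing(SAMPLES, threshold=0.0001))  # a very low threshold makes both groups fire
```

With the default threshold of 0.1 only `("/orders", "GET")` is returned (150 / 1050 ≈ 14%), while `/users POST` stays at 1%. Dropping the threshold to 0.0001 makes every group with at least one 5xx response fire, which is the kind of low value that is handy for checking that the alert pipeline works end to end.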
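I suggest testing the query in Prometheus first, with the threshold set to a value that fires easily during testing, e.g. 0.0001. Once the query behaves as expected, you can write your alert rule.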