Prometheus Operator Alertmanager - custom rules - field groups not found in type config.plain



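I installed the Prometheus Operator via Helm and now want to set up a custom alert rule. Email notifications are configured, but at the moment I receive every notification. I want to "silence" the rest so that I only get emails for my custom alerts.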

alertmanager.yaml:

global:
  resolve_timeout: 5m
route:
  receiver: 'email-alert'
  group_by: ['job']

  routes:
  - receiver: 'email-alert'
    match:
      alertname: etcdInsufficientMembers
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
receivers:
- name: email-alert
  email_configs:
  - to: receiver@example.com
    from: sender@example.com
    # Your smtp server address
    smarthost: smtp.office365.com:587
    auth_username: sender@example.com
    auth_identity: sender@example.com
    auth_password: pass

The file above was applied successfully.

I then appended the following lines to the end of that file:

# Example group with one alert
groups:
- name: example-alert
  rules:
  # Alert about restarts
  - alert: RestartAlerts
    expr: count(kube_pod_container_status_restarts_total) > 0
    for: 1s
    annotations:
      summary: "More than 5 restarts in pod {{ $labels.pod-name }}"
      description: "{{ $labels.container-name }} restarted (current value: {{ $value }}s) times in pod {{ $labels.pod-namespace }}/{{ $labels.pod-name }}"

Then, in the pod logs, I got this:

msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 28: field groups not found in type config.plain"

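Solved. The groups block is a Prometheus alerting rule definition, not Alertmanager configuration, so it does not belong in alertmanager.yaml. With the operator, rules live in PrometheusRule resources. First, list all the available rules: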

kubectl -n monitoring get prometheusrules
NAME                                                              AGE
prometheus-prometheus-oper-alertmanager.rules                     29h
prometheus-prometheus-oper-etcd                                   29h
prometheus-prometheus-oper-general.rules                          29h
prometheus-prometheus-oper-k8s.rules                              29h
prometheus-prometheus-oper-kube-apiserver-error                   29h
prometheus-prometheus-oper-kube-apiserver.rules                   29h
prometheus-prometheus-oper-kube-prometheus-node-recording.rules   29h
prometheus-prometheus-oper-kube-scheduler.rules                   29h
prometheus-prometheus-oper-kubernetes-absent                      29h
prometheus-prometheus-oper-kubernetes-apps                        29h
prometheus-prometheus-oper-kubernetes-resources                   29h
prometheus-prometheus-oper-kubernetes-storage                     29h
prometheus-prometheus-oper-kubernetes-system                      29h
prometheus-prometheus-oper-kubernetes-system-apiserver            29h
prometheus-prometheus-oper-kubernetes-system-controller-manager   29h
prometheus-prometheus-oper-kubernetes-system-kubelet              29h
prometheus-prometheus-oper-kubernetes-system-scheduler            29h
prometheus-prometheus-oper-node-exporter                          29h
prometheus-prometheus-oper-node-exporter.rules                    29h
prometheus-prometheus-oper-node-network                           29h
prometheus-prometheus-oper-node-time                              29h
prometheus-prometheus-oper-node.rules                             29h
prometheus-prometheus-oper-prometheus                             29h
prometheus-prometheus-oper-prometheus-operator                    29h

Then pick one to edit, or delete all of them except the default prometheus-prometheus-oper-general.rules.

I chose to edit the node exporter rule:

kubectl edit prometheusrule prometheus-prometheus-oper-node-exporter -n monitoring

and added these lines at the end of the file:

    - alert: RestartAlerts
      annotations:
        description: '{{ $labels.container }} restarted (current value: {{ $value }}s)
          times in pod {{ $labels.namespace }}/{{ $labels.pod }}'
        summary: More than 5 restarts in pod {{ $labels.container }}
      expr: kube_pod_container_status_restarts_total{container="coredns"} > 5
      for: 1m
      labels:
        severity: warning
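
As an alternative to editing a rule that the chart manages (a Helm upgrade may overwrite such edits), the same alert could probably live in its own PrometheusRule resource. The following is only a sketch: the resource name custom-restart-alerts is made up, and the two labels are an assumption about what the chart's Prometheus ruleSelector matches for a release called prometheus. Check the ruleSelector of your Prometheus resource before relying on them.

```yaml
# Sketch of a standalone rule resource; apply with: kubectl apply -f <file>
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-restart-alerts        # hypothetical name
  namespace: monitoring
  labels:
    # Assumption: these must match the ruleSelector of the Prometheus
    # resource, otherwise the operator ignores this rule.
    app: prometheus-operator
    release: prometheus
spec:
  groups:
  - name: custom-restart-alerts
    rules:
    - alert: RestartAlerts
      expr: kube_pod_container_status_restarts_total{container="coredns"} > 5
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: 'More than 5 restarts in pod {{ $labels.pod }}'
        description: '{{ $labels.container }} restarted {{ $value }} times in pod {{ $labels.namespace }}/{{ $labels.pod }}'
```

Note that this is the same groups/rules structure that Alertmanager rejected; it is valid here because it sits under spec of a PrometheusRule, which the operator hands to Prometheus rather than to Alertmanager.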

Shortly afterwards, I received an email for this alert.
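
To receive emails only for the custom alert, as the question originally asked, one option is to make the default route point at a receiver that does nothing and to send just RestartAlerts to email. This is a minimal sketch, assuming the receiver name 'null' (any name works, it only has to be a receiver with no notification configs) and reusing the email settings from the question:

```yaml
global:
  resolve_timeout: 5m
route:
  # Default: anything not matched by a child route goes to a receiver
  # that sends nothing.
  receiver: 'null'
  group_by: ['job']
  routes:
  - receiver: 'email-alert'
    match:
      alertname: RestartAlerts
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
receivers:
- name: 'null'          # quoted so YAML reads it as a string, not a null value
- name: email-alert
  email_configs:
  - to: receiver@example.com
    from: sender@example.com
    smarthost: smtp.office365.com:587
    auth_username: sender@example.com
    auth_identity: sender@example.com
    auth_password: pass
```

With this routing the shipped rules can stay in place, since their alerts still fire in Prometheus but are never emailed. Newer Alertmanager releases prefer matchers over match; match is kept here to stay consistent with the configuration in the question.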
