Prometheus Alertmanager fails to send notifications due to "context deadline exceeded"

I have set up the prometheus-operator chart together with prometheus-msteams for monitoring and alerting on a k8s cluster.

However, not all notifications make it to the MS Teams channel. If I have 6 firing alerts, I can see all of them in the Alertmanager UI, but only one or two of them are posted to the MS Teams channel.

I can see this in the alertmanager pod's logs:

C:\monitoring>kubectl logs alertmanager-monitor-prometheus-operato-alertmanager-0 -c alertmanager
level=info ts=2019-11-04T09:16:47.358Z caller=main.go:217 msg="Starting Alertmanager" version="(version=0.19.0, branch=HEAD, revision=7aa5d19fea3f58e3d27dbdeb0f2883037168914a)"
level=info ts=2019-11-04T09:16:47.358Z caller=main.go:218 build_context="(go=go1.12.8, user=root@587d0268f963, date=20190903-15:01:40)"
level=warn ts=2019-11-04T09:16:47.553Z caller=cluster.go:228 component=cluster msg="failed to join cluster" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc on 169.254.25.10:53: no such host\n\n"
level=info ts=2019-11-04T09:16:47.553Z caller=cluster.go:230 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2019-11-04T09:16:47.553Z caller=main.go:308 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitor-prometheus-operato-alertmanager-0.alertmanager-operated.monitoring.svc on 169.254.25.10:53: no such host\n\n"
level=info ts=2019-11-04T09:16:47.553Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-11-04T09:16:47.597Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-11-04T09:16:47.598Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-11-04T09:16:47.601Z caller=main.go:466 msg=Listening address=:9093
level=info ts=2019-11-04T09:16:49.554Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000149822s
level=info ts=2019-11-04T09:16:57.555Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.001110685s
level=error ts=2019-11-04T09:38:02.472Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:38:02.472Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=4 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"
level=error ts=2019-11-04T09:43:02.472Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:43:02.472Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=5 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"
level=error ts=2019-11-04T09:48:02.473Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:48:02.473Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=5 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"
level=error ts=2019-11-04T09:53:02.473Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager" context_err="context deadline exceeded"
level=error ts=2019-11-04T09:53:02.473Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=5 err="unexpected status code 500: http://prometheus-msteams:2000/alertmanager"

How do I fix this error?

EDIT:

The setup uses prometheus-msteams as a webhook that forwards alert notifications from Alertmanager to an MS Teams channel.
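For context, the Alertmanager side only needs a plain webhook receiver pointing at the prometheus-msteams service. A minimal sketch of that wiring (the receiver name and route structure are assumptions; the URL matches the one in the error logs, and the grouping labels match the payload's groupKey):

route:
  group_by: ['namespace', 'severity']
  receiver: prometheus-msteams
receivers:
  - name: prometheus-msteams
    webhook_configs:
      # Service seen in the Alertmanager "Error on notify" log lines.
      - url: 'http://prometheus-msteams:2000/alertmanager'
        send_resolved: true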

The prometheus-msteams container logs also show some errors:

C:\> kubectl logs prometheus-msteams-564bc7d99c-dpzsm
time="2019-11-06T06:45:14Z" level=info msg="Version: v1.1.4, Commit: d47a7ab, Branch: HEAD, Build Date: 2019-08-04T17:17:06+0000"
time="2019-11-06T06:45:14Z" level=info msg="Parsing the message card template file: /etc/template/card.tmpl"
time="2019-11-06T06:45:15Z" level=warning msg="If the 'config' flag is used, the 'webhook-url' and 'request-uri' flags will be ignored."
time="2019-11-06T06:45:15Z" level=info msg="Parsing the configuration file: /etc/config/connectors.yaml"
time="2019-11-06T06:45:15Z" level=info msg="Creating the server request path "/alertmanager" with webhook "https://outlook.office.com/webhook/00ce0266-7013-4d53-a20f-115ece04042d@9afb1f8a-2192-45ba-b0a1-6b193c758e24/IncomingWebhook/43c3d745ff5e426282f1bc6b5e79bfea/8368b12d-8ac9-4832-b7b5-b337ac267220""
time="2019-11-06T06:45:15Z" level=info msg="prometheus-msteams server started listening at 0.0.0.0:2000"
time="2019-11-06T07:01:07Z" level=info msg="/alertmanager received a request"
time="2019-11-06T07:01:07Z" level=debug msg="Prometheus Alert: {"receiver":"prometheus-msteams","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"KubeDeploymentReplicasMismatch","deployment":"storagesvc","endpoint":"http","instance":"10.233.108.72:8080","job":"kube-state-metrics","namespace":"fission","pod":"monitor-kube-state-metrics-856bc9455b-7z5qx","prometheus":"monitoring/monitor-prometheus-operato-prometheus","service":"monitor-kube-state-metrics","severity":"critical"},"annotations":{"message":"Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes.","runbook_url":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"},"startsAt":"2019-11-06T07:00:32.453590324Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://monitor-prometheus-operato-prometheus.monitoring:9090/graph?g0.expr=kube_deployment_spec_replicas%7Bjob%3D%22kube-state-metrics%22%7D+%21%3D+kube_deployment_status_replicas_available%7Bjob%3D%22kube-state-metrics%22%7D\u0026g0.tab=1"},{"status":"firing","labels":{"alertname":"KubePodNotReady","namespace":"fission","pod":"storagesvc-5bff46b69b-vfdrd","prometheus":"monitoring/monitor-prometheus-operato-prometheus","severity":"critical"},"annotations":{"message":"Pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes.","runbook_url":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"},"startsAt":"2019-11-06T07:00:32.453590324Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://monitor-prometheus-operato-prometheus.monitoring:9090/graph?g0.expr=sum+by%28namespace%2C+pod%29+%28kube_pod_status_phase%7Bjob%3D%22kube-state-metrics%22%2Cphase%3D~%22Failed%7CPending%7CUnknown%22%7D%29+%3E+0\u0026g0.tab=1"}],"groupLabels":{"namespace":"fission","severity":"critical"},"commonLabels":{"namespace":"fission","prometheus":"monitoring/monitor-prometheus-operato-prometheus","severity":"critical"},"commonAnnotations":{},"externalURL":"http://monitor-prometheus-operato-alertmanager.monitoring:9093","version":"4","groupKey":"{}:{namespace=\"fission\", severity=\"critical\"}"}"
time="2019-11-06T07:01:07Z" level=debug msg="Alert rendered in template file: rn{rn  "@type": "MessageCard",rn  "@context": "http://schema.org/extensions",rn  "themeColor": "8C1A1A",rn  "summary": "",rn  "title": "Prometheus Alert (firing)",rn  "sections": [ rn    {rn      "activityTitle": "[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)",rn      "facts": [rn        {rn          "name": "message",rn          "value": "Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes."rn        },rn        {rn          "name": "runbook\\_url",rn          "value": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"rn        },rn        {rn          "name": "alertname",rn          "value": "KubeDeploymentReplicasMismatch"rn        },rn        {rn          "name": "deployment",rn          "value": "storagesvc"rn        },rn        {rn          "name": "endpoint",rn          "value": "http"rn        },rn        {rn          "name": "instance",rn          "value": "10.233.108.72:8080"rn        },rn        {rn          "name": "job",rn          "value": "kube-state-metrics"rn        },rn        {rn          "name": "namespace",rn          "value": "fission"rn        },rn        {rn          "name": "pod",rn          "value": "monitor-kube-state-metrics-856bc9455b-7z5qx"rn        },rn        {rn          "name": "prometheus",rn          "value": "monitoring/monitor-prometheus-operato-prometheus"rn        },rn        {rn          "name": "service",rn          "value": "monitor-kube-state-metrics"rn        },rn        {rn          "name": "severity",rn          "value": "critical"rn        }rn      ],rn      "markdown": truern    },rn    {rn      "activityTitle": "[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)",rn      "facts": [rn        {rn          "name": "message",rn          "value": "Pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes."rn        },rn        {rn          "name": "runbook\\_url",rn          "value": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"rn        },rn        {rn          "name": "alertname",rn          "value": "KubePodNotReady"rn        },rn        {rn          "name": "namespace",rn          "value": "fission"rn        },rn        {rn          "name": "pod",rn          "value": "storagesvc-5bff46b69b-vfdrd"rn        },rn        {rn          "name": "prometheus",rn          "value": "monitoring/monitor-prometheus-operato-prometheus"rn        },rn        {rn          "name": "severity",rn          "value": "critical"rn        }rn      ],rn      "markdown": truern    }rn  ]rn}rn"
time="2019-11-06T07:01:07Z" level=debug msg="Size of message is 1714 Bytes (~1 KB)"
time="2019-11-06T07:01:07Z" level=info msg="Created a card for Microsoft Teams /alertmanager"
time="2019-11-06T07:01:07Z" level=debug msg="Teams message cards: [{"@type":"MessageCard","@context":"http://schema.org/extensions","themeColor":"8C1A1A","summary":"","title":"Prometheus Alert (firing)","sections":[{"activityTitle":"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)","facts":[{"name":"message","value":"Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes."},{"name":"runbook\\_url","value":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"},{"name":"alertname","value":"KubeDeploymentReplicasMismatch"},{"name":"deployment","value":"storagesvc"},{"name":"endpoint","value":"http"},{"name":"instance","value":"10.233.108.72:8080"},{"name":"job","value":"kube-state-metrics"},{"name":"namespace","value":"fission"},{"name":"pod","value":"monitor-kube-state-metrics-856bc9455b-7z5qx"},{"name":"prometheus","value":"monitoring/monitor-prometheus-operato-prometheus"},{"name":"service","value":"monitor-kube-state-metrics"},{"name":"severity","value":"critical"}],"markdown":true},{"activityTitle":"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)","facts":[{"name":"message","value":"Pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes."},{"name":"runbook\\_url","value":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"},{"name":"alertname","value":"KubePodNotReady"},{"name":"namespace","value":"fission"},{"name":"pod","value":"storagesvc-5bff46b69b-vfdrd"},{"name":"prometheus","value":"monitoring/monitor-prometheus-operato-prometheus"},{"name":"severity","value":"critical"}],"markdown":true}]}]"
time="2019-11-06T07:01:07Z" level=info msg="Microsoft Teams response text: 1"
time="2019-11-06T07:01:07Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-11-06T07:01:07Z" level=info msg="Microsoft Teams response text: Summary or Text is required."
time="2019-11-06T07:01:07Z" level=error msg="Failed sending to the Teams Channel. Teams http response: 400 Bad Request"
time="2019-11-06T07:01:08Z" level=info msg="/alertmanager received a request"
time="2019-11-06T07:01:08Z" level=debug msg="Prometheus Alert: {"receiver":"prometheus-msteams","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"KubeDeploymentReplicasMismatch","deployment":"storagesvc","endpoint":"http","instance":"10.233.108.72:8080","job":"kube-state-metrics","namespace":"fission","pod":"monitor-kube-state-metrics-856bc9455b-7z5qx","prometheus":"monitoring/monitor-prometheus-operato-prometheus","service":"monitor-kube-state-metrics","severity":"critical"},"annotations":{"message":"Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes.","runbook_url":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"},"startsAt":"2019-11-06T07:00:32.453590324Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://monitor-prometheus-operato-prometheus.monitoring:9090/graph?g0.expr=kube_deployment_spec_replicas%7Bjob%3D%22kube-state-metrics%22%7D+%21%3D+kube_deployment_status_replicas_available%7Bjob%3D%22kube-state-metrics%22%7D\u0026g0.tab=1"},{"status":"firing","labels":{"alertname":"KubePodNotReady","namespace":"fission","pod":"storagesvc-5bff46b69b-vfdrd","prometheus":"monitoring/monitor-prometheus-operato-prometheus","severity":"critical"},"annotations":{"message":"Pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes.","runbook_url":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"},"startsAt":"2019-11-06T07:00:32.453590324Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://monitor-prometheus-operato-prometheus.monitoring:9090/graph?g0.expr=sum+by%28namespace%2C+pod%29+%28kube_pod_status_phase%7Bjob%3D%22kube-state-metrics%22%2Cphase%3D~%22Failed%7CPending%7CUnknown%22%7D%29+%3E+0\u0026g0.tab=1"}],"groupLabels":{"namespace":"fission","severity":"critical"},"commonLabels":{"namespace":"fission","prometheus":"monitoring/monitor-prometheus-operato-prometheus","severity":"critical"},"commonAnnotations":{},"externalURL":"http://monitor-prometheus-operato-alertmanager.monitoring:9093","version":"4","groupKey":"{}:{namespace=\"fission\", severity=\"critical\"}"}"
time="2019-11-06T07:01:08Z" level=debug msg="Alert rendered in template file: rn{rn  "@type": "MessageCard",rn  "@context": "http://schema.org/extensions",rn  "themeColor": "8C1A1A",rn  "summary": "",rn  "title": "Prometheus Alert (firing)",rn  "sections": [ rn    {rn      "activityTitle": "[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)",rn      "facts": [rn        {rn          "name": "message",rn          "value": "Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes."rn        },rn        {rn          "name": "runbook\\_url",rn          "value": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"rn        },rn        {rn          "name": "alertname",rn          "value": "KubeDeploymentReplicasMismatch"rn        },rn        {rn          "name": "deployment",rn          "value": "storagesvc"rn        },rn        {rn          "name": "endpoint",rn          "value": "http"rn        },rn        {rn          "name": "instance",rn          "value": "10.233.108.72:8080"rn        },rn        {rn          "name": "job",rn          "value": "kube-state-metrics"rn        },rn        {rn          "name": "namespace",rn          "value": "fission"rn        },rn        {rn          "name": "pod",rn          "value": "monitor-kube-state-metrics-856bc9455b-7z5qx"rn        },rn        {rn          "name": "prometheus",rn          "value": "monitoring/monitor-prometheus-operato-prometheus"rn        },rn        {rn          "name": "service",rn          "value": "monitor-kube-state-metrics"rn        },rn        {rn          "name": "severity",rn          "value": "critical"rn        }rn      ],rn      "markdown": truern    },rn    {rn      "activityTitle": "[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)",rn      "facts": [rn        {rn          "name": "message",rn          "value": "Pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes."rn        },rn        {rn          "name": "runbook\\_url",rn          "value": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"rn        },rn        {rn          "name": "alertname",rn          "value": "KubePodNotReady"rn        },rn        {rn          "name": "namespace",rn          "value": "fission"rn        },rn        {rn          "name": "pod",rn          "value": "storagesvc-5bff46b69b-vfdrd"rn        },rn        {rn          "name": "prometheus",rn          "value": "monitoring/monitor-prometheus-operato-prometheus"rn        },rn        {rn          "name": "severity",rn          "value": "critical"rn        }rn      ],rn      "markdown": truern    }rn  ]rn}rn"
time="2019-11-06T07:01:08Z" level=debug msg="Size of message is 1714 Bytes (~1 KB)"
time="2019-11-06T07:01:08Z" level=info msg="Created a card for Microsoft Teams /alertmanager"
time="2019-11-06T07:01:08Z" level=debug msg="Teams message cards: [{"@type":"MessageCard","@context":"http://schema.org/extensions","themeColor":"8C1A1A","summary":"","title":"Prometheus Alert (firing)","sections":[{"activityTitle":"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)","facts":[{"name":"message","value":"Deployment fission/storagesvc has not matched the expected number of replicas for longer than 15 minutes."},{"name":"runbook\\_url","value":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"},{"name":"alertname","value":"KubeDeploymentReplicasMismatch"},{"name":"deployment","value":"storagesvc"},{"name":"endpoint","value":"http"},{"name":"instance","value":"10.233.108.72:8080"},{"name":"job","value":"kube-state-metrics"},{"name":"namespace","value":"fission"},{"name":"pod","value":"monitor-kube-state-metrics-856bc9455b-7z5qx"},{"name":"prometheus","value":"monitoring/monitor-prometheus-operato-prometheus"},{"name":"service","value":"monitor-kube-state-metrics"},{"name":"severity","value":"critical"}],"markdown":true},{"activityTitle":"[](http://monitor-prometheus-operato-alertmanager.monitoring:9093)","facts":[{"name":"message","value":"Pod fission/storagesvc-5bff46b69b-vfdrd has been in a non-ready state for longer than 15 minutes."},{"name":"runbook\\_url","value":"https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"},{"name":"alertname","value":"KubePodNotReady"},{"name":"namespace","value":"fission"},{"name":"pod","value":"storagesvc-5bff46b69b-vfdrd"},{"name":"prometheus","value":"monitoring/monitor-prometheus-operato-prometheus"},{"name":"severity","value":"critical"}],"markdown":true}]}]"
time="2019-11-06T07:01:08Z" level=info msg="Microsoft Teams response text: Summary or Text is required."
time="2019-11-06T07:01:08Z" level=error msg="Failed sending to the Teams Channel. Teams http response: 400 Bad Request"

The "unexpected status code 500" that Alertmanager reports is most likely caused by the 400 Bad Request error in prometheus-msteams: when the Teams webhook rejects the card, prometheus-msteams answers Alertmanager's request with a 500, and Alertmanager logs the notification as failed.

The errors are caused by the default card template file, https://github.com/bzon/prometheus-msteams/blob/master/chart/prometheus-msteams/card.tmpl.
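The rendered card in the logs above already shows the culprit: "summary": "" is sent to Teams. In the default template that field is filled from the summary annotation, roughly like this (inferred from the rendered output, not a verbatim quote of the upstream file):

"summary": "{{ .CommonAnnotations.summary }}",

The alerts in this payload only carry a message annotation, no summary, so the field renders as an empty string.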

The problem is that the summary field is empty, which Teams rejects with "Summary or Text is required.". A slight change to the template file, as described in the tutorial, resolved the error.
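The change amounts to never emitting an empty summary: fall back to the message annotation (or the alert name) when no summary annotation is set. A sketch of what that looks like in the template; the exact fallback chain here is an assumption, not the verbatim upstream patch:

"summary": "{{- if eq .CommonAnnotations.summary "" -}}
              {{- if eq .CommonAnnotations.message "" -}}
                {{- .CommonLabels.alertname -}}  {{/* last-resort fallback (assumed) */}}
              {{- else -}}
                {{- .CommonAnnotations.message -}}
              {{- end -}}
            {{- else -}}
              {{- .CommonAnnotations.summary -}}
            {{- end -}}",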

You can use the newly modified card template by overriding the default one.
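If your chart version does not expose a template override value, the modified file can instead be mounted over the path the container reads at startup (/etc/template/card.tmpl, per the "Parsing the message card template file" log line). A rough sketch using plain Kubernetes primitives; the ConfigMap name is made up:

kubectl create configmap msteams-card-template --from-file=card.tmpl -n monitoring

Then, in the prometheus-msteams Deployment spec:

volumes:
  - name: card-template
    configMap:
      name: msteams-card-template   # hypothetical ConfigMap created above
containers:
  - name: prometheus-msteams
    volumeMounts:
      - name: card-template
        mountPath: /etc/template    # container parses /etc/template/card.tmpl at startup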
