Prometheus up metric显示即使端点可达也为0



我有一个简单的pod与nginx容器返回路径/上的文本healthy。我让普罗米修斯在路径/上刮掉80端口。当我在普罗米修斯仪表板上运行up == 0时,它显示了这个舱,这意味着这个舱不健康。但是我尝试ssh进入容器,它运行良好,我在nginx日志中看到普罗米修斯正在ping/并获得200响应。知道为什么吗?

deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
...
spec:
...
template:
metadata:
labels:
...
annotations:
prometheus.io/scrape: "true"
prometheus.io/path: "/"
prometheus.io/port: "80"
spec:
containers:
- name: nginx
image: nginx
volumeMounts:
- name: nginx-conf
mountPath: /etc/nginx
readOnly: true
ports:
- containerPort: 80
volumes:
- name: nginx-conf
configMap:
name: nginx-conf
items:
- key: nginx.conf
path: nginx.conf

nginx.conf

apiVersion: v1
kind: ConfigMap
metadata:
name: nginx-conf
data:
nginx.conf: |
http {
server {
listen 80;
location / {
return 200 'healthyn';
}
}
}

nginx access log

192.168.88.81 - - [xxx +0000] "GET / HTTP/1.1" 200 8 "-" "Prometheus/2.26.0"
192.168.88.81 - - [xxx +0000] "GET / HTTP/1.1" 200 8 "-" "Prometheus/2.26.0"
192.168.88.81 - - [xxx +0000] "GET / HTTP/1.1" 200 8 "-" "Prometheus/2.26.0"

当您将这些注释配置为pods时,普罗米修斯期望给定的路径返回普罗米修斯可读的度量。但是'healthyn'不是一个有效的普罗米修斯度量类型。

annotations:
prometheus.io/scrape: "true"
prometheus.io/path: "/"
prometheus.io/port: "80"

推荐的解决办法:

  • 使用nginx-prometheus-export作为sidecar.
  • 在注释中添加侧车信息,以便普罗米修斯可以从中抓取指标。
apiVersion: apps/v1
kind: Deployment
metadata:
...
spec:
...
template:
metadata:
labels:
...
annotations:
prometheus.io/scrape: "true"
prometheus.io/path: "/metrics"
prometheus.io/port: "9113"
spec:
containers:
- name: nginx
image: nginx
volumeMounts:
- name: nginx-conf
mountPath: /etc/nginx
readOnly: true
ports:
- containerPort: 80
- name: nginx-exporter
args:
- "-nginx.scrape-uri=http://localhost:80/stub_status" # nginx address
image: nginx/nginx-prometheus-exporter:0.9.0
ports:
- containerPort: 9113
volumes:
- name: nginx-conf
configMap:
name: nginx-conf
items:
- key: nginx.conf
path: nginx.conf

现在,尝试从Prometheus查询nginx_up。nginx-prometheus-export也自带一个grafana仪表板,你也可以试试。

当普罗米修斯擦除一个端点时,它需要度量。典型的指标是这样的:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.3234e-05
go_gc_duration_seconds{quantile="0.25"} 1.7335e-05

"healthy"不符合标准,因此它导致Prometheus无法抓取此目标。有一个黑箱导出器,它被设计用来从用户的角度检查端点(这就是黑箱监视)。导出器可以执行HTTP请求并对结果进行度量。例如,它可以检查响应码是否为200,或者响应体是否包含某些文本。以下是该导出器返回的示例指标(注意probe_success,这与up相同):

# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.026007318
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.550007522
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.098082009
probe_http_duration_seconds{phase="processing"} 0.154402544
probe_http_duration_seconds{phase="resolve"} 0.038066771
probe_http_duration_seconds{phase="tls"} 0.209702302
probe_http_duration_seconds{phase="transfer"} 0.047839785
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 87617
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 2
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 8.57979034e+08
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.639030838e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.639030838e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="ef4eaeb464efb33f5332b365a350b2b06588ea71837af27f83d45b726d19af2a"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.2"} 1

最新更新