Strimzi Kafka + Kafka Exporter: consumer group not showing up in the Prometheus metric kafka_consumergroup_lag



I have a Strimzi Kafka cluster on GKE, where I've also deployed Kafka Exporter. The Kafka topic has ACLs, and a consumer group (spark-kafka-source-*) is defined that can read from the topic.

I'm running a Spark Structured Streaming program that reads from the Kafka topic. The problem: when I check the metrics, Kafka Exporter does not seem to show the consumer group in kafka_consumergroup_lag.

The consumer group does show up in the metric kafka_consumergroup_members:

kafka_consumergroup_members{consumergroup="spark-kafka-source-657d6441-5716-43d9-b456-73657a5534a3-594190416-driver-0",container="versa-kafka-gke-kafka-exporter-84c7ffbb79-jzqjn",endpoint="tcp-prometheus",instance="10.40.0.65:9404",job="monitoring/kafka-resources-metrics",namespace="kafka",node_ip="10.142.0.24",node_name="gke-versa-kafka-gke-default-pool-a92b23b7-n0x2",pod="versa-kafka-gke-kafka-exporter-84c7ffbb79-jzqjn",strimzi_io_cluster="versa-kafka-gke",strimzi_io_kind="Kafka",strimzi_io_name="versa-kafka-gke-kafka-exporter"}
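To compare which groups appear under each metric name, one way is to scrape the exporter's /metrics endpoint directly (port 9404, per the sample above) and diff the consumergroup label values. A minimal sketch; the localhost URL assumes a `kubectl port-forward` to the exporter pod:

```python
# Minimal sketch: fetch the Kafka Exporter /metrics output and list which
# consumer groups appear in kafka_consumergroup_members but not in
# kafka_consumergroup_lag. The URL is an assumption -- substitute your own
# exporter pod/service address.
import re
import urllib.request

def groups_for_metric(metrics_text: str, metric_name: str) -> set:
    """Return the set of consumergroup label values seen for a metric."""
    pattern = re.compile(
        rf'^{metric_name}\{{[^}}]*consumergroup="([^"]+)"', re.MULTILINE
    )
    return set(pattern.findall(metrics_text))

if __name__ == "__main__":
    # e.g. after: kubectl -n kafka port-forward <kafka-exporter-pod> 9404
    text = urllib.request.urlopen("http://localhost:9404/metrics").read().decode()
    members = groups_for_metric(text, "kafka_consumergroup_members")
    lag = groups_for_metric(text, "kafka_consumergroup_lag")
    print("groups with members but no lag metric:", members - lag)
```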

Here are the yamls:

kafka-deployment.yaml (contains the kafkaExporter tag)
-----------------------------------------------------
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: versa-kafka-gke #1
spec:
  kafka:
    version: 3.0.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: tls
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
        authentication:
          type: tls
    authorization:
      type: simple
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 500Gi
          deleteClaim: false
    logging: #9
      type: inline
      loggers:
        kafka.root.logger.level: "INFO"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
    resources:
      requests:
        memory: 1Gi
        cpu: "1"
      limits:
        memory: 2Gi
        cpu: "1.5"
    logging:
      type: inline
      loggers:
        zookeeper.root.logger: "INFO"
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
  entityOperator: #11
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kafka-metrics
  labels:
    app: strimzi
data:
  kafka-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # Special cases and very specific rules
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
        clientId: "$3"
        topic: "$4"
        partition: "$5"
    - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
      name: kafka_server_$1_$2
      type: GAUGE
      labels:
        clientId: "$3"
        broker: "$4:$5"
    - pattern: kafka.server<type=(.+), cipher=(.+), protocol=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_tls_info
      type: GAUGE
      labels:
        cipher: "$2"
        protocol: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: kafka.server<type=(.+), clientSoftwareName=(.+), clientSoftwareVersion=(.+), listener=(.+), networkProcessor=(.+)><>connections
      name: kafka_server_$1_connections_software
      type: GAUGE
      labels:
        clientSoftwareName: "$2"
        clientSoftwareVersion: "$3"
        listener: "$4"
        networkProcessor: "$5"
    - pattern: "kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+):"
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
        listener: "$2"
        networkProcessor: "$3"
    - pattern: kafka.server<type=(.+), listener=(.+), networkProcessor=(.+)><>(.+)
      name: kafka_server_$1_$4
      type: GAUGE
      labels:
        listener: "$2"
        networkProcessor: "$3"
    # Some percent metrics use MeanRate attribute
    # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    # Generic gauges for percents
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
    - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
      name: kafka_$1_$2_$3_percent
      type: GAUGE
      labels:
        "$4": "$5"
    # Generic per-second counters with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
      name: kafka_$1_$2_$3_total
      type: COUNTER
    # Generic gauges with 0-2 key/value pairs
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
      name: kafka_$1_$2_$3
      type: GAUGE
    # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
    # Note that these are missing the '_sum' metric!
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
        "$6": "$7"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        "$6": "$7"
        quantile: "0.$8"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
      labels:
        "$4": "$5"
    - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        "$4": "$5"
        quantile: "0.$6"
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
      name: kafka_$1_$2_$3_count
      type: COUNTER
    - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
      name: kafka_$1_$2_$3
      type: GAUGE
      labels:
        quantile: "0.$4"
  zookeeper-metrics-config.yml: |
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    rules:
    # replicated Zookeeper
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
      name: "zookeeper_$2"
      type: GAUGE
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
      name: "zookeeper_$3"
      type: GAUGE
      labels:
        replicaId: "$2"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
      name: "zookeeper_$4"
      type: COUNTER
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
      name: "zookeeper_$4"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
      name: "zookeeper_$4_$5"
      type: GAUGE
      labels:
        replicaId: "$2"
        memberType: "$3"

kafkaUser yaml:
---------
kind: KafkaUser
metadata:
  name: syslog-vani-noacl
  labels:
    strimzi.io/cluster: versa-kafka-gke
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      # Topics and groups used by the HTTP clients through the HTTP Bridge
      # Change to match the topics used by your HTTP clients
      - resource:
          type: topic
          name: syslog.ueba-us4.v1.versa.demo3
          patternType: literal
        operation: Read
        host: "*"
      - resource:
          type: topic
          name: syslog.ueba-us4.v1.versa.demo3
          patternType: literal
        operation: Describe
        host: "*"
      - resource:
          type: topic
          name: syslog.ueba-us4.v1.versa.demo3
          patternType: literal
        operation: Write
        host: "*"
      - resource:
          type: group
          name: 'spark-kafka-source-'
          patternType: prefix
        operation: Read
        host: "*"
      - resource:
          type: group
          name: 'ss.consumer'
          patternType: literal
        operation: Read
        host: "*"
      - resource:
          type: group
          name: 'versa-console-consumer'
          patternType: literal
        operation: Read
        host: "*"
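As a sanity check on the ACL side: the group rule above uses patternType prefix, so any Spark-generated group id starting with spark-kafka-source- is covered. A toy illustration of literal vs. prefix matching (my own helper, not a real Kafka API):

```python
# Toy model of Kafka ACL resource-name matching (not Kafka's actual code):
# 'literal' must match the name exactly, 'prefix' matches any name that
# starts with the pattern.
def acl_matches(resource_name: str, pattern: str, pattern_type: str) -> bool:
    if pattern_type == "literal":
        return resource_name == pattern
    if pattern_type == "prefix":
        return resource_name.startswith(pattern)
    raise ValueError(f"unknown patternType: {pattern_type}")

group = "spark-kafka-source-657d6441-5716-43d9-b456-73657a5534a3-594190416-driver-0"
print(acl_matches(group, "spark-kafka-source-", "prefix"))   # True
print(acl_matches(group, "ss.consumer", "literal"))          # False
```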

None of the consumer groups listed in the KafkaUser yaml show up in the metric kafka_consumergroup_lag.

Any ideas on how to debug/fix this?

tia!


Please note: my Spark program runs on Dataproc (i.e., not on the Kubernetes cluster where Kafka is deployed). Could that affect how kafkaExporter reports the consumer group lag?

Kafka Exporter exports Prometheus metrics based on the consumer offsets committed to the __consumer_offsets topic. So only once consumers connect to your Kafka cluster, consume some messages, and commit their offsets will it see them and show them in the metrics.

The KafkaUser CR, on the other hand, only lists ACLs. So you grant the user the right to use such a consumer group, but that does not mean the consumer group exists. It shows up only once the user actually uses it and commits something.

So what you are seeing is probably perfectly fine and expected.
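In other words, lag is derived per partition as the topic's latest offset minus the group's last committed offset; with nothing committed, there is no value to export. A toy illustration with made-up numbers (not the exporter's actual code):

```python
# Toy illustration (not Kafka Exporter's actual code): lag per partition is
# the log-end offset minus the committed offset. Partitions with no committed
# offset produce no lag entry at all -- which is why the series can be absent
# even though the group's members are visible.
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-(topic, partition) lag; only committed partitions yield entries."""
    return {
        tp: log_end_offsets[tp] - offset
        for tp, offset in committed_offsets.items()
        if tp in log_end_offsets
    }

log_end = {("syslog.ueba-us4.v1.versa.demo3", 0): 120,
           ("syslog.ueba-us4.v1.versa.demo3", 1): 80}

print(consumer_lag(log_end, {}))  # {} -- no commits, no lag series
print(consumer_lag(log_end, {("syslog.ueba-us4.v1.versa.demo3", 0): 100}))
# {('syslog.ueba-us4.v1.versa.demo3', 0): 20}
```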
