I am setting up Strimzi Kafka MirrorMaker2 in our test environment, which receives on average 100,000 messages every 5 minutes. We have about 25 topics and roughly 900 partitions across them. With the default configuration I set up, about 60k messages are mirrored to the DR cluster every 5 minutes. I am trying to tune this configuration for better throughput and latency.
apiVersion: v1
items:
- apiVersion: kafka.strimzi.io/v1beta2
  kind: KafkaMirrorMaker2
  spec:
    clusters:
    - alias: source
      authentication:
        certificateAndKey:
          certificate: user.crt
          key: user.key
          secretName: mirrormaker1
        type: tls
      bootstrapServers: bootstrap1:443
      tls:
        trustedCertificates:
        - certificate: ca.crt
          secretName: cert-source
    - alias: target
      authentication:
        certificateAndKey:
          certificate: user.crt
          key: user.key
          secretName: mirrormaker-dr
        type: tls
      bootstrapServers: bootstrap2:443
      config:
        offset.flush.timeout.ms: 120000
      tls:
        trustedCertificates:
        - certificate: ca.crt
          secretName: dest-cert
    connectCluster: target
    livenessProbe:
      initialDelaySeconds: 40
      periodSeconds: 40
      timeoutSeconds: 30
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          key: mm2-metrics-config.yaml
          name: mm2-metrics
    mirrors:
    - checkpointConnector:
        config:
          checkpoints.topic.replication.factor: 3
        tasksMax: 10
      groupsPattern: .*
      heartbeatConnector:
        config:
          heartbeats.topic.replication.factor: 3
      sourceCluster: source
      sourceConnector:
        config:
          consumer.request.timeout.ms: 150000
          offset-syncs.topic.replication.factor: 3
          refresh.topics.interval.seconds: 60
          replication.factor: 3
          source.cluster.producer.enable.idempotence: "true"
          sync.topic.acls.enabled: "true"
          target.cluster.producer.enable.idempotence: "true"
        tasksMax: 60
      targetCluster: target
      topicsPattern: .*
    readinessProbe:
      initialDelaySeconds: 40
      periodSeconds: 40
      timeoutSeconds: 30
    replicas: 4
    resources:
      limits:
        cpu: 9
        memory: 30Gi
      requests:
        cpu: 5
        memory: 15Gi
    version: 2.8.0
With the above configuration I see no errors in the log files.
I tried fine-tuning the configuration for higher throughput and lower latency, as follows:
consumer.max.partition.fetch.bytes: 2097152
consumer.max.poll.records: 1000
consumer.receive.buffer.bytes: 131072
consumer.request.timeout.ms: 200000
consumer.send.buffer.bytes: 262144
offset-syncs.topic.replication.factor: 3
producer.acks: 0
producer.batch.size: 20000
producer.buffer.memory: 30331648
producer.linger.ms: 10
producer.max.request.size: 2097152
producer.message.max.bytes: 2097176
producer.request.timeout.ms: 150000
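For reference, these overrides live under the sourceConnector config block of the CR (the producer.* and consumer.* prefixes route each setting to the underlying producer or consumer), e.g.:

```yaml
sourceConnector:
  config:
    # consumer-side overrides
    consumer.max.partition.fetch.bytes: 2097152
    consumer.max.poll.records: 1000
    # producer-side overrides
    producer.batch.size: 20000
    producer.linger.ms: 10
  tasksMax: 60
```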
I now see the following error in the logs, but data is still flowing, and the message count has increased slightly to about 65k/5min. I also raised tasksMax from 60 to 800 and replicas from 4 to 8, but that made no noticeable difference. Network throughput sits at around 20 MiB/s. The error below persists even after I increased consumer.request.timeout.ms further.
2022-04-26 04:09:51,1223 INFO [Consumer clientId=consumer-null-1601, groupId=null] Error sending fetch request (sessionId=629190882, epoch=65) to node 4: (org.apache.kafka.clients.FetchSessionHandler) [task-thread-us-ashburn-1->us-phoenix-1-dr.MirrorSourceConnector-759] org.apache.kafka.common.errors.DisconnectException
Is there anything I can do to increase throughput and reduce latency?
I haven't configured Strimzi Kafka MirrorMaker before, but at first glance the producer and consumer configs appear to be the same ones exposed by the kafka-clients library. Assuming that is the case, the producer's batch.size (set to 20000) is not a record count. It is measured in bytes, which means that with this configuration the producer transmits at most 20 KB per send. Try increasing it to 65536 (64 KB) or higher. If throughput still does not increase, raise linger.ms to 100 or higher so the producer waits longer for each batch to fill before triggering a send.
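Translated into the MirrorMaker2 CR, the suggested overrides would look roughly like this (the values are starting points to experiment with, not tested recommendations for your workload):

```yaml
sourceConnector:
  config:
    # batch.size is in bytes, not records: 64 KB per batch instead of the
    # ~20 KB cap implied by batch.size=20000
    producer.batch.size: 65536
    # wait up to 100 ms for a batch to fill before forcing a send;
    # larger batches trade a little latency for throughput
    producer.linger.ms: 100
```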