We have 3 Kafka machines running version -0.1.
We can see that some topics, for example bio_test_covid9_verification, are unbalanced and have out-of-sync ISRs, as shown below:
```
Topic: bio_test_covid9_verifcation Partition: 0 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 1 Leader: 1003 Replicas: 1001,1002,1003 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 2 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 3 Leader: 1003 Replicas: 1003,1002,1001 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 4 Leader: 1003 Replicas: 1001,1003,1002 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 5 Leader: 1002 Replicas: 1002,1001,1003 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 6 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 7 Leader: 1003 Replicas: 1001,1002,1003 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 8 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 9 Leader: 1003 Replicas: 1003,1002,1001 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 10 Leader: 1003 Replicas: 1001,1003,1002 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 11 Leader: 1002 Replicas: 1002,1001,1003 Isr: 1003,1002,1001
```
The expected state should look like this:
```
Topic: bio_test_covid9_verifcation Partition: 0 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 1 Leader: 1001 Replicas: 1001,1002,1003 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 2 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 3 Leader: 1003 Replicas: 1003,1002,1001 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 4 Leader: 1001 Replicas: 1001,1003,1002 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 5 Leader: 1002 Replicas: 1002,1001,1003 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 6 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 7 Leader: 1001 Replicas: 1001,1002,1003 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 8 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 9 Leader: 1003 Replicas: 1003,1002,1001 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 10 Leader: 1001 Replicas: 1001,1003,1002 Isr: 1003,1002,1001
Topic: bio_test_covid9_verifcation Partition: 11 Leader: 1002 Replicas: 1002,1001,1003 Isr: 1003,1002,1001
```
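As an aside, the mismatch above can be spotted mechanically. Here is a minimal sketch (my own, not part of the original diagnosis) that parses `kafka-topics --describe` output and flags partitions whose ISR is missing replicas, or whose leader is not the preferred (first-listed) replica:

```python
import re

def check_partitions(describe_output):
    """Parse `kafka-topics --describe` lines and report problems.

    Flags a partition when its ISR does not contain all replicas
    (under-replicated) or when its leader differs from the preferred
    (first-listed) replica.
    """
    pattern = re.compile(
        r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\d+)"
        r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)"
    )
    problems = []
    for line in describe_output.splitlines():
        m = pattern.search(line)
        if not m:
            continue
        topic, partition, leader, replicas, isr = m.groups()
        replicas = replicas.split(",")
        isr = isr.split(",")
        if set(isr) != set(replicas):
            problems.append((topic, int(partition), "under-replicated ISR"))
        if leader != replicas[0]:
            problems.append((topic, int(partition), "leader is not preferred replica"))
    return problems

sample = """\
Topic: bio_test_covid9_verifcation Partition: 0 Leader: 1003 Replicas: 1003,1001,1002 Isr: 1003
Topic: bio_test_covid9_verifcation Partition: 2 Leader: 1002 Replicas: 1002,1003,1001 Isr: 1003,1002,1001
"""
print(check_partitions(sample))
# [('bio_test_covid9_verifcation', 0, 'under-replicated ISR')]
```

Partition 0 is flagged because 1001 and 1002 are missing from its ISR; partition 2 is healthy, so it is not reported.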
So, out of 63 topics, only 2 in total ended up in this bad state of out-of-sync ISRs and unbalanced leaders.
To investigate what is going on here, we first verified some Kafka parameters, to see whether the sync problem could be related to an incorrect Kafka setting. I list some of the configured parameters here.
In our Kafka configuration we have, for example:
`min.insync.replicas=1` (while the replication factor is 3)
As far as I know, min.insync.replicas is a broker-side configuration that specifies the minimum number of in-sync replicas required for the broker to accept writes with acks=all.
So I would like to ask: is this setting of min.insync.replicas=1 part of the problem? And if we set min.insync.replicas=3, would that avoid the out-of-sync ISR problem?
What do you think?
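To make the semantics concrete, here is a small illustrative sketch (my own simplification, not Kafka source code) of the rule the partition leader applies: a produce request with acks=all is accepted only while the current ISR size is at least min.insync.replicas; with acks=0 or acks=1 the setting is not checked at all.

```python
def accepts_produce(isr_size, min_insync_replicas, acks="all"):
    """Decide whether the partition leader accepts a produce request.

    With acks=all the write is rejected (NotEnoughReplicas) when the
    in-sync replica count has fallen below min.insync.replicas.
    With acks=0 or acks=1 the setting plays no role.
    """
    if acks != "all":
        return True
    return isr_size >= min_insync_replicas

# RF=3 but two replicas have dropped out of the ISR, so ISR size is 1:
print(accepts_produce(1, 1))  # True  - min.insync.replicas=1 still accepts
print(accepts_produce(1, 3))  # False - min.insync.replicas=3 rejects writes
```

Note what this implies: raising min.insync.replicas to 3 would not bring replicas back into sync; it would only make the cluster refuse acks=all writes whenever any replica falls out of the ISR.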
"Sync problems" like dead replicas are not a symptom of a badly configured broker; rather, they indicate that you have some other network or hardware failure that is causing brokers to become unhealthy.
You should absolutely maintain min.insync.replicas=2 to ensure that losing one broker does not lose data. Setting it equal to your replication factor of 3 is not a best practice; the Cloudera documentation explains this.
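For reference, the recommendation above would look like this in the broker's server.properties (the default.replication.factor line is an assumption for a 3-broker cluster, shown only for context):

```properties
# Survive the loss of one broker without losing acknowledged writes:
# acks=all needs 2 of the 3 replicas in sync to accept a produce request.
min.insync.replicas=2
default.replication.factor=3
```

min.insync.replicas can also be overridden per topic, so it is worth checking topic-level configs too.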