Increasing the replication factor



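We have about 1k topics on a live Kafka cluster. We currently have 6 brokers (ids 1, 2, 3, 4, 5, 6) across 3 data centers, and the cluster-level default replication factor is 3. Due to unavoidable circumstances we have lost one DC (broker ids 1 and 2). We have therefore already run a partition reassignment that moved the partitions onto brokers 3, 4, 5, and 6. On top of that, for better fault tolerance, we want to increase the replication factor of all existing topics to 4.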

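Below is a small sample of the generated topic partitions. The plan is to keep the existing partition assignment and only append the missing broker, e.g.

my_topic_1 p0 replicas are [4,5,3]; I want to update them to [4,5,3,6]

my_topic_2 p0 replicas are [3,6,4]; I want to update them to [3,6,4,5]

my_topic_2 p1 replicas are [6,4,5]; I want to update them to [6,4,5,3]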

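A sample JSON is below. I have been trying a combination of grep, sed, and jq to process each partition's replica list, e.g.

the my_topic_1 p0 replica list is [4,5,3]; compare it with the master list (the brokers currently in the cluster), [3,4,5,6], and append the missing broker to the partition's list. Here broker 6 is not in the list, so 6 is added and the replica list for that partition becomes [4,5,3,6].

Thanks for any suggestions.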

{
"version": 1,
"partitions": [{
"topic": "my_topic_1",
"partition": 0,
"replicas": [4, 5, 3],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 0,
"replicas": [3, 6, 4],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 1,
"replicas": [6, 4, 5],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 2,
"replicas": [4, 5, 3],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 3,
"replicas": [5, 3, 6],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 4,
"replicas": [3, 5, 6],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_2",
"partition": 5,
"replicas": [6, 3, 4],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 0,
"replicas": [4, 6, 5],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 1,
"replicas": [5, 4, 3],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 2,
"replicas": [3, 5, 6],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 3,
"replicas": [6, 3, 4],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 4,
"replicas": [4, 6, 5],
"log_dirs": ["any", "any", "any"]
}, {
"topic": "my_topic_3",
"partition": 5,
"replicas": [5, 4, 3],
"log_dirs": ["any", "any", "any"]
}]
}

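I don't know what any of this means in Kafka terms, but to add the missing item of [3,4,5,6] to a given array, simply append one value: 18 (i.e. 3+4+5+6) minus the current sum (add) of the array's items: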

jq '.partitions[].replicas |= . + [18-add]' file.json
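
To see why the subtraction works, here is a small Python check against three replica lists from the sample above. It is only a sketch: the names FULL and missing_by_sum are made up for illustration, and the trick relies on each list lacking exactly one broker of the full set.

```python
# Full broker list after the reassignment (taken from the question).
FULL = [3, 4, 5, 6]

def missing_by_sum(replicas, full=FULL):
    """Return the single broker id that is in `full` but not in `replicas`.

    Only valid when `replicas` lacks exactly one element of `full`.
    """
    return sum(full) - sum(replicas)

# Append the computed broker to each sample replica list.
for replicas in ([4, 5, 3], [3, 6, 4], [6, 4, 5]):
    print(replicas, "->", replicas + [missing_by_sum(replicas)])
```

For these three lists the appended brokers are 6, 5, and 3, matching the updates asked for in the question.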

To make it more generic, you could provide the full array as an argument using --argjson:
jq --argjson full '[3,4,5,6]' '
.partitions[].replicas |= . + [($full | add) - add]
' file.json

Or generate the full list from the arrays at hand by collecting their unique values:

jq '
([.partitions[].replicas[]] | unique | add) as $sum
| .partitions[].replicas |= . + [$sum - add]
' file.json
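
If a partition could ever be missing more than one broker, the sum no longer identifies what to append. Below is a sketch of the same idea using a list difference, written in Python rather than jq. The function name add_missing_brokers is hypothetical, and padding log_dirs with "any" is an assumption on my part: as far as I know the reassignment tool expects log_dirs to have the same length as replicas, so check this against your Kafka version.

```python
import json

def add_missing_brokers(plan, brokers):
    """Append every broker from `brokers` that a partition lacks.

    Existing replica order is kept; missing brokers are appended in the
    order they appear in `brokers`. Mutates and returns `plan`.
    """
    for p in plan["partitions"]:
        missing = [b for b in brokers if b not in p["replicas"]]
        p["replicas"] = p["replicas"] + missing
        # Assumption: keep log_dirs the same length as replicas.
        if "log_dirs" in p:
            p["log_dirs"] = p["log_dirs"] + ["any"] * len(missing)
    return plan

# Two partitions copied from the sample JSON in the question.
sample = {
    "version": 1,
    "partitions": [
        {"topic": "my_topic_1", "partition": 0,
         "replicas": [4, 5, 3], "log_dirs": ["any", "any", "any"]},
        {"topic": "my_topic_2", "partition": 0,
         "replicas": [3, 6, 4], "log_dirs": ["any", "any", "any"]},
    ],
}
print(json.dumps(add_missing_brokers(sample, [3, 4, 5, 6]), indent=2))
```

To run it against the real file, load it with json.load, pass the result through the function, and write it back with json.dump.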
