Mongo removeShard stuck draining while another shard is unreachable



I have a cluster made up of the following 3 shards:

  • bp-rs0
  • bp-rs1
  • bp-rs3

I want to remove one shard: bp-rs3.

I ran db.adminCommand( { removeShard: "bp-rs3" } ) and got the typical confirmation I expected.

It said I needed to drop or move a database that was no longer needed, so I dropped it. I'm not sure whether that caused my problem:
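For the record, dropping is not the only option at that step: a database listed under dbsToMove can instead be moved to a surviving shard with movePrimary. A hedged mongosh sketch against a live cluster (the database name "olddb" is a placeholder, not from my cluster):

```javascript
// Run against mongos. "olddb" stands in for whatever database
// removeShard listed under dbsToMove; bp-rs0 is a shard that is staying.
db.adminCommand({ movePrimary: "olddb", to: "bp-rs0" })

// removeShard only finishes once every listed database is moved or dropped;
// re-running it reports the drain progress.
db.adminCommand({ removeShard: "bp-rs3" })
```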

For several hours now, running db.adminCommand( { removeShard: "bp-rs3" } ) has returned exactly the same draining message:

{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : {
        "chunks" : 334,
        "dbs" : 0
    },
    "note" : "you need to drop or movePrimary these databases",
    "dbsToMove" : [ ],
    "ok" : 1,
    "operationTime" : Timestamp(1629235413, 2),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1629235413, 2),
        "signature" : {
            "hash" : BinData(0,"IkfHFSkxh7gQheeWlXsI/tTjU1U="),
            "keyId" : 6978594490403520515
        }
    }
}

Note the 334 remaining chunks: that number hasn't changed in a long time.
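Since each removeShard call is only a point-in-time snapshot, one way to confirm the drain is truly stalled rather than just slow is to compare remaining.chunks across successive responses. A minimal Node.js sketch of that comparison (the sampling loop that would actually call db.adminCommand({ removeShard: ... }) is assumed and not shown):

```javascript
// Given an array of successive removeShard responses (oldest first),
// report whether the remaining chunk count has stopped decreasing.
function isDrainStalled(samples, minSamples = 3) {
  if (samples.length < minSamples) return false; // too little data to judge
  const recent = samples.slice(-minSamples).map((s) => s.remaining.chunks);
  // Stalled if no sample in the recent window is lower than the one before it.
  return recent.every((c, i) => i === 0 || recent[i - 1] <= c);
}
```

Three identical readings of 334 in a row, as in my case, would flag the drain as stalled.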

This wouldn't be too big a problem, except that my most heavily used collection is now unqueryable, which means the application it serves is down.

When I try to query my only sharded collection, I get the following error:

{
    "message" : "Encountered non-retryable error during query :: caused by :: Could not find host matching read preference { mode: 'primary' } for set bp-rs1",
    "ok" : 0,
    "code" : 133,
    "codeName" : "FailedToSatisfyReadPreference",
    "operationTime" : "Timestamp(1629232940, 1)",
    "$clusterTime" : {
        "clusterTime" : "Timestamp(1629232944, 2)",
        "signature" : {
            "hash" : "IlYQ/HU+EWYsm8CL2xtCziX6xtY=",
            "keyId" : "6978594490403520515"
        }
    },
    "name" : "MongoError"
}

I don't know why bp-rs1 would be affected. bp-rs0 is the primary shard.
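FailedToSatisfyReadPreference means mongos could not find any member of the bp-rs1 replica set in PRIMARY state at that moment (for example because its mongod was down). The check amounts to scanning the members array of rs.status(); a hedged Node.js sketch, where the members documents are an assumed input you would fetch from a bp-rs1 node:

```javascript
// True if an rs.status() members array contains a reachable PRIMARY,
// i.e. a member that is both in PRIMARY state and healthy.
function hasReachablePrimary(members) {
  return members.some((m) => m.stateStr === "PRIMARY" && m.health === 1);
}
```

In a single-host set like bp-rs1 (one host listed in sh.status), a crashed mongod leaves no PRIMARY at all, which is exactly what the balancer error reports.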

sh.status() returns the following:

--- Sharding Status ---
  sharding version: {
      "_id" : NumberInt(1),
      "minCompatibleVersion" : NumberInt(5),
      "currentVersion" : NumberInt(6),
      "clusterId" : ObjectId("602d2def7771e35f1961e454")
  }
  shards:
      {  "_id" : "bp-rs0",  "host" : "bp-rs0/xxx:27020,xxx:27020",  "state" : NumberInt(1) }
      {  "_id" : "bp-rs1",  "host" : "bp-rs1/xxx:27020",  "state" : NumberInt(1) }
      {  "_id" : "bp-rs3",  "host" : "bp-rs3/xxx:27020",  "state" : NumberInt(1),  "draining" : true }
  active mongoses:
      "4.0.3" : 1
  autosplit:
      Currently enabled: yes
  balancer:
      Currently enabled:  yes
      Currently running:  yes
      Failed balancer rounds in last 5 attempts:  5
      Last reported error:  Could not find host matching read preference { mode: "primary" } for set bp-rs1
      Time of Reported error:  Tue Aug 17 2021 23:09:45 GMT+0100 (British Summer Time)
      Migration Results for the last 24 hours:
          241 : Success
          1 : Failed with error 'aborted', from bp-rs3 to bp-rs1
  databases:
      {  "_id" : "xxx",  "primary" : "bp-rs0",  "partitioned" : true,  "version" : {  "uuid" : UUID("c6301dba-1f34-4043-be6f-1e99dc9a8fb9"),  "lastMod" : NumberInt(1) } }
          xxx.listings
              shard key: { "meta.canonical" : 1 }
              unique: false
              balancing: true
              chunks:
                  bp-rs0  696
                  bp-rs1  695
                  bp-rs3  334
              too many chunks to print, use verbose if you want to force print
      {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
          config.system.sessions
              shard key: { "_id" : NumberInt(1) }
              unique: false
              balancing: true
              chunks:
                  bp-rs0  1
              { "_id" : MinKey } -->> { "_id" : MaxKey } on : bp-rs0 Timestamp(1, 0)

Is there anything I can do here? Should I roll everything back and start over, or can I just get things working again?

Thanks in advance

Latest update:

I connected to bp-rs2 and found that the service had crashed for some reason. I started it back up and the migration completed as I expected.

I don't know the exact cause, but it may have been because I dropped the database while draining was in progress.