如果协调员完全失去,如何在Citus中将元数据同步的工作人员提升为协调员



我有一个citus集群,有1个协调器和3个工作者。

最近,托管协调器的服务器完全关闭,无法恢复。

现在集群只做Query和Some DML,而不做DDL,这只能在协调器上执行!

那么,我该如何将元数据同步的工作人员提升为协调员呢?

首先,我们目前还没有针对此类场景的可接受的解决方案。通常,这必须通过使用备份恢复协调器来解决。强烈建议将数据库设置为定期备份,并能够在发生此类不可恢复的崩溃时进行恢复。

此解决方案不提升工作人员,而是建立一个新的协调人。如果将其应用于自己的集群,可能会出现复杂情况(数据库可能会崩溃(,因为这涉及到扰乱citus的元数据。此解决方案是实验性的,因此我强烈建议尝试将其应用于集群的分支或快照磁盘,以确保不会对数据造成损坏。

  • 此解决方案已在包含分布式表和引用表的本地集群中针对citus 11在有限范围内进行了测试。如果数据库中有其他分布式对象,如视图或排序规则,则此解决方案可能会失败(尚未测试(。

  • 在整个过程中,您需要使用SELECT * FROM pg_dist_node;查找groupid-s和nodeid-s

  • 如果旧协调器在pg_dist_node表中,则需要包括/排除某些步骤。如果旧的协调器在pg_dist_table中,我将用[if-CIM include]标记需要包括的步骤,用[if-CIM-exclude]标记需要排除的步骤。(CIM=元数据中的协调器(

  • 创建一个新节点并安装citus。

  • [if CIM include]在除您刚刚创建的新节点之外的所有节点中运行:

-- removes old coordinator from citus metadata
DELETE FROM pg_dist_node WHERE groupid = 0;
-- removes old coordinator placements from citus metadata
DELETE FROM pg_dist_placement WHERE groupid = 0;
  • 连接到组1中的节点(在pg_dist_node中查找节点(:
-- temporarily mark this node as the coordinator in the metadata
UPDATE pg_dist_local_group SET groupid = 0;
UPDATE pg_dist_node SET groupid = 0 WHERE groupid = 1;
UPDATE pg_dist_placement SET groupid = 0 WHERE groupid = 1;
-- adjust the metadata as if we are ready to add new node
SELECT max(groupid) as groupid FROM pg_dist_node gset
SELECT setval('pg_dist_groupid_seq', :groupid, true);
SELECT max(nodeid) as nodeid FROM pg_dist_node gset
SELECT setval('pg_dist_node_nodeid_seq', :nodeid, true);
SELECT max(placementid) as placementid FROM pg_dist_placement gset
SELECT setval('pg_dist_placement_placementid_seq', :placementid, true);
-- add the new node
SELECT citus_add_node('NEW_NODE_HOST', NEW_NODE_PORT);
-- set back the original metadata
UPDATE pg_dist_local_group SET groupid = 1;
UPDATE pg_dist_node SET groupid = 1 WHERE groupid = 0;
UPDATE pg_dist_placement SET groupid = 1 WHERE groupid = 0;
-- look up your new node's group id
-- this value is important and I will refer to it in future steps as NEW_NODE_GROUP_ID
SELECT * FROM pg_dist_node;
  • 连接并在新节点中运行:
-- restore the metadata of your temporary coodinator
UPDATE pg_dist_node SET groupid = 1 WHERE groupid = 0;
UPDATE pg_dist_placement SET groupid = 1 WHERE groupid = 0;
-- set the new node as the coordinator
UPDATE pg_dist_local_group SET groupid = 0;
-- set the metadata for the new coordinator
SELECT max(groupid) as groupid FROM pg_dist_node gset
SELECT setval('pg_dist_groupid_seq', :groupid, true);
SELECT max(nodeid) as nodeid FROM pg_dist_node gset
SELECT setval('pg_dist_node_nodeid_seq', :nodeid, true);
SELECT max(placementid) as placementid FROM pg_dist_placement gset
SELECT setval('pg_dist_placement_placementid_seq', :placementid, true);
SELECT max(shardid) as shardid FROM pg_dist_shard gset
SELECT setval('pg_dist_shardid_seq', :shardid, true);
  • [if CIM exclude]在新节点中运行:
-- if the coordinator is not in the metadata, all shards in the new node which will be the coordinator
-- need to be dropped
SET citus.enable_manual_changes_to_shards TO true;
DO $$
DECLARE
    row record;
BEGIN
    FOR row IN 
    SELECT CONCAT(logicalrelid, '_', shardid) AS shard_name 
    FROM pg_dist_placement NATURAL JOIN pg_dist_shard 
    WHERE groupid = NEW_NODE_GROUP_ID
    LOOP
        EXECUTE 'DROP TABLE ' || quote_ident(row.shard_name);
        RAISE INFO 'Dropped shard: %', quote_ident(row.shard_name);
    END LOOP;
END;
$$;
RESET citus.enable_manual_changes_to_shards;
  • 在所有节点中运行:
-- if coordinator is not in the metadata remove the new node from the pg_dist_node
-- and remove its shard placements from the metadata
DELETE FROM pg_dist_node WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM exclude]
DELETE FROM pg_dist_placement WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM exclude]
-- if the coordinator is in the metadata set the new node as coordinator, set shouldhaveshars
-- for the coordinator to False, and update the coordinator placement group ids
UPDATE pg_dist_node SET groupid = 0, shouldhaveshards = False WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM include]
UPDATE pg_dist_placement SET groupid = 0 WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM include]

应该就是这样,您的新节点现在就是协调器。祝你好运:(

最新更新