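How do I solve the Redis Cluster "Waiting for the cluster to join" problem?

I have 3 machines and created 6 nodes for a Redis cluster. I set it up successfully a few months ago, but it has since gone down. I tried my best to repair it without success, so I wiped all the data and recreated it from scratch. When I create the cluster with the command below, it blocks here, waiting for the nodes to join the cluster. I did some research, cleaned my data and logs again and again, and retried again and again, but it still does not work.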

redis-trib.rb create --replicas 1 10.2.1.208:6379 10.2.1.208:6380 10.2.1.209:6379 10.2.1.209:6380 10.2.1.15:6379 10.2.1.15:6380

Output:

redis-trib.rb create --replicas 1 10.2.1.208:6379 10.2.1.208:6380 10.2.1.209:6379 10.2.1.209:6380 10.2.1.15:6379 10.2.1.15:6380
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
10.2.1.208:6379
10.2.1.209:6379
10.2.1.15:6379
Adding replica 10.2.1.209:6380 to 10.2.1.208:6379
Adding replica 10.2.1.208:6380 to 10.2.1.209:6379
Adding replica 10.2.1.15:6380 to 10.2.1.15:6379
M: 73b3b99bb17de63aa99eaf592376f0a06feb3d66 10.2.1.208:6379
   slots:0-5460 (5461 slots) master
S: 05b33ed6691797faaf7ccec1541396472b9d2866 10.2.1.208:6380
   replicates f14702ebb1462b313dd7eb4809ec50e30e4eef36
M: f14702ebb1462b313dd7eb4809ec50e30e4eef36 10.2.1.209:6379
   slots:5461-10922 (5462 slots) master
S: 3a9f433a8503281b0ddfc6ec69016908735053b8 10.2.1.209:6380
   replicates 73b3b99bb17de63aa99eaf592376f0a06feb3d66
M: 2fd97e8842828dba6b425b6a30e764fb06915737 10.2.1.15:6379
   slots:10923-16383 (5461 slots) master
S: c46db592d49bc1e9d8b5efb27b9799929c5186a4 10.2.1.15:6380
   replicates 2fd97e8842828dba6b425b6a30e764fb06915737
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join...........................................................................^C/usr/local/bin/redis-trib.rb:652:in `sleep': Interrupt
        from /usr/local/bin/redis-trib.rb:652:in `wait_cluster_join'
        from /usr/local/bin/redis-trib.rb:1305:in `create_cluster_cmd'
        from /usr/local/bin/redis-trib.rb:1695:in `<main>'

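From the cluster tutorial on the official Redis website:

Every Redis Cluster node requires two TCP connections open. The normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port, so 16379 in the example.

This second high port is used for the cluster bus, that is, a node-to-node communication channel using a binary protocol. The cluster bus is used by nodes for failure detection, configuration update, failover authorization and so forth. Clients should never try to communicate with the cluster bus port, but always with the normal Redis command port. However, make sure you open both ports in your firewall, otherwise Redis Cluster nodes will not be able to communicate.

The command port and cluster bus port offset is fixed and is always 10000.

I was using AWS and had not opened ports 16379 and 16380, which caused this problem.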
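To see whether a firewall or security group is the culprit, it helps to probe both the client port and the bus port of every node from each of the other machines. The following is a minimal sketch, not an official tool: the helper names (`bus_port`, `is_reachable`, `blocked_ports`) are made up for illustration, and it assumes the fixed +10000 offset quoted above.

```python
import socket

# Offset between the client port and the cluster bus port,
# as quoted from the tutorial (assumed fixed here).
CLUSTER_BUS_OFFSET = 10000


def bus_port(client_port: int) -> int:
    """Return the cluster bus port for a given client (data) port."""
    return client_port + CLUSTER_BUS_OFFSET


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def blocked_ports(nodes, timeout: float = 2.0):
    """List the (host, port) pairs that cannot be reached from this machine.

    `nodes` is an iterable of (host, client_port) tuples; both the client
    port and the derived bus port are probed for each node.
    """
    blocked = []
    for host, port in nodes:
        for candidate in (port, bus_port(port)):
            if not is_reachable(host, candidate, timeout):
                blocked.append((host, candidate))
    return blocked


# Example call with the nodes from the question (run it on each machine):
# blocked_ports([("10.2.1.208", 6379), ("10.2.1.208", 6380),
#                ("10.2.1.209", 6379), ("10.2.1.209", 6380),
#                ("10.2.1.15", 6379), ("10.2.1.15", 6380)])
```

If the client ports are reachable but 16379 or 16380 show up in the result, the symptom matches the one described above.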

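If there is no firewall problem between these 6 nodes, check the bind setting in redis.conf.

Of course you should bind the Redis service to the LAN IP, but there is one more thing:

Remove 127.0.0.1, or move 127.0.0.1 to the end, after the LAN IP.

Like this: bind 10.2.1.x 127.0.0.1 or bind 10.2.1.x

I hit this problem when creating a cluster between 3 nodes on 3 servers: it waited for the cluster to join forever. Putting 127.0.0.1 in front of the LAN IP may trigger a bug in Redis, at least in Redis 5.0.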
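The bind order described above can be checked mechanically before starting the nodes. This is a rough sketch only: the function names are hypothetical, the parsing is deliberately simple, and it assumes that when redis.conf contains several bind lines the last one is the one that counts.

```python
import ipaddress


def bind_addresses(conf_text: str):
    """Return the addresses on the last active `bind` line, in order."""
    addrs = []
    for line in conf_text.splitlines():
        parts = line.split()
        # Commented-out lines start with "#", so they never match "bind".
        if parts and parts[0].lower() == "bind":
            addrs = parts[1:]
    return addrs


def loopback_listed_first(conf_text: str) -> bool:
    """Return True if the first bind address is a loopback address."""
    addrs = bind_addresses(conf_text)
    if not addrs:
        return False
    # Newer redis.conf files may prefix an address with "-"; strip it.
    first = addrs[0].lstrip("-")
    try:
        return ipaddress.ip_address(first).is_loopback
    except ValueError:
        # Not a literal IP address (for example "*"), so not loopback.
        return False
```

For example, `loopback_listed_first("bind 127.0.0.1 10.2.1.208")` flags the ordering that caused the hang in my case, while `bind 10.2.1.208 127.0.0.1` passes.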

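You may also see this problem if you use 127.0.0.1 instead of the machine's IP address as the host name. In that case, change it to use the IP address as the host name. https://stackoverflow.com/a/36080707/5159284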

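Neither of these answers worked for me, but the following blog post helped:

https://linux.m2osw.com/redis-infamous-waiting-cluster-join-message

The problem was that I had created one original server and cloned it to produce the other two nodes. The cloned nodes all used the same node ID, and Redis does not like that.

The solution is to stop the Redis server and then delete the nodes.conf file, whose actual name is defined in the redis.conf file. Mine was actually called nodes-6379.conf. Then restart the Redis server. Do this on all nodes.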
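Before deleting anything, you can confirm that cloned nodes really share an ID by comparing the `myself` line of each node's nodes.conf (the `CLUSTER NODES` command prints lines in the same layout). Below is a minimal sketch; the function names are my own, and it assumes the usual layout where the third field holds comma-separated flags.

```python
from collections import defaultdict


def own_node_id(nodes_conf_text: str):
    """Return the ID on the line flagged `myself`, or None if there is none."""
    for line in nodes_conf_text.splitlines():
        fields = line.split()
        # Node lines look like: <id> <ip:port@cport> <flags> <master> ...
        if len(fields) >= 3 and "myself" in fields[2].split(","):
            return fields[0]
    return None


def duplicate_ids(own_ids: dict):
    """Map each node ID claimed by more than one address to those addresses.

    `own_ids` maps a node address (any label you like) to the ID returned
    by own_node_id() for that node.
    """
    by_id = defaultdict(list)
    for address, node_id in own_ids.items():
        by_id[node_id].append(address)
    return {nid: addrs for nid, addrs in by_id.items() if len(addrs) > 1}
```

If `duplicate_ids` returns anything, those nodes were started from the same nodes.conf, which is the situation described above.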

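This can happen when you copy a virtual machine or container that has Redis preinstalled. The installed Redis has already generated its node ID and stored it in the cluster config file, so every copy ends up with the same one. The file name is set in redis.conf: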

# cat /etc/redis/redis.conf
...
cluster-config-file nodes-6379.conf
...

If that is the case, check for the file and delete it:

# rm /var/lib/redis/nodes-6379.conf

and then restart Redis. These paths are for Ubuntu. Link to the post that helped me: https://linux.m2osw.com/redis-infamous-waiting-cluster-join-message
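Since the paths differ between distributions, the location of the file can be derived from redis.conf itself: the cluster config file is written into the working directory set by `dir`. Here is a small sketch under that assumption; the function name is made up, and the defaults (`.` and `nodes.conf`) are what I believe Redis uses when the directives are absent.

```python
import posixpath


def cluster_config_path(conf_text: str) -> str:
    """Return the path of the cluster config file described by redis.conf."""
    directory = "."          # assumed default working directory
    filename = "nodes.conf"  # assumed default cluster config file name
    for line in conf_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key = parts[0].lower()
        value = parts[1].strip().strip('"')
        if key == "dir":
            directory = value
        elif key == "cluster-config-file":
            filename = value
    return posixpath.join(directory, filename)
```

With `dir /var/lib/redis` and `cluster-config-file nodes-6379.conf`, this yields the path removed above.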

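Set cluster-announce-ip in redis.conf to the same value as bind.

I tried every suggestion here, but this is what finally worked for me in my Kubernetes/Docker setup.

In fact, in the relevant section of redis.conf I found: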

> ########################## CLUSTER DOCKER/NAT support  ########################
> # In certain deployments, Redis Cluster nodes address discovery fails, because
> # addresses are NAT-ted or because ports are forwarded (the typical case is
> # Docker and other containers).
> #
> # In order to make Redis Cluster working in such environments, a static
> # configuration where each node knows its public address is needed. The
> # following four options are used for this scope, and are:
> #
> # * cluster-announce-ip ...
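To verify the setting across many containers, a quick check that cluster-announce-ip is present and equal to one of the bind addresses may be useful. This is only a sketch with hypothetical helper names; it reads the config text naively and assumes the last occurrence of a directive is the effective one.

```python
def directive_values(conf_text: str, name: str):
    """Return the arguments of the last active occurrence of a directive."""
    values = []
    for line in conf_text.splitlines():
        parts = line.split()
        # Commented-out lines start with "#", so they never match.
        if parts and parts[0].lower() == name:
            values = parts[1:]
    return values


def announce_ip_matches_bind(conf_text: str) -> bool:
    """Return True if cluster-announce-ip is set to one of the bind addresses."""
    announce = directive_values(conf_text, "cluster-announce-ip")
    binds = directive_values(conf_text, "bind")
    return len(announce) == 1 and announce[0] in binds
```

A config with `bind 10.2.1.208` and `cluster-announce-ip 10.2.1.208` passes; one that omits the announce line, or announces a different address, does not.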
