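I ran into a Hazelcast problem after enabling Istio, and I would like to know how to solve it.
I have a K8s cluster and recently installed Istio. When I add Envoy to a deployment that uses Hazelcast, I see a strange problem: many connection errors during a rolling upgrade. I should mention that the deployment ends up OK, but these errors suggest something is wrong.
I noticed that when I do a rolling upgrade of the deployment without Envoy, I see the following: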
[10.16.17.72]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.17.72:45025 and /10.16.5.8:5701
[10.16.5.8]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.8:5701 and /10.16.17.72:45025
[10.16.17.72]:5701 [dev] [4.0.1] Connection[id=1, /10.16.17.72:45025->/10.16.5.8:5701, qualifier=null, endpoint=[10.16.5.8]:5701, alive=false, connectionType=MEMBER] closed. Reason: Connection closed by the other side
[10.16.17.72]:5701 [dev] [4.0.1] Could not connect to: /10.16.5.8:5701. Reason: SocketException[Connection refused to address /10.16.5.8:5701]
......
[10.16.17.72]:5701 [dev] [4.0.1] Removing connection to endpoint [10.16.5.8]:5701 Cause => java.net.SocketException {Connection refused to address /10.16.5.8:5701}, Error-Count: 5
[10.16.17.72]:5701 [dev] [4.0.1] Member [10.16.5.8]:5701 - 945ec2c8-fc56-4624-aab3-de9823d4886a is suspected to be dead for reason: No connection
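What happens here is:
- The new pod starts and joins the cluster.
- A connection between old pod:5701 and new pod:xxx is initialized (in both directions).
- The new pod complains that it cannot reach the old pod (connectionType=MEMBER), and after 5 attempts it considers the old pod dead and removes it from the cluster.
- The old pod is removed once the rolling upgrade completes.
Now, when I do the same thing with Envoy injected, so that the pod in this deployment has 2 containers, I notice the following: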
[10.16.3.244]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.3.244:5701 and **/127.0.0.6:48287**
[10.16.5.16]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.16:59827 and /10.16.3.244:5701
[10.16.5.16]:5701 [dev] [4.0.1] Connection[id=1, /10.16.5.16:59827->/10.16.3.244:5701, qualifier=null, endpoint=[10.16.3.244]:5701, alive=false, connectionType=MEMBER] closed. Reason: Connection closed by the other side
But then I get millions of messages like this one:
[10.16.5.16]:5701 [dev] [4.0.1] Connection[id=2, /10.16.5.16:33659->/10.16.3.244:5701, qualifier=null, endpoint=[10.16.3.244]:5701, alive=false, connectionType=NONE] closed. Reason: Connection closed by the other side
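The first 'Connection closed' message is of type MEMBER and refers to the same connection as the one in the init message (10.16.5.16:59827->10.16.3.244:5701), but the remaining messages come from random ports on 10.16.5.16 to the old pod. I think the reason is that, according to the init message, the new cluster connection between
/10.16.3.244:5701 and **/127.0.0.6:48287**
is set up against the wrong host:port instead of 10.16.5.16:59827.
The rolling upgrade still completes as before, but the logs fill up with millions of messages of this type.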
I have reproduced the same behavior and see the mismatched ports:
I 2021-02-09T08:18:40.163Z [10.16.5.22]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.22:49213 and /10.16.5.21:5701
I 2021-02-09T08:18:40.158Z [10.16.5.21]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.21:5701 and /10.16.5.22:46274
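To make the mismatch easier to spot across a large log, here is a minimal Python sketch that compares the two "Initialized new cluster connection" lines for one connection. It assumes, based only on the samples in this post, that the first endpoint in each line is the reporting member's local side and the second is the peer it sees. The names `parse_init` and `endpoints_agree` are made up for illustration and are not part of Hazelcast or Istio.

```python
import re

# Matches lines such as:
# [10.16.5.22]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.22:49213 and /10.16.5.21:5701
# A leading severity letter and timestamp, as in the reproduced logs, are tolerated.
INIT_RE = re.compile(
    r"\[(?P<member>[\d.]+)\]:\d+ .*?Initialized new cluster connection "
    r"between /(?P<local>[\d.]+:\d+) and /(?P<remote>[\d.]+:\d+)"
)


def parse_init(line):
    """Return (reporting member IP, local endpoint, remote endpoint), or None."""
    m = INIT_RE.search(line)
    if m is None:
        return None
    return m.group("member"), m.group("local"), m.group("remote")


def endpoints_agree(line_a, line_b):
    """True if each side's remote endpoint equals the other side's local endpoint."""
    a = parse_init(line_a)
    b = parse_init(line_b)
    if a is None or b is None:
        raise ValueError("expected two 'Initialized new cluster connection' lines")
    _, a_local, a_remote = a
    _, b_local, b_remote = b
    return a_local == b_remote and a_remote == b_local


if __name__ == "__main__":
    # Pair taken from the logs without Envoy: both sides report the same endpoints.
    without_envoy = (
        "[10.16.17.72]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.17.72:45025 and /10.16.5.8:5701",
        "[10.16.5.8]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.8:5701 and /10.16.17.72:45025",
    )
    # Pair taken from the reproduced logs with Envoy: the ports differ.
    with_envoy = (
        "I 2021-02-09T08:18:40.163Z [10.16.5.22]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.22:49213 and /10.16.5.21:5701",
        "I 2021-02-09T08:18:40.158Z [10.16.5.21]:5701 [dev] [4.0.1] Initialized new cluster connection between /10.16.5.21:5701 and /10.16.5.22:46274",
    )
    print(endpoints_agree(*without_envoy))  # True
    print(endpoints_agree(*with_envoy))     # False
```

The check only confirms that the two members disagree about the connecting side's endpoint; it does not explain why the proxy presents a different source address.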
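How can I prevent this? Why does the part that obtains remoteAddress get a different port? So far I have tried the following:
- Upgraded to 4.1.1, no change.
- Added JHIPSTER_SLEEP, which did not help.
- Updated the deployment to make sure the proxy starts before the application, which did not help.
Any ideas?
Thanks, Chen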
Rolling upgrade of Hazelcast is an Enterprise feature, and you need a license key for it. If you already have one, you should check this with Hazelcast support.