成员重新连接后,JGroups GMS Join失败



我使用JGroups 4.0.15.Final来实现集群应用程序。当我启动这两个节点时,一切都很好,它们连接起来并开始工作。但当我杀死/重新启动一个节点时,它们就不会再连接在一起了。

注意:它们是我的应用程序中的一个专长,因为我使用TCP协议的自定义套接字工厂。这家工厂创建了一些有特殊要求的加密套接字。但连接似乎很好,插座也能正常工作。我唯一注意到的是,在NODE_2被验证为死后,它们仍然是TCP协议在其套接字上的连接尝试。

我试了几个小时才找到问题,但就是找不到。有时节点在尝试了几次连接后再次连接,但这可能需要几分钟到一个多小时。我认为这与转世问题有关,但我不确定。

我目前正在分析TCP协议的TRACE日志,但仍然无法指出两个节点需要几分钟才能再次合并的原因。

当我重新启动NODE_2时,同时两个(NODE_1=1.1.1.1NODE_2=1.1.1.1.2)都已连接,并且一切正常。我看到双方都建立了联系:

NODE_1:

2019-01-17 09:51:55.089 [TQ-Bundl |                  ] TRACE TCP                      - 1.1.1.1:7800: failed connecting to 1.1.1.2:7800: java.net.SocketException: Failed to connect to '/1.1.1.2:7800' with Socket #1041583140. ConnectException: Connection refused (Connection refused)
2019-01-17 09:51:55.089 [TQ-Bundl |                  ] TRACE TCP                      - 1.1.1.1:7800: removed connection to 1.1.1.2:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.089 [TQ-Bundl |                  ] TRACE TCP                      - JGRP000029: NODE_1 (flags=1): failed sending message to 1.1.1.2:7800 (117 bytes): java.net.SocketException: Failed to connect to '/1.1.1.2:7800' with Socket #1041583140. ConnectException: Connection refused (Connection refused), headers: TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
----------------------------
NODE_2 is comes back online
----------------------------
2019-01-17 09:51:55.791 [TcpServe |                  ] TRACE TCP                      - 1.1.1.1:7800: accepted connection from 1.1.1.2:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.798 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: 1.1.1.1:7800, src: NODE_2 (2 headers), size=50 bytes, flags=OOB|DONT_BUNDLE|INTERNAL], headers are TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=true], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.798 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to 1.1.1.2:7800, src=NODE_1 (flags=1), headers are MERGE3: INFO: view_id=[NODE_1 (flags=1)|2], logical_name=NODE_1, physical_addr=1.1.1.1:7800, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.798 [jgrp-29, |                  ] TRACE TCPPING                  - NODE_1 (flags=1): received GET_MBRS_REQ from NODE_2, sending response NODE_1 (flags=1), name=NODE_1, addr=1.1.1.1:7800, coord | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.799 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are TCPPING: [GET_MBRS_RSP cluster=null initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.852 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NODE_2, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.853 [jgrp-29, |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- DATA(NODE_2: #1, conn_id=0, first) | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:56.207 [jgrp-29, |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) --> ACK(NODE_2: #6) | (Log4J2LogImpl.java:56)  

节点2:

2019-01-17 09:51:48.998 [cc-clust |                  ] DEBUG Configurator             - set property TCP.bind_addr to default value /fe80:0:0:0:250:56ff:feb4:3785%eth2 | (Log4J2LogImpl.java:66) 
2019-01-17 09:51:49.000 [cc-clust |                  ] DEBUG Configurator             - set property TCP.diagnostics_addr to default value /ff0e:0:0:0:0:0:75:75 | (Log4J2LogImpl.java:66) 
2019-01-17 09:51:49.004 [cc-clust |                  ] DEBUG TCP                      - thread pool min/max/keep-alive: 0/20/3000 use_fork_join=false, internal pool: 0/4/30000 (2 cores available) | (Log4J2LogImpl.java:71) 
2019-01-17 09:51:49.011 [cc-clust |                  ] TRACE NAKACK2                  - null: set max_xmit_req_size from 0 to 511600 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:49.012 [cc-clust |                  ] TRACE UNICAST3                 - null: set max_xmit_req_size from 0 to 511600 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.535 [cc-clust |                  ] TRACE STABLE                   - NODE_2: stable task started | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.553 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on eth2 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.554 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on eth1 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.555 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on eth0 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.555 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on lo | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.581 [cc-clust |                  ] TRACE TCPPING                  - NODE_2: sending discovery request to 1.1.1.1:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.582 [cc-clust |                  ] TRACE TCP                      - NODE_2: sending msg to 1.1.1.1:7800, src=NODE_2, headers are TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=true], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.591 [TQ-Bundl |                  ] TRACE TCP                      - 1.1.1.2:7800: connecting to 1.1.1.1:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.823 [jgrp-1,s |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: 0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1) (2 headers), size=52 bytes, flags=OOB|DONT_BUNDLE|INTERNAL], headers are TCPPING: [GET_MBRS_RSP cluster=null initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.826 [jgrp-2,s |                  ] TRACE TCP                      - NODE_2: received [dst: 1.1.1.2:7800, src: 0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1) (2 headers), size=0 bytes, flags=INTERNAL], headers are MERGE3: INFO: view_id=[0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1)|2], logical_name=NODE_1, physical_addr=1.1.1.1:7800, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.830 [jgrp-1,s |                  ] TRACE TCPPING                  - NODE_2: received GET_MBRS_RSP from NODE_1 (flags=1): NODE_1 (flags=1), name=NODE_1, addr=1.1.1.1:7800, coord | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.832 [cc-clust |                  ] TRACE GMS                      - NODE_2: discovery took 262 ms, members: 1 rsps (1 coords) [done] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.835 [cc-clust |                  ] DEBUG GMS                      - NODE_2: sending JOIN(NODE_2) to NODE_1 (flags=1) | (Log4J2LogImpl.java:71) 
2019-01-17 09:51:55.844 [cc-clust |                  ] TRACE UNICAST3                 - NODE_2: created sender window for NODE_1 (flags=1) (conn-id=0) | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.848 [cc-clust |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_1 (flags=1): #1, conn_id=0, first) | (Log4J2LogImpl.java:46) 
2019-01-17 09:51:55.848 [cc-clust |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NODE_2, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:56.210 [jgrp-1,s |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (2 headers), size=0 bytes, flags=INTERNAL], headers are UNICAST3: ACK, seqno=6, ts=6, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:56.211 [jgrp-1,s |                  ] TRACE UNICAST3                 - NODE_2 <-- ACK(NODE_1 (flags=1): #6, conn-id=0, ts=6) | (Log4J2LogImpl.java:56) 

在这之后,两者似乎都在一个循环中处理一些事情。当NODE_2试图加入(向NODE_1发送join(NODE_2))时,NODE_1似乎没有注意到,我也没有看到NODE_1的LOG中涉及GMS协议。我还看到很多消息一遍又一遍地要求第一个序列号(*UNICAST3 NODE_2-->SEND_first_SEQNO(NODE_1(flags=1))|(Log4J2LogImpl.java:56)*)。这种情况持续了一段时间(时间从几秒钟到有时几个小时不等),直到加入/合并突然生效:

NODE_1:

2019-01-17 10:11:36.043 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::21, UNICAST3: DATA, seqno=19750, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.544 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) --> XMIT(NODE_2: #19750) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.544 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::21, UNICAST3: DATA, seqno=19750, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=OOB], headers are UNICAST3: SEND_FIRST_SEQNO, seqno=0, ts=581, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- SEND_FIRST_SEQNO(NODE_2) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are UNICAST3: DATA, seqno=19725, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.716 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=INTERNAL], headers are UNICAST3: ACK, seqno=19725, ts=582, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.716 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- ACK(NODE_2: #19725, conn-id=0, ts=582) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=INTERNAL], headers are MERGE3: VIEW_REQ: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are MERGE3: VIEW_RSP: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.300 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (3 headers), size=23 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=32, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.300 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- DATA(NODE_2: #32, conn_id=0) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1): delivering NODE_2#32 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [jgrp-125 |                  ] DEBUG STABLE                   - suspending message garbage collection | (Log4J2LogImpl.java:66) 
2019-01-17 10:11:37.302 [jgrp-125 |                  ] DEBUG STABLE                   - NODE_1 (flags=1): resume task started, max_suspend_time=220000 | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.302 [jgrp-125 |                  ] TRACE GMS                      - NODE_1 (flags=1): got merge request from NODE_2, merge_id=NODE_2::22, mbrs=[NODE_1 (flags=1)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.303 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) --> DATA(NODE_2: #19751, conn_id=0) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.303 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=19751, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [jgrp-127 |                  ] TRACE TCP                      - NODE_1 (flags=1): received message batch of 1 messages from NODE_2 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.310 [jgrp-127 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- DATA(NODE_2: #33 - #33) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.310 [jgrp-127 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1): delivering #33 - #33 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.313 [jgrp-127 |                  ] TRACE GMS                      - NODE_1 (flags=1): mcasting view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.314 [jgrp-127 |                  ] DEBUG NAKACK2                  - 
[NODE_1 (flags=1) mergeDigest()]
existing digest:  NODE_1 (flags=1): [3 (3)], NODE_2: [0 (0)]
new digest:       NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)]
resulting digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:66)
2019-01-17 10:11:37.314 [jgrp-127 |                  ] DEBUG GMS                      - NODE_1 (flags=1): installing view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:71) 

节点2:

2019-01-17 10:11:36.547 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 --> SEND_FIRST_SEQNO(NODE_1 (flags=1)) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are UNICAST3: SEND_FIRST_SEQNO, seqno=0, ts=581, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.550 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: received message batch of 1 messages from NODE_1 (flags=1) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.550 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2: created receiver window for NODE_1 (flags=1) at seqno=#19725 for conn-id=0 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.553 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_1 (flags=1): #19725 - #19725) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.553 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2: delivering #19725 - #19725 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:36.713 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 --> ACK(NODE_1 (flags=1): #19725) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.713 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are UNICAST3: ACK, seqno=19725, ts=582, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-174 |                  ] TRACE MERGE3                   - NODE_2: merge participants are [NODE_2, NODE_1 (flags=1)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are MERGE3: VIEW_REQ: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.299 [jgrp-177 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (2 headers), size=67 bytes, flags=INTERNAL], headers are MERGE3: VIEW_RSP: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.299 [jgrp-174 |                  ] DEBUG GMS                      - NODE_2: I will be the merge leader. Starting the merge task. Views: {NODE_1 (flags=1)=[NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], NODE_2=MergeView::[NODE_2|24] (1) [NODE_2], 1 subgroups: [NODE_2|23] (1) [NODE_2]} | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.300 [MergeTas |                  ] DEBUG STABLE                   - suspending message garbage collection | (Log4J2LogImpl.java:66) 
2019-01-17 10:11:37.300 [MergeTas |                  ] DEBUG STABLE                   - NODE_2: resume task started, max_suspend_time=220000 | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.300 [MergeTas |                  ] DEBUG GMS                      - NODE_2: merge task NODE_2::22 started with 2 participants | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.300 [MergeTas |                  ] TRACE GMS                      - NODE_2: sending MERGE_REQ to [NODE_1 (flags=1), NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.300 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_1 (flags=1): #32, conn_id=0) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.300 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=32, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_2: #64, conn_id=1) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.301 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [MergeTas |                  ] TRACE TCP                      - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=22 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_2 (3 headers), size=22 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.302 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_2: #64, conn_id=1) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.302 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2: delivering NODE_2#64 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.302 [jgrp-174 |                  ] TRACE GMS                      - NODE_2: got merge request from NODE_2, merge_id=NODE_2::22, mbrs=[NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.303 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_2: #65, conn_id=1) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.303 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.303 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=51 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.304 [jgrp-179 |                  ] TRACE UNICAST3                 - NODE_2: delivering #65 - #65 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.304 [jgrp-179 |                  ] TRACE GMS                      - NODE_2: got merge response from NODE_2, merge_id=NODE_2::22, merge data is sender=NODE_2, view=[NODE_2|24] (1) [NODE_2], digest=NODE_2: [21 (21)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.304 [jgrp-177 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_2 (3 headers), size=51 bytes, flags=OOB|INTERNAL, transient_flags=OOB_DELIVERED], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.304 [jgrp-177 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_2: #65, conn_id=1) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.305 [jgrp-177 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (3 headers), size=53 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=19751, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.305 [jgrp-177 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_1 (flags=1): #19751, conn_id=0) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.305 [jgrp-177 |                  ] TRACE UNICAST3                 - NODE_2: delivering NODE_1 (flags=1)#19751 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.306 [jgrp-177 |                  ] TRACE GMS                      - NODE_2: got merge response from NODE_1 (flags=1), merge_id=NODE_2::22, merge data is sender=NODE_1 (flags=1), view=[NODE_1 (flags=1)|3] (1) [NODE_1 (flags=1)], digest=NODE_1 (flags=1): [3 (3)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.306 [MergeTas |                  ] TRACE GMS                      - NODE_2: collected 2 merge response(s) in 6 ms | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE GMS                      - NODE_2: consolidated view=MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], MergeView::[NODE_2|24] (1) [NODE_2], 1 subgroups: [NODE_2|23] (1) [NODE_2]
consolidated digest=NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [MergeTas |                  ] DEBUG GMS                      - NODE_2: installing merge view [NODE_1 (flags=1)|25] (2 members) in 2 coords | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_1 (flags=1): #33, conn_id=0) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=33, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_2: #66, conn_id=1) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=66, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.308 [MergeTas |                  ] TRACE TCP                      - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=108 bytes], headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=66, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.308 [MergeTas |                  ] DEBUG GMS                      - NODE_2: merge NODE_2::22 took 9 ms | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.309 [jgrp-179 |                  ] TRACE TCP                      - NODE_2: received message batch of 1 messages from NODE_2 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.309 [jgrp-179 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_2: #66 - #66) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.309 [jgrp-179 |                  ] TRACE UNICAST3                 - NODE_2: delivering #66 - #66 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.310 [jgrp-179 |                  ] TRACE GMS                      - NODE_2: mcasting view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.310 [jgrp-179 |                  ] DEBUG NAKACK2                  - 
[NODE_2 mergeDigest()]
existing digest:  NODE_2: [21 (21)]
new digest:       NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)]
resulting digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:66) 
2019-01-17 10:11:37.310 [jgrp-179 |                  ] DEBUG GMS                      - NODE_2: installing view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:71) 

在这之后,一切都恢复正常,直到一个节点重新启动或其他什么。我在这里真的很无知,需要一些帮助。有人知道为什么NODE_1没有早点开始加入/合并吗?谢谢你的帮助。

这是我目前的配置:

public class JGroupsConfiguration {
final Protocol[] PROTOCOL_STACK = {
new TCP()
.setValue("bind_port", EngineConfiguration.get().clusterSrcPort().get())
.setValue("max_bundle_size", 64000)
.setValue("sock_conn_timeout", 300)
.setValue("thread_pool_min_threads", 0)
.setValue("thread_pool_max_threads", 20)
.setValue("thread_pool_keep_alive_time", 3000),
new TCPPING()
.setValue("async_discovery", false)
.setValue("port_range", 0)
.setValue("send_cache_on_join", true)
.setValue("return_entire_cache", true),
new MERGE3()
.setValue("min_interval", 10000)
.setValue("max_interval", 30000),
new FD()
.setValue("timeout", 3000)
.setValue("max_tries", 3),
new VERIFY_SUSPECT()
.setValue("timeout", 1500),
new BARRIER(),
new NAKACK2()
.setValue("use_mcast_xmit", false)
.setValue("discard_delivered_msgs", true),
new UNICAST3()
// Disables retransmitting of queued messages to reconnected nodes.
.setValue("max_retransmit_time", 0),
new STABLE()
.setValue("desired_avg_gossip", 50000)
.setValue("max_bytes", 4000000),
new GMS()
.setValue("print_local_addr", true)
.setValue("join_timeout", 2000),
new FRAG3()
.setValue("frag_size", 60000),
new STATE()
};
}

问题出在UNICAST3协议上。该协议试图在允许将每个消息作为成员合并到集群中之前,将其重新发送给重新连接的集群成员。这就是节点之间所有SEGNO消息的原因。在我们从堆栈中删除协议后,重新连接的成员的合并工作得很好注意:我们的应用程序不需要UNICAST3的功能,因为我们不依赖可靠的消息传递!

最新更新