I have a Galera cluster made up of 2 nodes (mariadb-1.test.com: 10.10.10.21, mariadb-2.test.com: 10.10.10.22) and an arbitrator, on 3 different CentOS 8 servers.
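Is it possible to restart a node without deleting /var/lib/mysql and/or using galera_new_cluster?
Is it possible to restart a node in some way other than the one used after a complete cluster crash?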
[centos@mariadb-1 ~]$ sudo systemctl restart mariadb.service
Job for mariadb.service failed because a fatal signal was delivered to the control process.
See "systemctl status mariadb.service" and "journalctl -xe" for details.
[centos@mariadb-1 ~]$ sudo systemctl status mariadb.service
● mariadb.service - MariaDB 10.3 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2021-12-16 00:07:08 CET; 29s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 9887 ExecStart=/usr/libexec/mysqld --basedir=/usr $MYSQLD_OPTS $_WSREP_NEW_CLUSTER (code=exited, status=1/FAILURE)
Process: 9849 ExecStartPre=/usr/libexec/mysql-prepare-db-dir mariadb.service (code=exited, status=0/SUCCESS)
Process: 9824 ExecStartPre=/usr/libexec/mysql-check-socket (code=exited, status=0/SUCCESS)
Main PID: 9887 (code=exited, status=1/FAILURE)
Status: "MariaDB server is down"
déc. 16 00:06:36 mariadb-1.test.com systemd[1]: Starting MariaDB 10.3 database server...
déc. 16 00:06:36 mariadb-1.test.com mysql-prepare-db-dir[9849]: Database MariaDB is probably initialized in /var/lib/mysql already, nothing is done.
déc. 16 00:06:36 mariadb-1.test.com mysql-prepare-db-dir[9849]: If this is not the case, make sure the /var/lib/mysql is empty before running mysql-prepare-db-dir.
déc. 16 00:06:36 mariadb-1.test.com mysqld[9887]: 2021-12-16 0:06:36 0 [Note] /usr/libexec/mysqld (mysqld 10.3.28-MariaDB) starting as process 9887 ...
déc. 16 00:07:08 mariadb-1.test.com systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
déc. 16 00:07:08 mariadb-1.test.com systemd[1]: mariadb.service: Failed with result 'exit-code'.
déc. 16 00:07:08 mariadb-1.test.com systemd[1]: Failed to start MariaDB 10.3 database server.
journalctl -xe does not give any more detail.
Contents of /var/log/mariadb/mariadb.log on the node after the failed restart:
2021-12-16 0:14:43 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2021-12-16 0:14:43 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2021-12-16 0:14:43 0 [Note] WSREP: wsrep_load(): Galera 3.32(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
2021-12-16 0:14:43 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-12-16 0:14:43 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2021-12-16 0:14:43 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.10.10.21; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S;
2021-12-16 0:14:43 0 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2021-12-16 0:14:43 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2021-12-16 0:14:43 0 [Note] WSREP: wsrep_sst_grab()
2021-12-16 0:14:43 0 [Note] WSREP: Start replication
2021-12-16 0:14:43 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2021-12-16 0:14:43 0 [Note] WSREP: protonet asio version 0
2021-12-16 0:14:43 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-12-16 0:14:43 0 [Note] WSREP: backend: asio
2021-12-16 0:14:43 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2021-12-16 0:14:43 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2021-12-16 0:14:43 0 [Note] WSREP: restore pc from disk failed
2021-12-16 0:14:43 0 [Note] WSREP: GMCast version 0
2021-12-16 0:14:43 0 [Note] WSREP: (c96725c5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2021-12-16 0:14:43 0 [Note] WSREP: (c96725c5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2021-12-16 0:14:43 0 [Note] WSREP: EVS version 0
2021-12-16 0:14:43 0 [Note] WSREP: gcomm: connecting to group 'test-wsrep', peer '10.10.10.21:,10.10.10.22:'
2021-12-16 0:14:43 0 [Note] WSREP: (c96725c5, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.10.10.21:4567
2021-12-16 0:14:43 0 [Note] WSREP: (c96725c5, 'tcp://0.0.0.0:4567') connection established to 3cfd9c42 tcp://10.10.10.22:4567
2021-12-16 0:14:43 0 [Note] WSREP: (c96725c5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.10.30:4567
2021-12-16 0:14:43 0 [Note] WSREP: (c96725c5, 'tcp://0.0.0.0:4567') connection established to 34bdff1b tcp://10.10.10.30:4567
2021-12-16 0:14:43 0 [Note] WSREP: declaring 34bdff1b at tcp://10.10.10.30:4567 stable
2021-12-16 0:14:43 0 [Note] WSREP: declaring 3cfd9c42 at tcp://10.10.10.22:4567 stable
2021-12-16 0:14:43 0 [Note] WSREP: Node 34bdff1b state prim
2021-12-16 0:14:43 0 [Note] WSREP: view(view_id(PRIM,34bdff1b,340) memb {
34bdff1b,0
3cfd9c42,0
c96725c5,0
} joined {
} left {
} partitioned {
})
2021-12-16 0:14:43 0 [Note] WSREP: save pc into disk
2021-12-16 0:14:44 0 [Note] WSREP: gcomm: connected
2021-12-16 0:14:44 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2021-12-16 0:14:44 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2021-12-16 0:14:44 0 [Note] WSREP: Opened channel 'test-wsrep'
2021-12-16 0:14:44 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2021-12-16 0:14:44 0 [Note] WSREP: Waiting for SST to complete.
2021-12-16 0:14:44 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2021-12-16 0:14:44 0 [Note] WSREP: STATE EXCHANGE: sent state msg: c9b97488-5dfc-11ec-a809-f37d727cc3ec
2021-12-16 0:14:44 0 [Note] WSREP: STATE EXCHANGE: got state msg: c9b97488-5dfc-11ec-a809-f37d727cc3ec from 0 (garb)
2021-12-16 0:14:44 0 [Note] WSREP: STATE EXCHANGE: got state msg: c9b97488-5dfc-11ec-a809-f37d727cc3ec from 1 (mariadb-2.test.com)
2021-12-16 0:14:44 0 [Note] WSREP: STATE EXCHANGE: got state msg: c9b97488-5dfc-11ec-a809-f37d727cc3ec from 2 (mariadb-1.test.com)
2021-12-16 0:14:44 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 289,
members = 2/3 (joined/total),
act_id = 66241951,
last_appl. = -1,
protocols = 0/9/3 (gcs/repl/appl),
group UUID = 52e841e5-398a-11eb-8a1c-8efbc5e6d73d
2021-12-16 0:14:44 0 [Note] WSREP: Flow-control interval: [28, 28]
2021-12-16 0:14:44 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 66241951)
2021-12-16 0:14:44 2 [Note] WSREP: State transfer required:
Group state: 52e841e5-398a-11eb-8a1c-8efbc5e6d73d:66241951
Local state: 00000000-0000-0000-0000-000000000000:-1
2021-12-16 0:14:44 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-12-16 0:14:44 2 [Note] WSREP: New cluster view: global state: 52e841e5-398a-11eb-8a1c-8efbc5e6d73d:66241951, view# 290: Primary, number of nodes: 3, my index: 2, protocol version 3
2021-12-16 0:14:44 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2021-12-16 0:14:44 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.10.10.21' --datadir '/var/lib/mysql/' --parent '10502' --mysqld-args --basedir=/usr'
2021-12-16 0:14:44 2 [Note] WSREP: Prepared SST request: rsync|10.10.10.21:4444/rsync_sst
2021-12-16 0:14:44 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-12-16 0:14:44 2 [Note] WSREP: Assign initial position for certification: 66241951, protocol version: 4
2021-12-16 0:14:44 0 [Note] WSREP: Service thread queue flushed.
2021-12-16 0:14:44 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (52e841e5-398a-11eb-8a1c-8efbc5e6d73d): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():467. IST will be unavailable.
2021-12-16 0:14:44 0 [Note] WSREP: Member 2.0 (mariadb-1.test.com) requested state transfer from '*any*'. Selected 1.0 (mariadb-2.test.com)(SYNCED) as donor.
2021-12-16 0:14:44 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 66241951)
2021-12-16 0:14:44 2 [Note] WSREP: Requesting state transfer: success, donor: 1
2021-12-16 0:14:44 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 52e841e5-398a-11eb-8a1c-8efbc5e6d73d:66241951
2021-12-16 0:14:44 0 [Warning] WSREP: 1.0 (mariadb-2.test.com): State transfer to 2.0 (mariadb-1.test.com) failed: -255 (Unknown error 255)
2021-12-16 0:14:44 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():780: Will never receive state. Need to abort.
2021-12-16 0:14:44 0 [Note] WSREP: gcomm: terminating thread
2021-12-16 0:14:44 0 [Note] WSREP: gcomm: joining thread
2021-12-16 0:14:44 0 [Note] WSREP: gcomm: closing backend
2021-12-16 0:14:45 0 [Note] WSREP: view(view_id(NON_PRIM,34bdff1b,340) memb {
c96725c5,0
} joined {
} left {
} partitioned {
34bdff1b,0
3cfd9c42,0
})
2021-12-16 0:14:45 0 [Note] WSREP: view((empty))
2021-12-16 0:14:45 0 [Note] WSREP: gcomm: closed
2021-12-16 0:14:45 0 [Note] WSREP: /usr/libexec/mysqld: Terminated.
WSREP_SST: [ERROR] Parent mysqld process (PID:10502) terminated unexpectedly. (20211216 00:14:46.490)
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 10560 (20211216 00:14:46.491)
WSREP_SST: [INFO] Joiner cleanup done. (20211216 00:14:46.996)
[centos@mariadb-1 ~]$
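Most of the log above is [Note] lines, so the few lines that matter are easy to miss. A minimal sketch for pulling out only the WSREP warnings and errors is below. The function name wsrep_problems is made up for illustration, and the pattern assumes the log line format shown above.

```shell
# Print only the WSREP warnings and errors from a MariaDB error log.
# wsrep_problems is a hypothetical helper name, not part of MariaDB or Galera.
# Usage: wsrep_problems /var/log/mariadb/mariadb.log
wsrep_problems() {
    # Matches "[Warning] WSREP", "[ERROR] WSREP" and "WSREP_SST: [ERROR]" lines.
    grep -E '\[(Warning|ERROR)\] WSREP|WSREP_SST: \[ERROR\]' "$1"
}
```

On the log above this would leave the gvwstate.dat warning, the "Gap in state sequence" and IST warnings, the "State transfer ... failed: -255" warning, the "Will never receive state" error, and the WSREP_SST error. The -255 failure is reported by the donor (mariadb-2.test.com), so running the same filter on the donor's log is probably worth it too. As far as I know, with rsync SST the donor is the side that runs the rsync client, so its log is where an rsync error message would most likely show up.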
The node that stayed in the cluster seems fine:
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'wsrep_%' ;
+-------------------------------+--------------------------------------+
| Variable_name | Value |
+-------------------------------+--------------------------------------+
| wsrep_applier_thread_count | 1 |
| wsrep_apply_oooe | 0.000220 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000220 |
| wsrep_causal_reads | 0 |
| wsrep_cert_deps_distance | 3770.895462 |
| wsrep_cert_index_size | 69 |
| wsrep_cert_interval | 0.091845 |
| wsrep_cluster_conf_id | 268 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 52e841e5-398a-11eb-8a1c-8efbc5e6d73d |
| wsrep_cluster_status | Primary |
| wsrep_cluster_weight | 2 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_connected | ON |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_flow_control_active | false |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_paused_ns | 3297992241 |
| wsrep_flow_control_recv | 9 |
| wsrep_flow_control_requested | false |
| wsrep_flow_control_sent | 9 |
| wsrep_gcomm_uuid | 3cfd9c42-e33a-11eb-aea6-1fc34d7a6b30 |
| wsrep_gmcast_segment | 0 |
| wsrep_incoming_addresses | ,10.10.10.22:3306 |
| wsrep_last_committed | 66241750 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_cached_downto | 66130787 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_commits | 116324 |
| wsrep_local_index | 1 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.006762 |
| wsrep_local_recv_queue_max | 31 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000003 |
| wsrep_local_send_queue_max | 2 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_local_state_uuid | 52e841e5-398a-11eb-8a1c-8efbc5e6d73d |
| wsrep_open_connections | 0 |
| wsrep_open_transactions | 0 |
| wsrep_protocol_version | 9 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.32(rXXXX) |
| wsrep_ready | ON |
| wsrep_received | 26471367 |
| wsrep_received_bytes | 30177334116 |
| wsrep_repl_data_bytes | 91246197 |
| wsrep_repl_keys | 9616150 |
| wsrep_repl_keys_bytes | 79743208 |
| wsrep_repl_other_bytes | 0 |
| wsrep_replicated | 117234 |
| wsrep_replicated_bytes | 2538681128 |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 2 |
+-------------------------------+--------------------------------------+
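The three values in that table that I read as "fine" are wsrep_cluster_status = Primary, wsrep_ready = ON and wsrep_local_state_comment = Synced. A rough sketch of the same check as a script is below. wsrep_node_ok is a made-up name, and the function assumes tab-separated "name, value" rows on stdin.

```shell
# Read "name<TAB>value" status rows on stdin and succeed only if the node
# reports Primary, ready and Synced.
# wsrep_node_ok is a hypothetical helper name.
# Usage (assumes the mysql client can log in non-interactively):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_node_ok && echo OK
wsrep_node_ok() {
    awk -F'\t' '
        $1 == "wsrep_cluster_status"      && $2 == "Primary" { ok++ }
        $1 == "wsrep_ready"               && $2 == "ON"      { ok++ }
        $1 == "wsrep_local_state_comment" && $2 == "Synced"  { ok++ }
        END { exit (ok == 3 ? 0 : 1) }
    '
}
```

wsrep_cluster_size is 2 in the table even though only one database node is up. My understanding is that the arbitrator (shown as "garb" in the log) counts as a member, but I have not checked that separately.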
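Thanks.
Problem solved: the state snapshot transfer method is rsync, and on the receiving node there was a directory in /var/lib/mysql that mysql had no write permission on. I deleted that directory and the node restarted successfully.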
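For anyone hitting the same thing, a rough way to look for such a directory before restarting the node is sketched below. datadir_suspects is a made-up name. The check assumes the problem is either an owner other than the mysql user or a directory without the owner write bit, and I did not record which of the two it was in my case.

```shell
# List entries under a data directory that are not owned by the given user,
# plus directories whose owner lacks the write bit.
# datadir_suspects is a hypothetical helper name.
# Usage (run as root so every entry is readable):
#   datadir_suspects /var/lib/mysql mysql
datadir_suspects() {
    # -perm -200 is true when the owner write bit is set.
    find "$1" \( ! -user "$2" -o \( -type d ! -perm -200 \) \) -print
}
```

An empty result does not prove the directory is clean (ACLs or SELinux labels are not covered here), but anything it prints is worth a look before the next SST attempt.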