Pacemaker cannot start the slave node on postgres-11



I have two nodes (named node03 and node04) in a master/slave hot-standby setup, with Pacemaker managing the cluster. Before the switchover, node04 was the master and node03 was the standby. Since the switchover I have been trying to bring node04 back as the slave, but I cannot.

During the switchover I noticed that someone had edited the configuration file and set the ignore_system_indexes parameter to true. I had to remove it and restart the postgres server manually. It was after this that the cluster started behaving strangely.

node04 can be brought back as a slave manually, that is, if I start the postgres instance by hand with the recovery.conf file in place.
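
For reference, the manual start that works looks roughly like this (a minimal sketch; pg_ctl and the data directory path are the ones from the CIB further down, and recovery.conf is already present in the data directory):

sudo -u postgres /usr/pgsql-11/bin/pg_ctl -D /DPxxxx01/datadg/data start    # start postgres by hand as a standby
sudo -u postgres /usr/pgsql-11/bin/pg_ctl -D /DPxxxx01/datadg/data status   # confirm the server is running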

Here are the files needed to understand the situation:

sudo crm_mon -A1f
Stack: corosync
Current DC: node03 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Node node04: standby
Online: [ node03 ]
Active resources:
Resource Group: master-group
vip-repli  (ocf::heartbeat:IPaddr2):       Started node03
vip-master (ocf::heartbeat:IPaddr2):       Started node03
Master/Slave Set: pgsql-cluster [pgsqlins]
Masters: [ node03 ]
Node Attributes:
* Node node03:
+ master-pgsqlins                   : 1000
+ pgsqlins-data-status              : LATEST
+ pgsqlins-master-baseline          : 00008820DC000098
+ pgsqlins-status                   : PRI
* Node node04:
+ master-pgsqlins                   : -INFINITY
+ pgsqlins-data-status              : DISCONNECT
+ pgsqlins-status                   : STOP
Migration Summary:
* Node node03:
* Node node04:

recovery.conf

primary_conninfo = 'host=1xx.xx.xx.xx port=5432 user=replica application_name=node04 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'rsync -a /Dxxxxx1/wal_archive/%f %p'
recovery_target_timeline = 'latest'
standby_mode = 'on'

Cluster CIB

sudo pcs cluster cib
<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="269" num_updates="4" admin_epoch="0" cib-last-written="Mon Jun 28 15:13:35 2021" update-origin="node04" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="1">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.23-1.el7_9.1-9acf116022"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
<nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="pgcluster"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1624860815"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="1" uname="node03">
<instance_attributes id="nodes-1">
<nvpair id="nodes-1-pgsqlins-data-status" name="pgsqlins-data-status" value="LATEST"/>
</instance_attributes>
</node>
<node id="2" uname="node04">
<instance_attributes id="nodes-2">
<nvpair id="nodes-2-pgsqlins-data-status" name="pgsqlins-data-status" value="DISCONNECT"/>
<nvpair id="nodes-2-standby" name="standby" value="on"/>
</instance_attributes>
</node>
</nodes>
<resources>
<group id="master-group">
<primitive class="ocf" id="vip-repli" provider="heartbeat" type="IPaddr2">
<instance_attributes id="vip-repli-instance_attributes">
<nvpair id="vip-repli-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
<nvpair id="vip-repli-instance_attributes-ip" name="ip" value="1xx.xx.xx.xx"/>
<nvpair id="vip-repli-instance_attributes-nic" name="nic" value="eth2"/>
</instance_attributes>
<operations>
<op id="vip-repli-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
<op id="vip-repli-start-interval-0s" interval="0s" name="start" timeout="20s"/>
<op id="vip-repli-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
</operations>
</primitive>
<primitive class="ocf" id="vip-master" provider="heartbeat" type="IPaddr2">
<instance_attributes id="vip-master-instance_attributes">
<nvpair id="vip-master-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
<nvpair id="vip-master-instance_attributes-ip" name="ip" value="1x.xx.xxx.xxx"/>
<nvpair id="vip-master-instance_attributes-nic" name="nic" value="eth1"/>
</instance_attributes>
<operations>
<op id="vip-master-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
<op id="vip-master-start-interval-0s" interval="0s" name="start" timeout="20s"/>
<op id="vip-master-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
</operations>
</primitive>
</group>
<master id="pgsql-cluster">
<primitive class="ocf" id="pgsqlins" provider="heartbeat" type="pgsql11">
<instance_attributes id="pgsqlins-instance_attributes">
<nvpair id="pgsqlins-instance_attributes-master_ip" name="master_ip" value="1xx.xx.xx.xx"/>
<nvpair id="pgsqlins-instance_attributes-node_list" name="node_list" value="node03 node04"/>
<nvpair id="pgsqlins-instance_attributes-pgctl" name="pgctl" value="/usr/pgsql-11/bin/pg_ctl"/>
<nvpair id="pgsqlins-instance_attributes-pgdata" name="pgdata" value="/DPxxxx01/datadg/data"/>
<nvpair id="pgsqlins-instance_attributes-pgport" name="pgport" value="5432"/>
<nvpair id="pgsqlins-instance_attributes-primary_conninfo_opt" name="primary_conninfo_opt" value="keepalives_idle=60 keepalives_interval=5 keepalives_count=5"/>
<nvpair id="pgsqlins-instance_attributes-psql" name="psql" value="/usr/pgsql-11/bin/psql"/>
<nvpair id="pgsqlins-instance_attributes-rep_mode" name="rep_mode" value="sync"/>
<nvpair id="pgsqlins-instance_attributes-repuser" name="repuser" value="replica"/>
<nvpair id="pgsqlins-instance_attributes-restart_on_promote" name="restart_on_promote" value="true"/>
<nvpair id="pgsqlins-instance_attributes-restore_command" name="restore_command" value="rsync -a /Dxxxxx01/wal_archive/%f %p"/>
</instance_attributes>
<operations>
<op id="pgsqlins-demote-interval-0" interval="0" name="demote" on-fail="stop" timeout="60s"/>
<op id="pgsqlins-methods-interval-0s" interval="0s" name="methods" timeout="5s"/>
<op id="pgsqlins-monitor-interval-10s" interval="10s" name="monitor" on-fail="restart" timeout="60s"/>
<op id="pgsqlins-monitor-interval-9s" interval="9s" name="monitor" on-fail="restart" role="Master" timeout="60s"/>
<op id="pgsqlins-notify-interval-0" interval="0" name="notify" timeout="60s"/>
<op id="pgsqlins-promote-interval-0" interval="0" name="promote" on-fail="restart" timeout="60s"/>
<op id="pgsqlins-start-interval-0" interval="0" name="start" on-fail="restart" timeout="60s"/>
<op id="pgsqlins-stop-interval-0" interval="0" name="stop" on-fail="block" timeout="60s"/>
</operations>
</primitive>
<meta_attributes id="pgsql-cluster-meta_attributes">
<nvpair id="pgsql-cluster-meta_attributes-master-node-max" name="master-node-max" value="1"/>
<nvpair id="pgsql-cluster-meta_attributes-clone-max" name="clone-max" value="2"/>
<nvpair id="pgsql-cluster-meta_attributes-notify" name="notify" value="true"/>
<nvpair id="pgsql-cluster-meta_attributes-master-max" name="master-max" value="1"/>
<nvpair id="pgsql-cluster-meta_attributes-clone-node-max" name="clone-node-max" value="1"/>
</meta_attributes>
</master>
</resources>
<constraints>
<rsc_colocation id="colocation-master-group-pgsql-cluster-INFINITY" rsc="master-group" score="INFINITY" with-rsc="pgsql-cluster" with-rsc-role="Master"/>
<rsc_order first="pgsql-cluster" first-action="promote" id="order-pgsql-cluster-master-group-INFINITY" score="INFINITY" symmetrical="false" then="master-group" then-action="start"/>
<rsc_order first="pgsql-cluster" first-action="demote" id="order-pgsql-cluster-master-group-0" score="0" symmetrical="false" then="master-group" then-action="stop"/>
<rsc_location id="cli-prefer-pgsql-cluster" rsc="pgsql-cluster" role="Started" node="node04" score="INFINITY"/>
</constraints>
</configuration>
<status>
<node_state id="1" uname="node03" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
<transient_attributes id="1">
<instance_attributes id="status-1">
<nvpair id="status-1-pgsqlins-status" name="pgsqlins-status" value="PRI"/>
<nvpair id="status-1-master-pgsqlins" name="master-pgsqlins" value="1000"/>
<nvpair id="status-1-pgsqlins-master-baseline" name="pgsqlins-master-baseline" value="00008820DC000098"/>
</instance_attributes>
</transient_attributes>
<lrm id="1">
<lrm_resources>
<lrm_resource id="vip-master" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-master_last_0" operation_key="vip-master_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="3:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;3:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="535" rc-code="0" op-status="0" interval="0" last-run="1624859077" last-rc-change="1624859077" exec-time="90" queue-time="0" op-digest="38fc1b2633211138e53cb349a5c147ff"/>
<lrm_rsc_op id="vip-master_monitor_10000" operation_key="vip-master_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="4:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;4:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="536" rc-code="0" op-status="0" interval="10000" last-rc-change="1624859077" exec-time="72" queue-time="0" op-digest="4cbf56ab9e52c6f07a7be8cbb786451c"/>
</lrm_resource>
<lrm_resource id="vip-repli" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-repli_last_0" operation_key="vip-repli_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="1:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;1:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="532" rc-code="0" op-status="0" interval="0" last-run="1624859077" last-rc-change="1624859077" exec-time="127" queue-time="0" op-digest="dd04ed3322c75b7bab13c5bea56dbe77"/>
<lrm_rsc_op id="vip-repli_monitor_10000" operation_key="vip-repli_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="2:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;2:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="534" rc-code="0" op-status="0" interval="10000" last-rc-change="1624859077" exec-time="55" queue-time="0" op-digest="c76770c29a91fb082fdf1fdd8b0469c3"/>
</lrm_resource>
<lrm_resource id="pgsqlins" type="pgsql11" class="ocf" provider="heartbeat">
<lrm_rsc_op id="pgsqlins_last_0" operation_key="pgsqlins_promote_0" operation="promote" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="12:432:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;12:432:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="530" rc-code="0" op-status="0" interval="0" last-run="1624859073" last-rc-change="1624859073" exec-time="3307" queue-time="0" op-digest="2f51441ed087061eb68745fd8157ddb6"/>
<lrm_rsc_op id="pgsqlins_monitor_9000" operation_key="pgsqlins_monitor_9000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="13:433:8:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:8;13:433:8:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="533" rc-code="8" op-status="0" interval="9000" last-rc-change="1624859078" exec-time="497" queue-time="1" op-digest="978aa48a7da35944c793e174dbee9a1d"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
<node_state id="2" uname="node04" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
<lrm id="2">
<lrm_resources>
<lrm_resource id="vip-repli" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-repli_last_0" operation_key="vip-repli_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="4:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:7;4:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node04" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1624600624" last-rc-change="1624600624" exec-time="65" queue-time="0" op-digest="dd04ed3322c75b7bab13c5bea56dbe77"/>
</lrm_resource>
<lrm_resource id="vip-master" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-master_last_0" operation_key="vip-master_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="5:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:7;5:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node04" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1624600624" last-rc-change="1624600624" exec-time="62" queue-time="0" op-digest="38fc1b2633211138e53cb349a5c147ff"/>
</lrm_resource>
<lrm_resource id="pgsqlins" type="pgsql11" class="ocf" provider="heartbeat">
<lrm_rsc_op id="pgsqlins_last_0" operation_key="pgsqlins_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="4:436:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:7;4:436:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node04" call-id="192" rc-code="7" op-status="0" interval="0" last-run="1624860816" last-rc-change="1624860816" exec-time="178" queue-time="0" op-digest="2f51441ed087061eb68745fd8157ddb6"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="2">
<instance_attributes id="status-2">
<nvpair id="status-2-pgsqlins-status" name="pgsqlins-status" value="STOP"/>
<nvpair id="status-2-master-pgsqlins" name="master-pgsqlins" value="-INFINITY"/>
</instance_attributes>
</transient_attributes>
</node_state>
</status>
</cib>

If I take node04 out of standby, Pacemaker first demotes node03 and then tries to promote node04, even though node04 never comes up. I also tried promoting node04 on its own, but that failed as well. However, from that same state I can start node04 manually without any problem. If I try to clean up the pgsqlins resource, it fails too.
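
The commands behind this were roughly the following (a sketch, assuming the pcs 0.9 syntax shipped with EL7):

sudo pcs cluster unstandby node04     # take node04 out of standby; this triggers the demote/promote cycle above
sudo pcs resource cleanup pgsqlins    # attempt to clear the resource state; fails as described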

Here is the corosync.log:

Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Forwarding cib_apply_diff operation for section 'all' to all (origin=local/cibadmin/2)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: --- 0.251.32 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: +++ 0.252.0 b956759712580c1bfdffd25cbf4ab8e9
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       -- /cib/configuration/nodes/node[@id='2']/instance_attributes[@id='nodes-2']/nvpair[@id='nodes-2-standby']
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib:  @epoch=252, @num_updates=0
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=dci2pgs04/cibadmin/2, version=0.252.0)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-60.raw
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_file_write_with_digest:   Wrote version 0.252.0 of the CIB to disk (digest: 8b99629d323c923de592700bc4398c49)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_file_write_with_digest:   Reading cluster configuration file /var/lib/pacemaker/cib/cib.ZtvQXP (digest: /var/lib/pacemaker/cib/cib.fh4Toy)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: --- 0.252.0 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: +++ 0.252.1 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib:  @num_updates=1
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']:  @operation_key=pgsqlins_demote_0, @operation=demote, @transition-key=10:396:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;10:396:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1624852894, @last-rc-change=1624852894, @exec-time=0
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/948, version=0.252.1)
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp      attrd:     info: attrd_peer_update:    Setting master-pgsqlins[node03]: 1000 -> -INFINITY from node03
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: --- 0.252.1 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: +++ 0.252.2 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib:  @num_updates=2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-master-pgsqlins']:  @value=-INFINITY
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=node03/attrd/211, version=0.252.2)
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp      attrd:     info: attrd_peer_update:    Setting pgsqlins-master-baseline[node03]: 00008820CC000098 -> (null) from node03
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: --- 0.252.2 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: +++ 0.252.3 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       -- /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-pgsqlins-master-baseline']
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib:  @num_updates=3
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=node03/attrd/212, version=0.252.3)
Jun 28 13:01:35 [9294] node04.dc.japannext.co.jp      attrd:     info: attrd_peer_update:    Setting pgsqlins-status[node03]: PRI -> STOP from node03
.
.
.
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']:  @transition-magic=0:0;9:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=445, @rc-code=0, @op-status=0, @exec-time=471
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/956, version=0.252.11)
Jun 28 13:01:36 [9296] node04.dc.japannext.co.jp       crmd:     info: do_lrm_rsc_op:        Performing key=10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04 op=pgsqlins_start_0
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Forwarding cib_modify operation for section status to all (origin=local/crmd/142)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: --- 0.252.11 2
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: +++ 0.252.12 (null)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib:  @num_updates=12
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']:  @operation_key=pgsqlins_start_0, @operation=start, @transition-key=12:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;12:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @exec-time=0
Jun 28 13:01:36 [9293] node04.dc.japannext.co.jp       lrmd:     info: log_execute:  executing - rsc:pgsqlins action:start call_id:132
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/957, version=0.252.12)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: --- 0.252.12 2
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       Diff: +++ 0.252.13 (null)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib:  @num_updates=13
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_perform_op:       +  /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']:  @operation_key=pgsqlins_start_0, @operation=start, @transition-key=10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1624852896, @last-rc-change=1624852896, @exec-time=0
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=node04/crmd/142, version=0.252.13)
Jun 28 13:01:37  pgsql11(pgsqlins)[9613]:    INFO: Set all nodes into async mode.
Jun 28 13:01:37  pgsql11(pgsqlins)[9613]:    INFO: PostgreSQL is down
Jun 28 13:01:37  pgsql11(pgsqlins)[9613]:    INFO: server starting
Jun 28 13:01:37  pgsql11(pgsqlins)[9613]:    INFO: PostgreSQL start command sent.
Jun 28 13:01:37  pgsql11(pgsqlins)[9613]:    WARNING: Can't get PostgreSQL recovery status. rc=2
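
The last warning comes from the pgsql resource agent. To see what PostgreSQL itself reports at that moment, the recovery status can be checked by hand (a sketch; psql path and port taken from the CIB, and rc=2 is presumably the psql exit status for a failed connection, i.e. the instance was not accepting connections yet):

sudo -u postgres /usr/pgsql-11/bin/psql -p 5432 -At -c "SELECT pg_is_in_recovery();"   # prints t on a standby, f on a primary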

My guess is that Pacemaker reads the pre-switchover state from /var/lib/pacemaker/cib and uses it to drive these steps. Any help on how to reset this would be greatly appreciated.
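
For anyone in the same situation: if the stored CIB really were the problem, it could be wiped and rebuilt (a rough sketch only; both commands are destructive and remove the cluster configuration, so treat them as a last resort):

sudo cibadmin --erase --force      # empty the live CIB; resources and constraints have to be recreated afterwards
sudo pcs cluster destroy --all     # or: stop the cluster and remove the corosync/pacemaker configuration on every node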

  • As described in the question above, when taking node04 out of standby, Pacemaker was demoting node03 and trying to make node04 the master. It would fail at that task and then bring node03 back up as a standalone master.

  • Since I suspected it was picking up some old configuration from the cib/pengine folders, I even destroyed the cluster on both nodes, removed pacemaker, pcs, and corosync, and reinstalled all of them (see the sketch after this list).

  • Even then the problem persisted. I then suspected that the folder permissions on node04's /var/lib/pgsql/ directory might be wrong, and started digging into it.

  • Only then did I realize there was an old PGSQL.lock.bak file dated June 11th, meaning it was older than the current PGSQL.lock file on node03, which is why Pacemaker kept trying to promote node04 and failing. Pacemaker did not surface this error in any log; there was no hint of it even in the crm_mon output. Once I deleted this file, everything worked like a charm.
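
The destroy-and-reinstall step mentioned above went roughly like this (a sketch, assuming the stock CentOS/RHEL 7 packages and the pcs 0.9 syntax):

sudo pcs cluster stop --all                              # stop pacemaker/corosync on both nodes
sudo pcs cluster destroy --all                           # remove the cluster configuration everywhere
sudo yum remove -y pacemaker pcs corosync                # remove the packages
sudo yum install -y pacemaker pcs corosync               # reinstall them
sudo pcs cluster auth node03 node04                      # re-authenticate the nodes
sudo pcs cluster setup --name pgcluster node03 node04    # recreate the cluster
sudo pcs cluster start --all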

TLDR

  • Check whether there is a PGSQL.lock.bak or any other leftover file in the /var/lib/pgsql/tmp folder and delete it before starting Pacemaker again (see the sketch below).
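
In other words, on the node that should come up as the slave, before starting Pacemaker (the path is the one named above; adjust it if the pgsql agent's tmpdir parameter points elsewhere):

ls -l /var/lib/pgsql/tmp/                      # look for PGSQL.lock.bak or other stale files
sudo rm -f /var/lib/pgsql/tmp/PGSQL.lock.bak   # remove the stale lock backup
sudo pcs cluster start node04                  # then start pacemaker on the node again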
