Pgpool log error: primary 0 owns only 0 standbys out of 1



I have Pgpool-II v4.0.2 running and managing two PostgreSQL 11.2 database nodes (0 - primary and 1 - standby). The psql -c 'show pool_nodes' command shows that both nodes are up:

$ psql -c 'show pool_nodes'
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | 10.1.1.20 | 5432 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-02-22 21:37:55
 1       | 10.1.3.20 | 5432 | up     | 0.500000  | standby | 0          | false             | 0                 | 2019-02-22 21:44:49

However, after testing the online recovery and failback procedures, I started seeing this message written every 10 seconds today:

$ sudo -u postgres pgpool -n
---
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for 0.0.0.0:5432
2019-02-22 22:15:47: pid 22204: LOG: Setting up socket for :::5432
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2019-02-22 22:15:47: pid 22204: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: primary node is 0
2019-02-22 22:15:47: pid 22204: LOG: find_primary_node: standby node is 1
2019-02-22 22:15:47: pid 22204: LOG: pgpool-II successfully started. version 4.0.2 (torokiboshi)
2019-02-22 22:15:47: pid 22204: LOG: node status[0]: 1
2019-02-22 22:15:47: pid 22204: LOG: node status[1]: 2
2019-02-22 22:15:47: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
2019-02-22 22:15:57: pid 22238: LOG: verify_backend_node_status: primary 0 owns only 0 standbys out of 1
...
(repeats every 10 seconds)
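Read literally, the message says that pgpool identified node 0 as the primary but could confirm only 0 of the 1 configured standbys as replicating from it. Below is a minimal Python sketch for picking these lines out of a pgpool log and computing how many standbys pgpool could not verify. The function name unverified_standbys is hypothetical, and the regular expression covers only the exact wording shown in the log above.

```python
import re

# Matches the pgpool-II 4.0 message quoted in the log above, e.g.
# "verify_backend_node_status: primary 0 owns only 0 standbys out of 1"
PATTERN = re.compile(
    r"verify_backend_node_status: primary (\d+) owns only (\d+) standbys out of (\d+)"
)


def unverified_standbys(log_line):
    """Return (primary_id, owned, total, missing), or None if the line doesn't match.

    'owned' is how many standbys pgpool confirmed as following the primary;
    'missing' is how many configured standbys it could not confirm.
    """
    m = PATTERN.search(log_line)
    if m is None:
        return None
    primary, owned, total = (int(g) for g in m.groups())
    return primary, owned, total, total - owned
```

For the repeating line above, this returns (0, 0, 1, 1): primary 0, no verified standbys, one standby configured, one unverified.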

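I can't figure out what this means or how to fix it. PostgreSQL streaming replication appears to be working correctly: when I create and populate a test table on the primary, I can see the same data on the standby.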

Where should I look? Thanks.

Update 2/23/19:

I narrowed it down a bit. It seems that, although my backend nodes are running with streaming replication, the status that pgpool reads from pg_stat_wal_receiver on the standby comes back NULL (https://github.com/beocommedia/pgpool-ii/blob/4.0.2/src/main/pgpool_main.c#L3265-L3268):

2019-02-23 15:32:32: pid 17383: DEBUG: verify_backend_node_status: pg_stat_wal_receiver status for standby 1 is NULL
2019-02-23 15:32:32: pid 17383: LOCATION: pgpool_main.c:3271

If I query the same standby node directly, I do get a status:

$ psql -x -h 10.1.3.20 -c "SELECT status, conninfo FROM pg_stat_wal_receiver;"
-[ RECORD 1 ]----------------------------------------------------------------
status   | streaming
conninfo | user=repl passfile=/var/lib/postgresql/.pgpass dbname=replication 
           host=10.1.1.20 port=5432 fallback_application_name=walreceiver
           sslmode=prefer sslcompression=0 krbsrvname=postgres
           target_session_attrs=any

Any idea what I might be missing here?

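OK, I was finally able to solve this. The problem was that the sr_check_user = 'pgpool' user did not have permission to query the pg_stat_wal_receiver view. Once I ran the SELECT as the pgpool user, it became obvious: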

$ psql -x -h [standby.db.node.ip] -U pgpool -d postgres -c "SELECT status, conninfo FROM pg_stat_wal_receiver;"
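To show why a NULL status makes pgpool report 0 standbys, here is a rough Python sketch of the standby check. It assumes the logic in pgpool_main.c works roughly like this: a standby counts as belonging to the primary only if pg_stat_wal_receiver reports status 'streaming' and the host and port in its conninfo match the primary's backend host and port. This is an illustration, not pgpool's actual C code. The function names are hypothetical, and the conninfo parser is simplified (it does not handle quoted values). In PostgreSQL 11, a role that is neither a superuser nor a member of pg_read_all_stats (which pg_monitor includes) still sees the walreceiver row, but these columns come back NULL. That matches what the unprivileged sr_check_user was getting.

```python
from typing import Optional


def parse_conninfo(conninfo: str) -> dict:
    """Split a libpq-style 'key=value key=value' string into a dict.

    Simplified assumption: no quoted values and no spaces around '=',
    which holds for the conninfo shown above.
    """
    pairs = {}
    for token in conninfo.split():
        key, sep, value = token.partition("=")
        if sep:
            pairs[key] = value
    return pairs


def standby_follows_primary(status: Optional[str], conninfo: Optional[str],
                            primary_host: str, primary_port: int) -> bool:
    """Hypothetical sketch of the check pgpool applies to each standby."""
    # An unprivileged role gets NULL (None here) instead of real values,
    # so the standby cannot be verified and is not counted.
    if status is None or conninfo is None:
        return False
    if status != "streaming":
        return False
    info = parse_conninfo(conninfo)
    # libpq defaults the port to 5432 when conninfo omits it.
    return (info.get("host") == primary_host
            and int(info.get("port", "5432")) == primary_port)
```

With the conninfo from the direct query above, standby_follows_primary("streaming", conninfo, "10.1.1.20", 5432) returns True. The NULL/NULL row that the pgpool user sees returns False, which is why the log says the primary owns 0 standbys.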

The problem can be fixed by making the pgpool user a member of the pg_monitor role:

GRANT pg_monitor TO pgpool;

I have asked the Pgpool maintainers to update this documentation: http://www.pgpool.net/docs/latest/en/html/example-cluster.html
