pg_dump Crashing PostgreSQL Server



在包含大量blob的数据库上运行pg_dump,当执行此查询时,PostgreSQL崩溃:

pg_dump: reading large objects
pg_dump: error: query failed: SSL SYSCALL error: EOF detected
pg_dump: error: query was: SELECT l.oid, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = l.lomowner) AS rolname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS lomacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rlomacl, NULL AS initlomacl, NULL AS initrlomacl FROM pg_largeobject_metadata l LEFT JOIN pg_init_privs pip ON (l.oid = pip.objoid AND pip.classoid = 'pg_largeobject'::regclass AND pip.objsubid = 0)

我对lomacl的查询和两个array_agg()列进行了实验和rlomacl似乎是罪魁祸首。

它是AWS Aurora PostgreSQL 11:

SELECT version();
version
-------------------------------------------------------------------------------------------------
PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit

日志:

2021-08-19 19:47:46 UTC::@:[46753]:LOG: server process (PID 21837) was terminated by signal 9: Killed
2021-08-19 19:47:46 UTC::@:[46753]:DETAIL: Failed process was running: SELECT l.oid, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = l.lomowner) AS rolname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS lomacl, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault('L',l.lomowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(l.lomacl,pg_catalog.acldefault('L',l.lomowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) AS rlomacl, NULL AS initlomacl, NULL AS initrlomacl FROM pg_largeobject_metadata l LEFT JOIN pg_init_privs pip ON (l.oid = pip.objoid AND pip.classoid = 'pg_largeobject'::regclass AND pip.objsubid = 0)
2021-08-19 19:47:46 UTC::@:[46753]:LOG: terminating any other active server processes
2021-08-19 19:47:46 UTC::@:[46753]:FATAL: Can't handle storage runtime process crash
2021-08-19 19:47:46 UTC::@:[46753]:LOG: database system is shut down

有任何故障排除步骤/建议吗?

pg_dump使用pg_largeobject_metadatapg_init_privs之间的连接来获取要转储的大对象oid列表。

现在,要么是数据库服务器上的内存太少,要么是您有许多大对象,并且您的work_mem设置得非常高,因此数据库服务器机器耗尽了内存。由于您没有在数据库服务器操作系统上禁用内存超提交,因此OOM杀手会杀死您的进程。

要么增加可用内存,要么使用更保守的work_mem设置。我必须补充一点,在v13之前,PostgreSQL容易错误地创建比work_mem大的哈希值。也许您可以在转储期间将enable_hashjoin设置为off

最新更新