经过各种测试,我发现这些相似的查询在执行时间上存在巨大差异:
select id from my_table where creation_date > (current_timestamp - interval '2 days');
-- 9s Plan: partition pruning parallel scan of 5 partitions
select id from my_table where creation_date > (select current_timestamp - interval '2 days');
-- 4.5s Plan: seq scan of all partitions
select id from my_table where creation_date > (localtimestamp - interval '2 days');
-- 7s Plan: partition pruning parallel scan of the 5 last partitions
select id from my_table where creation_date > (select localtimestamp - interval '2 days');
-- 2.5s Plan: seq scan of all partitions
select id from my_table where creation_date > '2022-11-23 11:20';
-- 2.5s Plan: partition pruning parallel scan of the 5 last partitions
create_date是时间戳类型的分区键(不带时区)。我猜是类型转换的问题。当where子句中有select时,计划将扫描所有分区。它表现得好像无法猜测选择的结果。但这比没有select要快得多。当没有select时,计划会删除不需要的分区,只对剩余的分区执行seqscan,但这要慢得多。
然后我在同一个表上启动相同的查询,但是通过来自另一个数据库的外表(在相同的测试实例上)
select id from my_foreign_table where creation_date > (current_timestamp - interval '2 days');
-- too long (more than 10min)
select id from my_foreign_table where creation_date > (select current_timestamp - interval '2 days');
-- too long (more than 10min)
select id from my_foreign_table where creation_date > (localtimestamp - interval '2 days');
-- too long (more than 10min)
select id from my_foreign_table where creation_date > (select localtimestamp - interval '2 days');
-- 2.5s
select id from my_foreign_table where creation_date > '2022-11-23 11:20';
-- 2.6s
有人能解释一下为什么这些查询之间有这么大的差异吗?下面发生了什么?
where子句中没有select的查询执行计划(分区修剪,并行扫描):
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------+
Gather (cost=1000.00..62310996.53 rows=1244782 width=8) |
Workers Planned: 8 |
-> Parallel Append (cost=0.00..62185518.33 rows=155598 width=8) |
Subplans Removed: 175 |
-> Parallel Seq Scan on data_container_archive_2022_11 dca (cost=0.00..820761.36 rows=169177 width=8)|
Filter: (partition_key > (CURRENT_TIMESTAMP - '2 days'::interval)) |
-> Parallel Seq Scan on data_container_archive_2022_12 dca_1 (cost=0.00..12.88 rows=55 width=8) |
Filter: (partition_key > (CURRENT_TIMESTAMP - '2 days'::interval)) |
-> Parallel Seq Scan on data_container_archive_2023_01 dca_2 (cost=0.00..12.88 rows=55 width=8) |
Filter: (partition_key > (CURRENT_TIMESTAMP - '2 days'::interval)) |
-> Parallel Seq Scan on data_container_archive_2023_02 dca_3 (cost=0.00..12.88 rows=55 width=8) |
Filter: (partition_key > (CURRENT_TIMESTAMP - '2 days'::interval)) |
-> Parallel Seq Scan on data_container_archive_2023_03 dca_4 (cost=0.00..12.88 rows=55 width=8) |
Filter: (partition_key > (CURRENT_TIMESTAMP - '2 days'::interval)) |
使用select in where子句(所有分区,seq scan)查询的执行计划:
QUERY PLAN |
-------------------------------------------------------------------------------------------------------+
Append (cost=0.01..88407378.48 rows=775958175 width=8) |
InitPlan 1 (returns $0) |
-> Result (cost=0.00..0.01 rows=1 width=8) |
-> Seq Scan on data_container_archive_2008_04 dca (cost=0.00..1427.44 rows=13078 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_05 dca_1 (cost=0.00..1392.58 rows=12895 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_06 dca_2 (cost=0.00..376.65 rows=3457 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_07 dca_3 (cost=0.00..1638.68 rows=14685 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_08 dca_4 (cost=0.00..2127.53 rows=19001 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_09 dca_5 (cost=0.00..3121.90 rows=27971 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_10 dca_6 (cost=0.00..3147.60 rows=28576 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_11 dca_7 (cost=0.00..4151.62 rows=37297 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2008_12 dca_8 (cost=0.00..5287.16 rows=47551 width=8) |
Filter: (partition_key > $0) |
-> Seq Scan on data_container_archive_2009_01 dca_9 (cost=0.00..7169.21 rows=64752 width=8) |
Filter: (partition_key > $0) |
...
注意:这里的表名和列名都是真实的。
在与子查询结果比较的查询中,PostgreSQL不会" pull up ";子查询,因此在执行开始时它不知道子查询的结果。因此,它不会修剪分区。
是的,PostgreSQL优化器可能会变得更聪明,并认识到它可以拉出子查询,但这将花费额外的处理时间,这将损害其他查询。
总之:
localtimestamp是系统时钟值(因此没有计算)和
current_timestamp是时区感知对象,这需要查找操作系统的时区设置(在罚款)。
这是一篇很好的文章:https://www.commandprompt.com/education/postgresql-current_timestamp-vs-localtimestamp/