Does clustering on an index in PostgreSQL take advantage of pre-sorted data?

I create a table like this:

SELECT t1.c1, t2.c2, t3.c3, *several more columns*
INTO t4
FROM t1
INNER JOIN t2 ON t1.j2 = t2.j2
INNER JOIN t3 ON t1.j3 = t3.j3;

Then I add a primary key and cluster the table on it:

ALTER TABLE t4 ADD CONSTRAINT pk_t4 PRIMARY KEY (c1, c2, c3);
CLUSTER t4 USING pk_t4;

If I add an ORDER BY c1, c2, c3 clause to the SELECT INTO query, will that speed up clustering on the primary key?

If you create a new table with SELECT ... INTO or CREATE TABLE AS SELECT ..., PostgreSQL inserts the rows in the order the query returns them.

So yes: if you add ORDER BY c1, c2, c3, which is also the primary key, the rows are already stored in clustered order and CLUSTER is not needed.
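As a minimal sketch of that approach, the table from the question could be built already sorted on the future primary key. The column list is shortened to the three key columns for illustration, and the trailing ANALYZE is an addition that is not part of the original question:

```sql
-- Sketch: create t4 with rows physically written in primary-key order.
-- Only the three key columns are shown; the real query selects more.
CREATE TABLE t4 AS
SELECT t1.c1, t2.c2, t3.c3
FROM t1
INNER JOIN t2 ON t1.j2 = t2.j2
INNER JOIN t3 ON t1.j3 = t3.j3
ORDER BY t1.c1, t2.c2, t3.c3;

-- The primary key index is built on data that is already sorted.
ALTER TABLE t4 ADD CONSTRAINT pk_t4 PRIMARY KEY (c1, c2, c3);

-- Refresh planner statistics so the physical ordering is visible to the planner.
ANALYZE t4;
```

CREATE TABLE AS is used here instead of SELECT ... INTO; both create the table from the query result, as noted above.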

However, if you run CLUSTER anyway, I believe PostgreSQL will still rewrite the table.
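If the goal is only to record which index later CLUSTER runs should use, without paying for a rewrite now, one option is the CLUSTER ON form of ALTER TABLE. This is a sketch that assumes the table and constraint names from the question:

```sql
-- Mark pk_t4 as the default clustering index for t4.
-- This only records the choice; it does not reorder or rewrite the table.
ALTER TABLE t4 CLUSTER ON pk_t4;

-- A later maintenance run can then omit the index name;
-- this statement does rewrite the table.
CLUSTER t4;
```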

Example

First, generate a table containing 5 million integers in random order:

testdb=> create table clust as select a from generate_series(1, 5000000) a order by random() ;
SELECT 5000000
Time: 14675,540 ms
testdb=> create index clust_a_idx on clust (a);
CREATE INDEX
Time: 13145,245 ms
testdb=> cluster clust using clust_a_idx;
CLUSTER
Time: 19126,597 ms
testdb=> cluster clust using clust_a_idx;
CLUSTER
Time: 7968,350 ms

The first CLUSTER takes about 19.1 seconds; the second, run on the now-sorted table, takes about 8.0 seconds.

Create another table, this time with the rows already in order:

testdb=> create table clust2 as select a from generate_series(1, 5000000) a ;
SELECT 5000000
Time: 2612,878 ms
testdb=> create index clust2_a_idx on clust2 (a);
CREATE INDEX
Time: 6816,040 ms
testdb=> cluster clust2 using clust2_a_idx;
CLUSTER
Time: 7762,115 ms
testdb=> cluster clust2 using clust2_a_idx;
CLUSTER
Time: 7861,405 ms

Clustering the already-ordered table takes about 7.8 seconds both times.

Does ORDER BY c1, c2, c3 help? Yes.

But if the rows are inserted in the right order, the table is already sorted (clustered), and CLUSTER is redundant.
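One way to check whether a table is already in physical order, rather than running CLUSTER to be safe, is to look at the correlation statistic in pg_stats. The sketch below uses the clust2 table from the example; the statistic is an estimate and is only populated after ANALYZE:

```sql
-- Collect fresh statistics for the example table.
ANALYZE clust2;

-- correlation near 1 (or -1) means the physical row order closely follows
-- the column's sort order; values near 0 mean the rows are scattered.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'clust2'
  AND attname = 'a';
```

For a multi-column key such as (c1, c2, c3), the statistic is reported per column, so the leading column is the most informative one to inspect.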
