我需要在关联子查询中JOIN一个表。但是,postgres选择的查询计划很慢。如何优化以下查询:
SELECT c.id
FROM customer c
WHERE EXISTS (
SELECT 1
FROM customer_communication cc
JOIN communication co on co.id = cc.communication_id and co.channel <> 'mobile'
WHERE cc.user_id = c.id
)
这是EXPLAIN (ANALYZE)
的结果:
Nested Loop (cost=3451561.57..3539012.42 rows=24509 width=8) (actual time=60913.294..64056.970 rows=1036309 loops=1)
-> HashAggregate (cost=3451561.14..3451806.23 rows=24509 width=8) (actual time=60913.264..61187.702 rows=1036310 loops=1)
Group Key: cc.customer_id
-> Hash Join (cost=2070834.75..3358538.60 rows=37209016 width=8) (actual time=32758.325..52752.383 rows=37209019 loops=1)
Hash Cond: (cc.communication_id = co.id)
-> Seq Scan on customer_communication cc (cost=0.00..755689.16 rows=37209016 width=16) (actual time=0.011..4949.315 rows=37209019 loops=1)
-> Hash (cost=1772758.38..1772758.38 rows=18168430 width=8) (actual time=32756.662..32756.663 rows=18108924 loops=1)
Buckets: 262144 Batches: 128 Memory Usage: 7557kB
-> Seq Scan on communication co (cost=0.00..1772758.38 rows=18168430 width=8) (actual time=0.007..30024.494 rows=18108924 loops=1)
Filter: (channel <> 'mobile')
-> Index Only Scan using customerxpk on customer c (cost=0.43..3.60 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1036310)
Index Cond: (id = cc.customer_id)
Heap Fetches: 525050
Planning Time: 0.391 ms
Execution Time: 64094.584 ms
我认为您错误地指定了查询,因为您有冲突的别名。这样可能更好:
SELECT c.id
FROM customer c
WHERE EXISTS (SELECT 1
FROM customer_communication cc JOIN
communication co
ON co.id = cc.communication_id AND
co.channel <> 'mobile'
WHERE cc.user_id = c.id
);
注意子查询中c
指外查询,customer
和co
指communication
。