postgreSQL: JOIN inside correlated subquery (SEMI JOIN)



我需要在关联子查询中JOIN一个表。但是,postgres选择的查询计划很慢。如何优化以下查询:

SELECT c.id
FROM customer c
WHERE EXISTS (
SELECT 1
FROM customer_communication cc
JOIN communication co on co.id = cc.communication_id and co.channel <> 'mobile'
WHERE cc.user_id = c.id
)

这是EXPLAIN (ANALYZE)的结果:

Nested Loop  (cost=3451561.57..3539012.42 rows=24509 width=8) (actual time=60913.294..64056.970 rows=1036309 loops=1)
->  HashAggregate  (cost=3451561.14..3451806.23 rows=24509 width=8) (actual time=60913.264..61187.702 rows=1036310 loops=1)
Group Key: cc.customer_id
->  Hash Join  (cost=2070834.75..3358538.60 rows=37209016 width=8) (actual time=32758.325..52752.383 rows=37209019 loops=1)
Hash Cond: (cc.communication_id = co.id)
->  Seq Scan on customer_communication cc  (cost=0.00..755689.16 rows=37209016 width=16) (actual time=0.011..4949.315 rows=37209019 loops=1)
->  Hash  (cost=1772758.38..1772758.38 rows=18168430 width=8) (actual time=32756.662..32756.663 rows=18108924 loops=1)
Buckets: 262144  Batches: 128  Memory Usage: 7557kB
->  Seq Scan on communication co  (cost=0.00..1772758.38 rows=18168430 width=8) (actual time=0.007..30024.494 rows=18108924 loops=1)
Filter: (channel <> 'mobile')
->  Index Only Scan using customerxpk on customer c  (cost=0.43..3.60 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1036310)
Index Cond: (id = cc.customer_id)
Heap Fetches: 525050
Planning Time: 0.391 ms
Execution Time: 64094.584 ms

我认为您错误地指定了查询,因为您有冲突的别名。这样可能更好:

SELECT c.id
FROM customer c
WHERE EXISTS (SELECT 1
FROM customer_communication cc JOIN
communication co
ON co.id = cc.communication_id AND
co.channel <> 'mobile'
WHERE cc.user_id = c.id
);

注意子查询中c外查询customercocommunication

最新更新