在多个表上优化带有条件的查询



我有两个postgres表

<表类>idowner_idtbody><<tr>11002101

创建测试数据…

CREATE UNLOGGED TABLE a AS SELECT a_id, (random()*100000)::INTEGER owner_id
FROM generate_series(1,1000000) a_id;
CREATE UNLOGGED TABLE b AS SELECT b_id, (random()*100000)::INTEGER a_id, (random()*100000)::INTEGER user_id
FROM generate_series(1,10000000) b_id;
CREATE INDEX a_o ON a(owner_id);
CREATE INDEX b_a ON b(a_id);
CREATE INDEX b_u ON b(user_id);
ALTER TABLE a ADD PRIMARY KEY(a_id);
ALTER TABLE b ADD PRIMARY KEY(b_id);
VACUUM ANALYZE a,b;

第一个查询的问题是postgres不知道如何优化星型连接,所以我们不得不给它一点帮助。

WITH ids AS (
SELECT a_id FROM b WHERE user_id=201
UNION SELECT a_id FROM a WHERE owner_id=100
)
SELECT * FROM ids JOIN b USING (a_id) LIMIT 50;

这给出了一个使用两个索引的计划,它可能在你的情况下更快,也可能不是。

Limit  (cost=455.41..634.97 rows=50 width=12) (actual time=0.494..0.642 rows=50 loops=1)
->  Nested Loop  (cost=455.41..41596.19 rows=11456 width=12) (actual time=0.492..0.629 rows=50 loops=1)
->  HashAggregate  (cost=450.19..451.32 rows=113 width=4) (actual time=0.425..0.427 rows=1 loops=1)
Group Key: b_1.a_id
Batches: 1  Memory Usage: 24kB
->  Append  (cost=5.23..449.91 rows=113 width=4) (actual time=0.076..0.358 rows=98 loops=1)
->  Bitmap Heap Scan on b b_1  (cost=5.23..401.21 rows=102 width=4) (actual time=0.075..0.299 rows=92 loops=1)
Recheck Cond: (user_id = 201)
Heap Blocks: exact=92
->  Bitmap Index Scan on b_u  (cost=0.00..5.20 rows=102 width=0) (actual time=0.035..0.035 rows=92 loops=1)
Index Cond: (user_id = 201)
->  Bitmap Heap Scan on a  (cost=4.51..47.00 rows=11 width=4) (actual time=0.019..0.033 rows=6 loops=1)
Recheck Cond: (owner_id = 100)
Heap Blocks: exact=6
->  Bitmap Index Scan on a_o  (cost=0.00..4.51 rows=11 width=0) (actual time=0.014..0.014 rows=6 loops=1)
Index Cond: (owner_id = 100)
->  Bitmap Heap Scan on b  (cost=5.22..363.09 rows=101 width=12) (actual time=0.059..0.174 rows=50 loops=1)
Recheck Cond: (a_id = b_1.a_id)
Heap Blocks: exact=50
->  Bitmap Index Scan on b_a  (cost=0.00..5.19 rows=101 width=0) (actual time=0.023..0.023 rows=104 loops=1)
Index Cond: (a_id = b_1.a_id)
Planning Time: 0.448 ms
Execution Time: 0.747 ms

对于另一个查询,我必须运行这个:

select owner_id, user_id, count(*) from a join b using (a_id) group by owner_id,user_id order by count(*) desc limit 100;

获取一些user_id,owner_id,将实际返回结果从我的测试数据。然后,

EXPLAIN ANALYZE
SELECT b.*
FROM b JOIN a USING (a_id)
WHERE (b.user_id = 99238 AND a.owner_id = 58599)
OR (b.user_id = 36859 AND a.owner_id = 99027)
LIMIT 50;
Limit  (cost=24.97..532.32 rows=1 width=12) (actual time=0.274..0.982 rows=6 loops=1)
->  Nested Loop  (cost=24.97..532.32 rows=1 width=12) (actual time=0.271..0.976 rows=6 loops=1)
->  Bitmap Heap Scan on a  (cost=9.03..92.70 rows=22 width=8) (actual time=0.108..0.216 rows=12 loops=1)
Recheck Cond: ((owner_id = 58599) OR (owner_id = 99027))
Heap Blocks: exact=12
->  BitmapOr  (cost=9.03..9.03 rows=22 width=0) (actual time=0.086..0.088 rows=0 loops=1)
->  Bitmap Index Scan on a_o  (cost=0.00..4.51 rows=11 width=0) (actual time=0.064..0.065 rows=3 loops=1)
Index Cond: (owner_id = 58599)
->  Bitmap Index Scan on a_o  (cost=0.00..4.51 rows=11 width=0) (actual time=0.020..0.020 rows=9 loops=1)
Index Cond: (owner_id = 99027)
->  Bitmap Heap Scan on b  (cost=15.95..19.97 rows=1 width=12) (actual time=0.058..0.060 rows=0 loops=12)
Recheck Cond: ((a_id = a.a_id) AND ((user_id = 99238) OR (user_id = 36859)))
Filter: (((user_id = 99238) AND (a.owner_id = 58599)) OR ((user_id = 36859) AND (a.owner_id = 99027)))
Heap Blocks: exact=6
->  BitmapAnd  (cost=15.95..15.95 rows=1 width=0) (actual time=0.053..0.053 rows=0 loops=12)
->  Bitmap Index Scan on b_a  (cost=0.00..5.19 rows=101 width=0) (actual time=0.015..0.015 rows=50 loops=12)
Index Cond: (a_id = a.a_id)
->  BitmapOr  (cost=10.50..10.50 rows=205 width=0) (actual time=0.046..0.046 rows=0 loops=6)
->  Bitmap Index Scan on b_u  (cost=0.00..5.20 rows=102 width=0) (actual time=0.021..0.021 rows=121 loops=6)
Index Cond: (user_id = 99238)
->  Bitmap Index Scan on b_u  (cost=0.00..5.20 rows=102 width=0) (actual time=0.024..0.024 rows=105 loops=6)
Index Cond: (user_id = 36859)
Planning Time: 0.703 ms
Execution Time: 1.063 ms

它不像你的那样使用seq扫描,所以也许你有一个旧版本,不能正确优化这个?当行数估计非常准确时,它为表a选择了seq扫描,这是非常奇怪的。你应该调查一下,也许可以试试

SELECT * FROM a WHERE a.owner_id = 58599 OR a.owner_id = 99027
LIMIT 50;

this应该给出一个索引或位图索引扫描,如果它做了一个序列扫描,那么你有一个小的测试用例来找出原因。无论如何,你仍然可以强制使用索引:

EXPLAIN ANALYZE
WITH ids AS (
SELECT a_id FROM b WHERE user_id IN (99238,36859)
UNION SELECT a_id FROM a WHERE owner_id IN (58599,99027)
)
SELECT * FROM ids JOIN b USING (a_id) JOIN a USING (a_id)
WHERE (b.user_id = 99238 AND a.owner_id = 58599)
OR (b.user_id = 36859 AND a.owner_id = 99027);

…但它真的很丑。或者你可以单独处理Or中的每个子句,并对这个子句多次执行and,这也很难看:

EXPLAIN ANALYZE
SELECT a_id FROM b WHERE b.user_id = 99238 
INTERSECT
SELECT a_id FROM a WHERE a.owner_id = 58599
LIMIT 50;

如何优化大偏移

你没有,事实上,当使用大偏移量时,它通常暗示你做错了,通过重复执行相同的查询,例如分页,并显示大块的结果。有两种解决方案。如果获取结果的速度足够快,事务可以在此期间保持打开状态,那么可以为查询打开一个游标,不带LIMIT或OFFSET,并使用FETCH以块形式获取结果。否则,只执行一次没有LIMIT的查询,将结果存储在缓存中,并从中分页,而不重新执行查询。

使用UNION而不是OR:

SELECT * FROM ((SELECT b.id
FROM b JOIN a ON b.a_id = a.id
WHERE b.user_id = 201
LIMIT 50)
UNION
(SELECT b.id
FROM b JOIN a ON b.a_id = a.id
WHERE a.owner_id = 100
LIMIT 50)) AS q
LIMIT 50;

a(owner_id),a(id),b(user_id)b(a_id)的指标将使其快速。

最新更新