我想加快下面的sql(成本为19685.75(。我可以为这个有多个复杂嵌套AND条件并在WHERE语句中与OR组合的sql建立索引吗?
SELECT DISTINCT
ON ("crawler_url"."url") U0."id"
FROM "characteristics_text" U0 LEFT OUTER
JOIN "characteristics_text_urls"
ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
JOIN "crawler_url"
ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
WHERE (
(
U0."publication_date" BETWEEN '2018-01-01' AND '2018-12-31'
AND EXTRACT('month' FROM U0."publication_date") = 10
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" BETWEEN '2018-01-01' AND '2018-12-31'
AND EXTRACT('month' FROM U0."lastmod") = 10
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" BETWEEN '2018-01-01 00:00:00+08:00' AND '2018-12-31 23:59:59.999999+08:00'
AND EXTRACT('month' FROM U0."created_at" AT TIME ZONE 'Asia/Hong_Kong') = 10
)
OR
(
U0."publication_date" >= '2018-08-01'
AND U0."publication_date" < '2018-10-31'
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" >='2018-08-01'
AND U0."lastmod" < '2018-10-31'
)
OR
(
U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" >= '2018-07-31 16:00:00+00:00'
AND U0."created_at" < '2018-10-30 16:00:00+00:00'
)
)
ORDER BY "crawler_url"."url" ASC, U0."created_at" DESC
表格文本包含以下字段和索引(其他一些字段未显示(
Table "public.characteristics_text"
Column | Type | Modifiers
------------------------+--------------------------+-------------------------------------------------------------------
id | integer | not null default nextval('characteristics_text_id_seq'::regclass)
text | text |
created_at | timestamp with time zone | not null
lastmod | date |
publication_date | date |
Indexes:
"characteristics_text_pkey" PRIMARY KEY, btree (id)
"characteristics_text_fde81f11" btree (created_at)
"characteristics_text_lastmod_3bff34c2_uniq" btree (lastmod)
"characteristics_text_publication_date_772c1bda_uniq" btree (publication_date)
"characteristics_text_publication_date_c6311385_idx" btree (publication_date, lastmod, created_at)
我为created_at、lastmod和publication_date添加了三个单独的索引;以及用于这些字段的一个多列索引。
但是在postgres EXPAIN查询中,where子句仍然使用Seq-Scan,但不使用索引扫描。
-> Seq Scan on characteristics_text u0 (cost=0.00..19685.75 rows=14535 width=12)
Filter: (
(
(publication_date >= '2018-01-01'::date) AND
(publication_date <= '2018-12-31'::date) AND
(
date_part(
'month'::text, (publication_date)::timestamp without time zone
) = 10::double precision)
) OR
((publication_date IS NULL) AND (lastmod >= '2018-01-01'::date) AND (lastmod <= '2018-12-31'::date) AND (date_part('month'::text, (lastmod)::timestamp without time zone) = 10::double precision)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2017-12-31 16:00:00+00'::timestamp with time zone) AND (created_at <= '2018-12-31 15:59:59.999999+00'::timestamp with time zone) AND (date_part('month'::text, timezone('Asia/Hong_Kong'::text, created_at)) = 10::double precision)) OR ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone))
)
我的问题是:
1。有可能让postgres对这个复杂的SELECT子句使用索引扫描吗
2.是否需要为每个AND子句创建一个多列索引?例如,在此原因中创建索引(publication_date, lastmod)
?
(
U0."publication_date" IS NULL
AND U0."lastmod" BETWEEN '2018-01-01' AND '2018-12-31'
AND EXTRACT('month' FROM U0."lastmod") = 10
)
- 索引是否适用于搜索IS NULL?搜索IS NULL的字段是否需要索引
更新日期:2018年11月4日
当我试图通过逐个测试字段来最小化查询时,字段publication_date
和last_mod
分别触发索引扫描,而created_at
不能:
是因为created_at
是时间戳吗?但为什么索引不适用于时间戳呢?
explain SELECT DISTINCT
ON ("crawler_url"."url") U0."id"
FROM "characteristics_text" U0 LEFT OUTER
JOIN "characteristics_text_urls"
ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
JOIN "crawler_url"
ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
WHERE (
(
U0."created_at" BETWEEN '2018-01-01 00:00:00+08:00' AND '2018-12-31 23:59:59.999999+08:00'
AND EXTRACT('month' FROM U0."created_at" AT TIME ZONE 'Asia/Hong_Kong') = 10
)
)
ORDER BY "crawler_url"."url" ASC, U0."created_at" DESC
Unique (cost=18004.05..18006.01 rows=393 width=86)
-> Sort (cost=18004.05..18005.03 rows=393 width=86)
Sort Key: crawler_url.url, u0.created_at
-> Nested Loop Left Join (cost=0.71..17987.11 rows=393 width=86)
-> Nested Loop Left Join (cost=0.42..17842.25 rows=393 width=16)
-> Seq Scan on characteristics_text u0 (cost=0.00..15467.37 rows=393 width=12)
Filter: ((created_at >= '2017-12-31 16:00:00+00'::timestamp with time zone) AND (created_at <= '2018-12-31 15:59:59.999999+00'::timestamp with time zone) AND (date_part('month'::text, timezone('Asia/Hong_Kong'::text, created_at)) = 10::double precision))
-> Index Scan using characteristics_text_urls_65eb77fe on characteristics_text_urls (cost=0.42..6.03 rows=1 width=8)
Index Cond: (u0.id = text_id)
-> Index Scan using crawler_url_pkey on crawler_url (cost=0.29..0.36 rows=1 width=78)
Index Cond: (characteristics_text_urls.url_id = id)
publication_date
似乎触发索引扫描:
(
U0."publication_date" IS NULL
AND U0."lastmod" >='2018-08-01'
AND U0."lastmod" < '2018-10-31'
)
Unique (cost=17053.26..17085.63 rows=6473 width=86)
-> Sort (cost=17053.26..17069.44 rows=6473 width=86)
Sort Key: crawler_url.url, u0.created_at
-> Nested Loop Left Join (cost=11130.73..16643.51 rows=6473 width=86)
-> Hash Right Join (cost=11130.44..14257.63 rows=6473 width=16)
Hash Cond: (characteristics_text_urls.text_id = u0.id)
-> Seq Scan on characteristics_text_urls (cost=0.00..1858.01 rows=120601 width=8)
-> Hash (cost=11049.53..11049.53 rows=6473 width=12)
-> Bitmap Heap Scan on characteristics_text u0 (cost=186.95..11049.53 rows=6473 width=12)
Recheck Cond: ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..185.33 rows=6473 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date))
-> Index Scan using crawler_url_pkey on crawler_url (cost=0.29..0.36 rows=1 width=78)
Index Cond: (characteristics_text_urls.url_id = id)
好的,一次全表扫描(seq_Scan(实际上可能比多次索引扫描更快。这取决于过滤条件的具体"选择性"。
首先,你的WHERE
子句有六个过滤条件,即OR
。这意味着,如果你想使用索引,PostgreSQL需要使用它6次,然后执行"索引OR"来合并结果。这可能不便宜。
因此,首先,你需要知道6种过滤条件中每一种的预期选择性是多少。也就是说,与表的总行数相比,选择了多少行。去做;几个简单的SQL查询就会给出答案。将答案张贴在此处。
现在,如果所有六种选择性的总和都超过5%,那么全表扫描(现在的算法(会更快。不要为索引而烦恼。
否则,以下索引可能会有所帮助:
create index ix1 on characteristics_text (
publication_date,
lastmod,
created_at,
1);
我怀疑你是否会得到一个有用的索引。您可能会考虑将这个查询分解为4或5个部分,然后使用UNION将结果重新粘在一起。(UNION将删除重复,而UNION ALL将返回所有行(。
UNION是一个相当昂贵的操作,因此需要考虑返回多少行。如果UNION删除了足够数量的行,那么使用索引可能会比UNION损失的效率更高。如果返回了许多行,那么您当前的表单就和它将要得到的一样好。
2018年全年100000条记录中有60%的记录,这使得数据库使用seq扫描。从全年的BEWEEN更改为仅一个月可以进行索引扫描。
AND U0."created_at" >= '2018-10-01 00:00:00+00:00'
AND U0."created_at" <= '2018-10-31 23:59:59.999999+00:00')
完整SQL:
SELECT DISTINCT
ON ("crawler_url"."url") U0."id"
FROM "characteristics_text" U0 LEFT OUTER
JOIN "characteristics_text_urls"
ON (U0."id" = "characteristics_text_urls"."text_id") LEFT OUTER
JOIN "crawler_url"
ON ("characteristics_text_urls"."url_id" = "crawler_url"."id")
WHERE (
(U0."publication_date" >= '2018-10-01'
AND U0."publication_date" <= '2018-11-01')
OR (U0."publication_date" IS NULL
AND U0."lastmod" >= '2018-10-01'
AND U0."lastmod" <= '2018-11-01'
)
OR
(U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" >= '2018-10-01 00:00:00+00:00'
AND U0."created_at" <= '2018-10-31 23:59:59.999999+00:00')
OR
(U0."publication_date" >= '2018-08-01'
AND U0."publication_date" < '2018-10-31')
OR
(U0."publication_date" IS NULL
AND U0."lastmod" >= '2018-08-01'
AND U0."lastmod" < '2018-10-31')
OR
(U0."publication_date" IS NULL
AND U0."lastmod" IS NULL
AND U0."created_at" >= '2018-07-31 16:00:00+00:00'
AND U0."created_at" < '2018-10-30 16:00:00+00:00')
)
ORDER BY "crawler_url"."url" ASC
EXPLAIN语句显示每个AND条件的索引扫描,所以总共有6个索引扫描。
Unique (cost=22885.16..22962.39 rows=15446 width=88)
-> Sort (cost=22885.16..22923.77 rows=15446 width=88)
Sort Key: crawler_url.url
-> Hash Right Join (cost=18669.29..21068.51 rows=15446 width=88)
Hash Cond: (crawler_url.id = characteristics_text_urls.url_id)
-> Seq Scan on crawler_url (cost=0.00..1691.88 rows=55288 width=88)
-> Hash (cost=18476.21..18476.21 rows=15446 width=8)
-> Hash Right Join (cost=14982.09..18476.21 rows=15446 width=8)
Hash Cond: (characteristics_text_urls.text_id = u0.id)
-> Seq Scan on characteristics_text_urls (cost=0.00..1907.25 rows=115525 width=8)
-> Hash (cost=14789.01..14789.01 rows=15446 width=4)
-> Bitmap Heap Scan on characteristics_text u0 (cost=516.57..14789.01 rows=15446 width=4)
Recheck Cond: (((publication_date >= '2018-10-01'::date) AND (publication_date <= '2018-11-01'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-10-01'::date) AND (lastmod <= '2018-11-01'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2018-10-31 23:59:59.999999+00'::timestamp with time zone)) OR ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date)) OR ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone)))
-> BitmapOr (cost=516.57..516.57 rows=16081 width=0)
-> Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq (cost=0.00..4.53 rows=11 width=0)
Index Cond: ((publication_date >= '2018-10-01'::date) AND (publication_date <= '2018-11-01'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..6.49 rows=166 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod >= '2018-10-01'::date) AND (lastmod <= '2018-11-01'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..14.61 rows=413 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-10-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2018-10-31 23:59:59.999999+00'::timestamp with time zone))
-> Bitmap Index Scan on characteristics_text_publication_date_772c1bda_uniq (cost=0.00..74.61 rows=3419 width=0)
Index Cond: ((publication_date >= '2018-08-01'::date) AND (publication_date < '2018-10-31'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..108.20 rows=3503 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod >= '2018-08-01'::date) AND (lastmod < '2018-10-31'::date))
-> Bitmap Index Scan on characteristics_text_publication_date_c6311385_idx (cost=0.00..284.95 rows=8569 width=0)
Index Cond: ((publication_date IS NULL) AND (lastmod IS NULL) AND (created_at >= '2018-07-31 16:00:00+00'::timestamp with time zone) AND (created_at < '2018-10-30 16:00:00+00'::timestamp with time zone))