Query with a small LIMIT gets slow when there are many matching rows



I have a slow Postgres query that probably needs better indexing, but I have little experience with indexes, so I just indexed the foreign keys. I am using Postgres 14.3 and Django 3.0.

Here is the slow query. It is only slow when the chat room contains many messages (say, 10k or more).

SELECT
chat_message.info_date_str,
auth_user.username,
chat_message.id,
chat_message.text_msg,
chat_message.type_message,
chat_message.url_file,
chat_message.url_file,
chat_message.was_removed,
chat_message.was_removed
FROM chat_message
INNER JOIN user_profile_userprofile
ON chat_message.owner_id = user_profile_userprofile.id
INNER JOIN auth_user
ON user_profile_userprofile.user_id = auth_user.id
WHERE chat_message.chat_room_id = 7204712
AND NOT chat_message.delete_by::text LIKE '%fattimita08%'
ORDER BY chat_message.id DESC
LIMIT 20

Duration: 11956.422 ms

Definitions of the relevant tables:

CREATE TABLE public.chat_message (
id integer NOT NULL DEFAULT nextval('chat_message_id_seq'::regclass)
, chat_room_id integer NOT NULL
, owner_id integer NOT NULL
, info_date timestamp with time zone NOT NULL
, aud_msg character varying(500) NOT NULL
, vid_msg character varying(500) NOT NULL
, img_msg character varying(500) NOT NULL
, text_msg text NOT NULL
, type_message smallint NOT NULL
, delete_by character varying(50) NOT NULL
, url_file character varying(1000) NOT NULL
, receiver_id integer
, info_date_str character varying(200) NOT NULL DEFAULT '2019-11-28 23:43:44.578419'
, was_removed boolean NOT NULL DEFAULT false
, CONSTRAINT chat_message_pkey PRIMARY KEY (id)
, CONSTRAINT chat_messag_receiver_id_0eceddde_fk_user_profile_userprofile_id FOREIGN KEY (receiver_id) REFERENCES public.user_profile_userprofile (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED
, CONSTRAINT chat_message_chat_id_fkey FOREIGN KEY (chat_room_id) REFERENCES public.chat_chat (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
, CONSTRAINT chat_message_owner_id_fkey FOREIGN KEY (owner_id) REFERENCES public.user_profile_userprofile (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX chat_room_id_idx ON public.chat_message (chat_room_id);
CREATE INDEX owner_id_idx     ON public.chat_message (owner_id);
CREATE INDEX receiver_id_idx  ON public.chat_message (receiver_id);

CREATE TABLE public.user_profile_userprofile (
id integer NOT NULL DEFAULT nextval('user_profile_userprofile_id_seq'::regclass)
, user_id integer NOT NULL
, birthdate date
, profile_picture character varying(500) NOT NULL
, country character varying(500) NOT NULL
, phrase character varying(300) NOT NULL
, is_mod boolean NOT NULL
, ip inet, id_cel character varying(500) NOT NULL DEFAULT ''
, twitter_key character varying(500) DEFAULT ''
, twitter_secret character varying(500) DEFAULT ''
, gender character varying(500) NOT NULL DEFAULT ''
, lastseen character varying(500) NOT NULL DEFAULT ''
, profile_video character varying(500) NOT NULL DEFAULT ''
, status character varying(500) NOT NULL DEFAULT ''
, name character varying(500) NOT NULL
, "id_timeZone" character varying(500) NOT NULL
, profile_picture_thumb character varying(500) NOT NULL
, city character varying(500) NOT NULL
, CONSTRAINT user_profile_userprofile_pkey PRIMARY KEY (id)
, CONSTRAINT user_profile_userprofile_user_id_key UNIQUE (user_id)
, CONSTRAINT cascade_user FOREIGN KEY (user_id) REFERENCES public.auth_user (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE
);
CREATE UNIQUE INDEX userprofile_user_idx ON public.user_profile_userprofile (user_id);

CREATE TABLE public.auth_user (
id integer NOT NULL DEFAULT nextval('auth_user_id_seq'::regclass)
, password character varying(128) NOT NULL
, last_login timestamp with time zone
, is_superuser boolean NOT NULL
, username character varying(150) NOT NULL
, first_name character varying(30) NOT NULL
, last_name character varying(30) NOT NULL
, email character varying(254) NOT NULL
, is_staff boolean NOT NULL
, is_active boolean NOT NULL
, date_joined timestamp with time zone NOT NULL
, CONSTRAINT auth_user_pkey PRIMARY KEY (id)
, CONSTRAINT auth_user_username_key UNIQUE (username)
);
CREATE INDEX auth_user_username_like ON public.auth_user (username varchar_pattern_ops);
CREATE INDEX username_idx ON public.auth_user (username);

I analyzed it with this query:

EXPLAIN ANALYZE SELECT "chat_message"."info_date_str", "auth_user"."username", "chat_message"."id", "chat_message"."text_msg", "chat_message"."type_message", "chat_message"."url_file", "chat_message"."url_file", "chat_message"."was_removed", "chat_message"."was_removed" FROM "chat_message" INNER JOIN "user_profile_userprofile" ON ("chat_message"."owner_id" = "user_profile_userprofile"."id") INNER JOIN "auth_user" ON ("user_profile_userprofile"."user_id" = "auth_user"."id") WHERE ("chat_message"."chat_room_id" = 7204712 AND NOT ("chat_message"."delete_by"::text LIKE '%fattimita08%')) ORDER BY "chat_message"."id" DESC LIMIT 20
QUERY PLAN
Limit  (cost=33503.90..33614.03 rows=20 width=87) (actual time=5753.937..5754.447 rows=20 loops=1)
  ->  Nested Loop  (cost=33503.90..85612.86 rows=9463 width=87) (actual time=5753.935..5754.443 rows=20 loops=1)
        ->  Nested Loop  (cost=33503.48..63940.73 rows=9463 width=80) (actual time=5753.923..5754.399 rows=20 loops=1)
              ->  Gather Merge  (cost=33503.06..34605.18 rows=9463 width=80) (actual time=5753.893..5754.349 rows=20 loops=1)
                    Workers Planned: 2
                    Workers Launched: 0
                    ->  Sort  (cost=32503.03..32512.89 rows=3943 width=80) (actual time=5753.535..5753.540 rows=20 loops=1)
                          Sort Key: chat_message.id DESC
                          Sort Method: quicksort  Memory: 2477kB
                          ->  Parallel Bitmap Heap Scan on chat_message  (cost=129.77..32267.53 rows=3943 width=80) (actual time=81.934..5741.308 rows=10447 loops=1)
                                Recheck Cond: (chat_room_id = 7204712)
                                Filter: ((delete_by)::text !~~ '%fattimita08%'::text)
                                Rows Removed by Filter: 1
                                Heap Blocks: exact=8278
                                ->  Bitmap Index Scan on chat_room_id_idx  (cost=0.00..127.41 rows=9463 width=0) (actual time=63.921..63.922 rows=10448 loops=1)
                                      Index Cond: (chat_room_id = 7204712)
              ->  Memoize  (cost=0.43..4.95 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=20)
                    Cache Key: chat_message.owner_id
                    Cache Mode: logical
                    Hits: 18  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 1kB
                    ->  Index Scan using user_profile_userprofile_pkey on user_profile_userprofile  (cost=0.42..4.94 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=2)
                          Index Cond: (id = chat_message.owner_id)
        ->  Index Scan using auth_user_pkey on auth_user  (cost=0.42..2.29 rows=1 width=13) (actual time=0.002..0.002 rows=1 loops=20)
              Index Cond: (id = user_profile_userprofile.user_id)
Planning Time: 43.104 ms
Execution Time: 5754.630 ms

The main difficulty for this query is that it filters on chat_room_id, but sorts by id (descending) before applying a very small LIMIT.

Postgres expects the filter chat_room_id = 7204712 to be selective and starts with a bitmap heap scan based on chat_room_id_idx, finding around 10k rows.

The additional filter NOT chat_message.delete_by LIKE '%fattimita08%' seems to be hardly selective at all. (Aside: the cast (::text) in that expression is useless; remove it.)
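Dropping the cast is a one-character-class fix: delete_by is already character varying, which LIKE handles directly. A sketch of the rewritten predicate:

```sql
-- delete_by is varchar(50); LIKE applies to it directly, no ::text cast needed
AND NOT chat_message.delete_by LIKE '%fattimita08%'
```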

This multicolumn index should give you faster results by orders of magnitude:

CREATE INDEX chat_message_your_name_here_idx ON public.chat_message (chat_room_id, id DESC);

Now Postgres can walk the index in an index scan and just filter the few rows that don't qualify.
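Since the question mentions Django 3.0, the same index can also be declared on the model so that migrations keep it in sync with the database. A sketch only; the model and app names below are assumptions inferred from the table layout, not taken from the question:

```python
# models.py -- illustrative sketch; model/field names are assumed
from django.db import models

class ChatMessage(models.Model):
    chat_room = models.ForeignKey('chat.ChatRoom', on_delete=models.CASCADE)
    # ... other fields ...

    class Meta:
        db_table = 'chat_message'
        indexes = [
            # '-id' produces a descending index column,
            # matching (chat_room_id, id DESC)
            models.Index(fields=['chat_room', '-id'],
                         name='chat_message_room_id_desc_idx'),
        ]
```

After adding this, `python manage.py makemigrations` generates the corresponding CREATE INDEX.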

Related:

  • Optimize query with small LIMIT, predicate on one column, ordering by another

That said, the Parallel Bitmap Heap Scan on chat_message carries most of the cost, which seems excessive for only rows=10447. Is that table particularly big? Do you have decent hardware? Is the table bloated, maybe? Are the matching rows physically scattered across a big table? Depending on actual diagnostics, VACUUM on the new index, or even VACUUM FULL, or better yet CLUSTER, might help (a lot). Like:

CLUSTER public.chat_message USING chat_message_your_name_here_idx;

Note that this is expensive and blocking for a big table. There are non-blocking alternatives. See:

  • Optimize Postgres query on timestamp range

  • PostgreSQL index not used for query on IP range

  • Optimize query on a range of timestamps (two columns)
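One such non-blocking option (my suggestion, not something the links above prescribe) is the pg_repack extension, which rewrites the table in the desired physical order while holding only brief locks. A sketch; the database name is a placeholder, and pg_repack must be installed both as a server extension and a client binary:

```shell
# Rewrite chat_message clustered by (chat_room_id, id DESC)
# without holding an exclusive lock for the whole operation.
pg_repack --dbname=mydb --table=public.chat_message \
          --order-by="chat_room_id, id DESC"
```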
