带子查询的CTE查询在小索引表上速度慢;如何在MySQL上进行优化



我最近将一个复杂的查询重写为带有子查询的CTE查询,部分原因是为了了解更多关于窗口函数的信息。新的速度很慢;对于只有几千行的索引表,在快速硬件上大约需要.5s。我不知道如何解释EXPLAIN;这似乎表明,正如我所期望的,行不多,而且有索引。我需要在这里做什么?

这是在MySQL上运行的;这个查询是由SQLAlchemy生成的,但我最初是用SQL手工编写它的核心,然后才移植过来。这样做的目的是在单词列表中找到附近的单词,并排除一些相关主题。

WITH rowlist AS (
SELECT
word.id AS id,
word.word AS word,
word.part_of_speech AS part_of_speech,
row_number() OVER (
ORDER BY
word.stripped_word,
(
SELECT
min(quotations.date) AS min_1
FROM
quotations
WHERE
quotations.word_id = word.id
)
) AS rownumber
FROM
word
WHERE
word.deleted IS NULL
AND NOT (
EXISTS (
SELECT
1
FROM
word_subject AS word_subject_1,
word_subject
WHERE
word_subject_1.word_id = word.id
AND word_subject_1.subject_id = 12
)
)
)
SELECT
rowlist.rownumber AS rowlist_rownumber
FROM
rowlist
WHERE
rowlist.id = 392

这方面的EXPLAIN是:

*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1543
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: word
type: ref
possible_keys: deleted
key: deleted
key_len: 6
ref: const
rows: 1543
Extra: Using index condition; Using where; Using temporary
*************************** 3. row ***************************
id: 4
select_type: MATERIALIZED
table: word_subject_1
type: ref
possible_keys: word_id,subject_id
key: subject_id
key_len: 4
ref: const
rows: 503
Extra: Using index
*************************** 4. row ***************************
id: 4
select_type: MATERIALIZED
table: word_subject
type: index
possible_keys: NULL
key: word_id
key_len: 8
ref: NULL
rows: 2415
Extra: Using index; Using join buffer (flat, BNL join)
*************************** 5. row ***************************
id: 3
select_type: DEPENDENT SUBQUERY
table: quotations
type: ref
possible_keys: word_id
key: word_id
key_len: 5
ref: db.word.id
rows: 4
Extra: 
5 rows in set (0.001 sec)

您的子查询包含以下内容:

FROM
word_subject AS word_subject_1,
word_subject

但是word_subject_1和word_subject之间没有联接条件。因此它是笛卡尔乘积。您可以在EXPLAIN中看到报告为具有未索引联接"的索引扫描(实际上与表扫描相同(;使用联接缓冲区";。

我认为您根本不需要在子查询中加入该联接。事实上,您也不需要相关的子查询。您可以在CTE中作为排除联接来执行此操作,而无需使用子查询:

FROM
word
LEFT OUTER JOIN word_subject
ON word.id = word_subject.word_id AND word_subject.subject_id = 12
WHERE
word.deleted IS NULL
and word_subject.word_id IS NULL

请确保在word_subject:(word_id, subject_id)中的这对列上有一个复合索引。

最新更新