Avoiding "filesort" on a UNION result

Sub-query 1:

SELECT * from big_table
where category = 'fruits' and name = 'apple'
order by yyyymmdd desc

EXPLAIN:

table       |   key           |   extra
big_table   |   name_yyyymmdd |   using where

Looks great!

Sub-query 2:

SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
order by yyyymmdd desc

EXPLAIN:

table       |   key               |   extra
big_table   |   category_yyyymmdd |   using where

Looks great!

Now, if I combine these with UNION:

SELECT * from big_table
where category = 'fruits' and name = 'apple'
UNION
SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
Order by yyyymmdd desc

EXPLAIN:

table       |   key      |   extra
big_table   |   name     |   using index condition, using where
big_table   |   category |   using index condition
UNION RESULT|   NULL     |   using temporary; using filesort

Not so good: it uses filesort.

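This is a trimmed-down version of a more complex query. Here are some facts about big_table:

big_table has over 10M rows
There are 5 unique "category" values
There are 5 unique "taste" values
There are about 10,000 unique "name" values
There are about 10,000 unique "yyyymmdd" values
I have created an index on each of these fields, plus composite indexes such as yyyymmdd_category_taste_name, but MySQL is not using them.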

SELECT * FROM big_table
    WHERE category = 'fruits'
      AND (  name = 'apple'
          OR taste = 'sweet'
          OR wildcard = '*' )
    ORDER BY yyyymmdd DESC

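Along with that query, have INDEX(category) or some index starting with category. However, if more than about 20% of the table has category = 'fruits', the optimizer will probably decide to ignore the index and simply do a table scan. (Since you say there are only 5 categories, I suspect the optimizer will rightly shun the index.)

Or this might be beneficial: INDEX(category, yyyymmdd), with the columns in that order.

The UNION had to do a sort (either in memory or on disk, it is not clear which) because it could not fetch the rows in the desired order.

A composite index INDEX(yyyymmdd, ...) might be used to avoid the "filesort", but then it would not make use of any columns after yyyymmdd.

When constructing a composite index, start with any WHERE columns that are compared with "=". After those, you can add one range, or the GROUP BY, or the ORDER BY.

UNION is often a good choice for avoiding a slow OR, but in this case it would need three indexes: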

INDEX(category, name)
INDEX(category, taste)
INDEX(category, wildcard)
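
As a sketch, those three indexes could be added in one statement. The index names below are made up for illustration; only the column lists come from the list above.

```sql
-- Hypothetical index names. Each index leads with `category`,
-- the column every branch of the UNION compares with "=".
ALTER TABLE big_table
    ADD INDEX idx_category_name     (category, name),
    ADD INDEX idx_category_taste    (category, taste),
    ADD INDEX idx_category_wildcard (category, wildcard);
```

Running EXPLAIN on each SELECT of the UNION separately is a quick way to check whether the optimizer actually picks the matching index.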

Adding yyyymmdd to those indexes would not help unless you also add a LIMIT.

The query would be:

( SELECT * FROM big_table WHERE category = 'fruits' AND name = 'apple' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND taste = 'sweet' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND wildcard = '*' )
ORDER BY yyyymmdd DESC

Adding a LIMIT makes it messier. First tack yyyymmdd onto the end of each of the three composite indexes, then use:

( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
ORDER BY yyyymmdd DESC  LIMIT 10
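
Spelled out, that might look like the following. This is only a sketch: the index names are hypothetical, the "..." is filled in with the three WHERE clauses from the earlier UNION, and LIMIT 10 is just an example row count.

```sql
-- Hypothetical index names: "=" columns first, then the ORDER BY column.
ALTER TABLE big_table
    ADD INDEX idx_category_name_date     (category, name, yyyymmdd),
    ADD INDEX idx_category_taste_date    (category, taste, yyyymmdd),
    ADD INDEX idx_category_wildcard_date (category, wildcard, yyyymmdd);

-- Each branch can read its 10 newest rows in index order and stop early.
( SELECT * FROM big_table
    WHERE category = 'fruits' AND name = 'apple'
    ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT * FROM big_table
    WHERE category = 'fruits' AND taste = 'sweet'
    ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT * FROM big_table
    WHERE category = 'fruits' AND wildcard = '*'
    ORDER BY yyyymmdd DESC LIMIT 10 )
-- The outer sort still happens, but over at most 30 rows.
ORDER BY yyyymmdd DESC LIMIT 10;
```

The outer ORDER BY will still show a temporary table and filesort in EXPLAIN, but it is sorting at most 30 rows rather than every matching row.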

Adding an OFFSET would be even worse.
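
To see why, here is a sketch of fetching the third page of 10 rows. The page size and offset are example values. The subqueries cannot apply an OFFSET themselves, because the rows to be skipped might all come from one branch; instead each has to return offset + count rows (20 + 10 = 30 here), and only the outer query skips.

```sql
-- Example: page 3 with 10 rows per page, i.e. skip 20 and return 10.
( SELECT * FROM big_table
    WHERE category = 'fruits' AND name = 'apple'
    ORDER BY yyyymmdd DESC LIMIT 30 )   -- offset + count, no OFFSET here
UNION DISTINCT
( SELECT * FROM big_table
    WHERE category = 'fruits' AND taste = 'sweet'
    ORDER BY yyyymmdd DESC LIMIT 30 )
UNION DISTINCT
( SELECT * FROM big_table
    WHERE category = 'fruits' AND wildcard = '*'
    ORDER BY yyyymmdd DESC LIMIT 30 )
ORDER BY yyyymmdd DESC
LIMIT 10 OFFSET 20;                     -- the skip happens only here
```

The deeper the page, the more rows every branch has to fetch and discard, which is what makes OFFSET costly with this pattern.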

Two other techniques, a "covering" index and a "lazy lookup", might help, but I doubt it.
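
For reference, a "lazy lookup" (often called a deferred join) for a single branch might look roughly like this. It assumes the table has a primary key column named id, which the question does not show, and an index on (category, name, yyyymmdd); both are assumptions for the sake of the sketch.

```sql
-- Assumes a primary key `id` and an index on (category, name, yyyymmdd).
-- With InnoDB, a secondary index also carries the primary key, so the
-- inner query can usually be answered from the index alone.
SELECT b.*
FROM ( SELECT id
         FROM big_table
        WHERE category = 'fruits' AND name = 'apple'
        ORDER BY yyyymmdd DESC
        LIMIT 10 ) AS x
JOIN big_table AS b USING (id)   -- fetch full rows for only those 10 ids
ORDER BY b.yyyymmdd DESC;
```

The potential gain comes from touching the wide rows only after the LIMIT has been applied, so without a LIMIT there is little to save.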

Yet another technique is to put all the words into a single column and use a FULLTEXT index. But that may be problematic for several reasons.

This should also work without UNION:

SELECT * from big_table
where 
    ( category = 'fruits' and name = 'apple' )
    OR
    ( category = 'fruits' and (taste = 'sweet' or wildcard = '*') )
ORDER BY yyyymmdd desc;
