Alternatives to the OR and IN operators for indexing a table

The MySQL query I am working on is as follows:

select line_item_product_code, line_item_usage_start_date, sum(line_item_unblended_cost) as sum
from test_indexing
force index(date)
where line_item_product_code in('AmazonEC2', 'AmazonRDS')
and product_region='us-east-1'
and line_item_usage_start_date between date('2019-08-01')
and date('2019-08-31 00:00:00')
group by line_item_product_code, line_item_usage_start_date
order by sum;

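I have added an index on the column "line_item_usage_start_date", but when I run the query the index is not used: EXPLAIN shows type "ALL" and no key. The index fails to be used only when the WHERE clause contains an "OR" or "IN" operator.

The data types of the columns are:

line_item_product_code: TEXT
line_item_unblended_cost: DOUBLE
product_region: TEXT
line_item_usage_start_date: TIMESTAMP

My main goal is to optimize this query so that it responds quickly in a dashboard. The table has 192 columns and 9M+ rows, and the CSV is 13+ GB. I assume indexing will solve my problem with this query. Is there an alternative to these operators, or any other solution?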

x = 1  OR  x = 2

is turned by the optimizer into this:

x IN (1,2)

There is no need for the DATE() function in date('2019-08-01'). The string by itself is fine. For this:

and line_item_usage_start_date between date('2019-08-01')
AND date('2019-08-31 00:00:00')

I would write the "range" like this:

and line_item_usage_start_date >= '2019-08-01'
and line_item_usage_start_date  < '2019-08-01' + INTERVAL 1 MONTH
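
If the intent is to cover all of August, the half-open form also changes which rows match. As a quick check (a sketch to run in the mysql client), the two statements below show the upper bound each version compares against. The BETWEEN form stops at midnight at the start of August 31, so rows stamped later that day are excluded, whereas the `<` form runs up to, but not including, the first day of the next month.

```sql
-- Upper bound of the original BETWEEN: a DATE, which is compared to a
-- TIMESTAMP column as midnight at the start of that day
SELECT DATE('2019-08-31 00:00:00') AS between_upper_bound;

-- Upper bound of the half-open range: the first day of the following month
SELECT '2019-08-01' + INTERVAL 1 MONTH AS range_upper_bound;
```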

You have 3 conditions in the WHERE. Build the index using

all = tests, then
any IN tests, then
at most one "range".

So this is probably the optimal index:

INDEX(product_region,    -- first, because of '='
line_item_product_code,
line_item_usage_start_date)  -- last

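EXPLAIN will probably say Using temporary, Using filesort. These are caused by the GROUP BY and the ORDER BY. Still, a different index that focuses on the GROUP BY might eliminate one of the sorts: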

INDEX(line_item_product_code, line_item_usage_start_date) -- same order as the GROUP BY

As it turns out, my first index recommendation is definitely the better one, because it can handle both the = test and the GROUP BY.

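Oops, there is a problem:

line_item_product_code: TEXT

I doubt that "product_code" needs to be TEXT. Wouldn't something like VARCHAR(30) be big enough? The point is that a TEXT column cannot be used in an INDEX unless a prefix length is given. So change the datatype of that column, too.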
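
A minimal sketch of that change follows. The VARCHAR lengths and the index name `region_code_date` are my assumptions, not something from the question, so check the longest existing values first; values longer than the new length would be rejected or truncated, depending on sql_mode. Since the question lists product_region as TEXT as well, it would presumably need the same conversion before it can lead the index.

```sql
-- Find the longest existing values before choosing VARCHAR lengths
SELECT MAX(CHAR_LENGTH(line_item_product_code)) AS max_code_len,
       MAX(CHAR_LENGTH(product_region))         AS max_region_len
FROM test_indexing;

-- Assumed lengths (30) and an assumed index name; adjust both to taste.
-- Combining the clauses in one ALTER avoids altering the table three times.
ALTER TABLE test_indexing
    MODIFY line_item_product_code VARCHAR(30),
    MODIFY product_region         VARCHAR(30),
    ADD INDEX region_code_date (product_region,              -- '=' test
                                line_item_product_code,      -- IN test
                                line_item_usage_start_date); -- range
```

On a table of this size the ALTER may take a while, so it is worth trying on a copy first.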

More in the index cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

I have this table with 192 columns

That is rather large.

Don't use FORCE INDEX. It may help today, but it can hurt tomorrow when the data distribution changes.
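
Putting the suggestions in this answer together, the query might end up looking like the sketch below. It assumes the three-column index recommended above exists and that the TEXT columns have been converted so they can be indexed. Prefixing it with EXPLAIN is a way to confirm that the optimizer picks the index on its own, without FORCE INDEX.

```sql
EXPLAIN
SELECT line_item_product_code,
       line_item_usage_start_date,
       SUM(line_item_unblended_cost) AS sum
FROM test_indexing
WHERE product_region = 'us-east-1'                           -- '=' test
  AND line_item_product_code IN ('AmazonEC2', 'AmazonRDS')   -- IN test
  AND line_item_usage_start_date >= '2019-08-01'             -- range start
  AND line_item_usage_start_date <  '2019-08-01' + INTERVAL 1 MONTH
GROUP BY line_item_product_code, line_item_usage_start_date
ORDER BY sum;
```

If the key column of the EXPLAIN output is still NULL and the type is still ALL, the index is not being used and the column types are the first thing to recheck.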
