Optimizing a SQL query on a large table



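I am trying to optimize a SQL query on a large events table (more than 10 million rows) for date range searches. I already have a unique index on this table: (lid, did, measurement, date). The query below tries to get the events for three types of measurement (Kilowatts, Current and Voltage) for every 2-second interval in the date column: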

SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events 
WHERE lid = 1 
and did = 1
and measurement IN ("Voltage") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
and did = 1
and measurement IN ("Current") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
and did = 1
and measurement IN ("Kilowatts") 
group by timekey

This is the table I am querying.

=============================================================
id  |  lid   |   did   |   measurement  |  date 
=============================================================
1   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:00
=============================================================
2   |  1     |   1     |   Current      | 2020-04-27 00:00:00
=============================================================
3   |  1     |   1     |   Voltage      | 2020-04-27 00:00:00
=============================================================
4   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:01
=============================================================
5   |  1     |   1     |   Current      | 2020-04-27 00:00:01
=============================================================
6   |  1     |   1     |   Voltage      | 2020-04-27 00:00:01
=============================================================
7   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:02
=============================================================
8   |  1     |   1     |   Current      | 2020-04-27 00:00:02
=============================================================
9   |  1     |   1     |   Voltage      | 2020-04-27 00:00:02

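The expected result is to retrieve all rows whose date equals 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query above works as expected, but I am using UNION to look up the different measurements on the table, and I believe this may not be the best way to do it.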

Can any SQL expert help me tweak this query to improve its performance?

You have one record per second for each measurement, and you want to select one record every two seconds.

You could try:

select *
from events
where 
    lid = 1 
    and did = 1 
    and measurement IN ('Voltage', 'Current', 'Kilowatts')
    and extract(second from date) % 2 = 0

This selects the records whose seconds part is even.
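Since the question is about date range searches, the same filter can be combined with an explicit range on date. The following is only a sketch: the one-day window is a made-up example, and it assumes the existing unique index on (lid, did, measurement, date) is what the optimizer would use to narrow the rows before the even-second test is applied.

```sql
SELECT *
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
  -- hypothetical one-day window; half-open, so the upper bound is excluded
  AND date >= '2020-04-27 00:00:00'
  AND date <  '2020-04-28 00:00:00'
  -- keep only rows whose seconds part is even
  AND EXTRACT(SECOND FROM date) % 2 = 0;
```

The range is written directly against the date column rather than against a function of it, so the comparison itself does not prevent index use.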

Alternatively, if you always have exactly one record per second, another option is row_number() (this requires MySQL 8.0):

select *
from (
    select 
        e.*, 
        row_number() over(partition by measurement order by date) rn
    from events e
    where 
        lid = 1 
        and did = 1 
        and measurement IN ('Voltage', 'Current', 'Kilowatts')
) t
where rn % 2 = 1

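This is a bit less accurate than the previous query, though: it only lines up with the even seconds as long as no second is missing from the data.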
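If the two-second bucketing from the question should be kept as is, the three UNION branches can also be folded into a single grouped query. This is a sketch under assumptions, not a drop-in replacement: it assumes one representative timestamp per bucket is enough, and `first_date` is a made-up alias. It lists explicit columns instead of `*` so that it also runs with ONLY_FULL_GROUP_BY enabled.

```sql
SELECT
    measurement,
    FLOOR(UNIX_TIMESTAMP(date) / 2) AS timekey,  -- two-second bucket, as in the question
    MIN(date) AS first_date                      -- earliest reading in each bucket (illustrative alias)
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
GROUP BY measurement, timekey;
```

Grouping by measurement as well as timekey yields one row per measurement and bucket, which is what the three separate UNION branches produce together.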

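Your query is actually three queries combined into one. Fortunately, they all select rows based on similar columns. If you want this query to run fast, you can add the following index: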

create index ix1 on events (lid, did, measurement);
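To check whether an index is actually picked up, the query can be prefixed with EXPLAIN. A minimal sketch, using the filter from the question; which index appears in the `key` column of the output depends on the data and the optimizer, so no particular result is assumed here.

```sql
-- Shows the execution plan instead of running the query;
-- the `key` column names the index the optimizer chose, if any.
EXPLAIN
SELECT *
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts');
```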

In addition to the suggestions above, changing the PRIMARY KEY will give you more performance:

PRIMARY KEY(lid, did, date, measurement)

and drop CCD_ 3.

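Note that there could be a hiccup if two readings arrive in the same "second". This can easily happen if one reading arrives just after the clock ticks and the next one arrives just before the following tick.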
