在同一列上具有多个BETWEEN条件的最佳方式



是否可以优化在同一列上具有多个BETWEEN条件的BigQuery查询时间复杂性?

查询示例:

SELECT time, value
FROM `table`
WHERE ((time between "2019-04-06 20:40:00" AND "2019-04-07 21:00:00")
OR (time between "2020-05-04 17:12:06" AND "2020-05-04 17:37:36")
OR (time between "2019-01-16 15:10:00" AND "2019-01-16 15:45:00")
OR (time between "2020-04-09 20:55:36" AND "2020-04-09 21:01:36")
OR (time between "2019-07-08 10:05:00" AND "2019-07-08 10:45:00")
OR (time between "2021-06-07 20:02:32" AND "2021-06-07 20:30:56")
OR (time between "2019-11-03 14:55:20" AND "2019-11-03 15:17:24"))

在实际查询中有更多的between条件(大约60(。

完整的表格包含数据的>3TB

考虑以下方法

WITH time_ranges as (
select timestamp '2019-04-06 20:40:00' start_time, timestamp '2019-04-07 21:00:00' end_time union all
select '2020-05-04 17:12:06', '2020-05-04 17:37:36' union all
select '2019-01-16 15:10:00', '2019-01-16 15:45:00' union all
select '2020-04-09 20:55:36', '2020-04-09 21:01:36' union all
select '2019-07-08 10:05:00', '2019-07-08 10:45:00' union all
select '2021-06-07 20:02:32', '2021-06-07 20:30:56' union all
select '2019-11-03 14:55:20', '2019-11-03 15:17:24'     
)
SELECT time, value
FROM your_table
WHERE (
SELECT LOGICAL_OR(time between start_time and end_time)
FROM time_ranges
)

您能将标准保存到表中吗?

如果你的时间窗口不重叠,你可以尝试交叉连接和过滤:

select time, value
from your_table
cross join time_windows
where time between start_time and end_time

其中time_windows类似

start_time,            end_time
'2019-04-06 20:40:00' | '2019-04-07 21:00:00'
'2020-05-04 17:12:06' | '2020-05-04 17:37:36'
etc...

两个字段都已保存为时间戳。

如果这是查询的实际复杂性,那么针对3TB表的交叉联接应该相当快。添加其他联接和筛选器也可能会降低速度。

此外,如果您能够修改要在time列上分区的表,BQ可能能够进一步优化您的查询。

最新更新