Backfilling data by inserting missing date records in BigQuery



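I have a table in BigQuery with missing date records. I want to first insert the missing date records into the table, and then fill in each inserted record's score from the previous date's record. How can this be done in BigQuery?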

Current sample table view:

Row   date      timeStamp   score   
1   2018-01-21  1516492800  0.44013312375
2   2018-01-22  1516579200  0.3821605743
3   2018-01-24  1516752000  0.3397971282666667

Desired table view:

Row   date      timeStamp   score   
1   2018-01-21  1516492800  0.44013312375
2   2018-01-22  1516579200  0.3821605743
3   2018-01-23  1516665600  0.3821605743
4   2018-01-24  1516752000  0.3397971282666667

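Basically, since there is no record for the date 2018-01-23, a record is inserted for it and its score is taken from the previous date (2018-01-22).

Below is a solution for BigQuery Standard SQL: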

#standardSQL
WITH `project.dataset.table` AS (
  -- Sample data standing in for the real table
  SELECT DATE '2018-01-21' date, 1516492800 timeStamp, 0.44013312375 score UNION ALL
  SELECT '2018-01-22', 1516579200, 0.3821605743 UNION ALL
  SELECT '2018-01-24', 1516752000, 0.3397971282666667
)
SELECT date,
  UNIX_SECONDS(TIMESTAMP(date)) AS timeStamp,
  -- Carry the most recent non-NULL score forward into the gap days
  LAST_VALUE(score IGNORE NULLS) OVER(ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS score
FROM (
  SELECT MIN(date) min_date, MAX(date) max_date
  FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) date  -- one row per calendar day
LEFT JOIN `project.dataset.table`
USING(date)
ORDER BY date

Result:

Row date        timeStamp   score    
1   2018-01-21  1516492800  0.44013312375    
2   2018-01-22  1516579200  0.3821605743     
3   2018-01-23  1516665600  0.3821605743     
4   2018-01-24  1516752000  0.3397971282666667   
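
As a cross-check outside BigQuery, the same gap-fill logic can be sketched in plain Python. This is only an illustration and not part of the original answer: `fill_missing_dates` is a hypothetical helper name, and the sketch assumes one row per date with timestamps taken at midnight UTC.

```python
from datetime import date, datetime, timedelta, timezone

def fill_missing_dates(rows):
    """rows: (date, score) pairs. Returns (date, unix_seconds, score) for every
    calendar day between the earliest and latest date, carrying the last
    non-None score forward (mirrors LAST_VALUE(score IGNORE NULLS))."""
    known = dict(rows)
    day, last_day = min(known), max(known)
    filled = []
    score = None
    while day <= last_day:
        if known.get(day) is not None:
            score = known[day]  # a real record exists for this day
        # otherwise keep the previous day's score
        midnight_utc = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        filled.append((day, int(midnight_utc.timestamp()), score))
        day += timedelta(days=1)
    return filled

rows = [
    (date(2018, 1, 21), 0.44013312375),
    (date(2018, 1, 22), 0.3821605743),
    (date(2018, 1, 24), 0.3397971282666667),
]
for row in fill_missing_dates(rows):
    print(*row)
```

The row produced for 2018-01-23 reuses the score from 2018-01-22, matching the third row of the result table.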

I would generate the date array only for the values each row needs:

WITH t AS (
  -- Sample data standing in for the real table
  SELECT DATE '2018-01-21' date, 1516492800 timeStamp, 0.44013312375 score UNION ALL
  SELECT '2018-01-22', 1516579200, 0.3821605743 UNION ALL
  SELECT '2018-01-24', 1516752000, 0.3397971282666667
)
SELECT dte,
  UNIX_SECONDS(TIMESTAMP(dte)) AS timeStamp,
  t.score
FROM (
  -- to_date is the day before the next row's date (NULL for the last row)
  SELECT t.*, DATE_ADD(LEAD(date) OVER (ORDER BY date), INTERVAL -1 DAY) AS to_date
  FROM t
) t CROSS JOIN
  -- Expand each row into the days it covers; the last row covers only itself
  UNNEST(GENERATE_DATE_ARRAY(date, COALESCE(to_date, date))) dte
ORDER BY dte;

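I'm pretty sure this has better performance characteristics on larger amounts of data. In fact, I suggested this kind of approach to a colleague just yesterday, to try to speed up one of her queries.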
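
The per-row expansion used by the query above can also be sketched in plain Python, which may make the idea easier to follow: each row is responsible for its own date and for every missing day up to the day before the next row. This is an illustrative sketch rather than the answerer's code, and `expand_rows` is a hypothetical name.

```python
from datetime import date, timedelta

def expand_rows(rows):
    """rows: (date, score) pairs, one per date. Each row is repeated for every
    day from its own date up to the day before the next row's date."""
    rows = sorted(rows)
    expanded = []
    for i, (start, score) in enumerate(rows):
        if i + 1 < len(rows):
            end = rows[i + 1][0] - timedelta(days=1)  # like LEAD(date) minus 1 day
        else:
            end = start  # last row: COALESCE(to_date, date)
        day = start
        while day <= end:
            expanded.append((day, score))
            day += timedelta(days=1)
    return expanded

sample = [
    (date(2018, 1, 21), 0.44013312375),
    (date(2018, 1, 22), 0.3821605743),
    (date(2018, 1, 24), 0.3397971282666667),
]
for day, score in expand_rows(sample):
    print(day, score)
```

Here no row has to look back through earlier rows for a score. Each row only generates the few days it covers, whereas the first approach builds the full calendar and joins the table against it.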
