我有一个包含user_id、col1、col2、col3、updated_at、is_deleted、day等字段的表。
当前查询如下所示-
SELECT DISTINCT
user_id,
first_value(col1) ignore nulls OVER (partition BY user_id
ORDER BY
updated_at DESC rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following) AS col1,
first_value(col2) ignore nulls OVER (partition BY user_id
ORDER BY
updated_at DESC rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following) AS col2,
first_value(col3) ignore nulls OVER (partition BY user_id
ORDER BY
updated_at DESC rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following) AS col3,
bool_or(is_deleted) ignore nulls OVER (partition BY user_id
ORDER BY
updated_at DESC rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following) AS is_deleted
FROM
my_table
WHERE
day >= '2021-05-25'
基本上,我想要每个用户id的每个列的最新(第一个(值。由于每个值列都可以为null,我不得不多次(为每个列(运行相同的窗口查询。目前,66%的时间都花在了开窗上。有什么优化的方法吗?
看起来你想要这个:
select * from (
select * , row_number() over (partition by user_id ORDER BY updated_at DESC) rn
from my_table
where day >= '2021-05-25'
) t
where rn = 1