我有一个这样的表:
+----+---------+-------+---------------+------------+--------------+-------------+----------------+
| id | action | group | book | book_type | book_per_row | book_status | time |
+----+---------+-------+---------------+------------+--------------+-------------+----------------+
| 1 | welcome | hello | null | null | null | null | 6/27/2022 0:10 |
| 1 | welcome | hello | null | null | null | null | 6/28/2022 0:10 |
| 1 | welcome | hello | null | null | null | null | 6/29/2022 0:10 |
| 1 | buy | order | little prince | literature | 1 | bought | 6/29/2022 0:10 |
| 1 | buy | order | null | null | null | null | 6/30/2022 0:10 |
| 1 | buy | order | null | null | null | null | 7/1/2022 0:10 |
| 1 | buy | order | null | null | null | null | 7/2/2022 0:10 |
| 1 | buy | order | null | null | null | null | 7/3/2022 0:10 |
| 1 | search | order | little prince | literature | 0 | bought | 7/4/2022 0:10 |
| 1 | search | order | null | null | null | null | 7/5/2022 0:10 |
| 1 | search | order | null | null | null | null | 7/6/2022 0:10 |
| 1 | auth | auth | null | null | null | null | 7/6/2022 0:10 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | buy | order | null | null | null | null | 7/7/2022 0:11 |
| 1 | buy | order | null | null | null | null | 7/7/2022 0:11 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | exit | exit | null | null | null | null | 7/7/2022 0:13 |
| 1 | exit | exit | null | null | null | null | 7/7/2022 0:13 |
+----+---------+-------+---------------+------------+--------------+-------------+----------------+
我想让它像这样(当书之间有一个空,我想用书的名字,book_type, book_status来代替它):
+----+---------+-------+---------------+------------+--------------+-------------+----------------+
| id | action | group | book | book_type | book_per_row | book_status | time |
+----+---------+-------+---------------+------------+--------------+-------------+----------------+
| 1 | welcome | hello | null | null | null | null | 6/27/2022 0:10 |
| 1 | welcome | hello | null | null | null | null | 6/28/2022 0:10 |
| 1 | welcome | hello | null | null | null | null | 6/29/2022 0:10 |
| 1 | buy | order | little prince | literature | 1 | bought | 6/29/2022 0:10 |
| 1 | buy | order | little prince | literature | 0 | bought | 6/30/2022 0:10 |
| 1 | buy | order | little prince | literature | 0 | bought | 7/1/2022 0:10 |
| 1 | buy | order | little prince | literature | 0 | bought | 7/2/2022 0:10 |
| 1 | buy | order | little prince | literature | 0 | bought | 7/3/2022 0:10 |
| 1 | search | order | little prince | literature | 0 | bought | 7/4/2022 0:10 |
| 1 | search | order | null | null | null | null | 7/5/2022 0:10 |
| 1 | search | order | null | null | null | null | 7/6/2022 0:10 |
| 1 | auth | auth | null | null | null | null | 7/6/2022 0:10 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | buy | order | null | null | null | null | 7/7/2022 0:11 |
| 1 | buy | order | null | null | null | null | 7/7/2022 0:11 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | auth | auth | null | null | null | null | 7/7/2022 0:11 |
| 1 | exit | exit | null | null | null | null | 7/7/2022 0:13 |
| 1 | exit | exit | null | null | null | null | 7/7/2022 0:13 |
+----+---------+-------+---------------+------------+--------------+-------------+----------------+
我试过这样做:
select id,action,group,
case when book is null then lag(book)over(partition by id order by time asc) as book else book end as book,
case when book_type is null then lag(book_type)over(partition by id order by time asc) as book_type else book_type end as book,
case when book_status is null then lag(book_status)over(partition by id order by time asc) as book_status else book_status end as book,
time
from table.
但是我只能填充在book, book type, book_per_row, book_status中有值的行。
您在窗口上使用LAG
的想法是正确的,但是错误的功能。
我用snowflake的latest
函数生成了这个sql,我相信它与impala有相同的窗口函数LAST_VALUE
。
SELECT
id,
action,
group,
time,
LAST_VALUE(book ignore NULLS) OVER (
PARTITION BY id,
action,
group
ORDER BY
time ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS LATEST_book,
LAST_VALUE(book_type ignore NULLS) OVER (
PARTITION BY id,
action,
group
ORDER BY
time ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS LATEST_book_type,
LAST_VALUE(book_per_row ignore NULLS) OVER (
PARTITION BY id,
action,
group
ORDER BY
time ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS LATEST_book_per_row,
LAST_VALUE(book_status ignore NULLS) OVER (
PARTITION BY id,
action,
group
ORDER BY
time ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS LATEST_book_status,
FROM
table
我目前没有访问Impala,所以我希望这对你有用!