I have the following table with three columns: Id, Timestamp, Fact
Id | Timestamp | Fact |
---|---|---|
1 | 2021-10-25 11:21:12 | false |
2 | 2021-10-14 18:49:25 | false |
2 | 2021-11-03 12:48:47 | true |
2 | 2021-11-08 23:15:12 | false |
2 | 2021-11-08 23:15:30 | true |
3 | 2021-10-07 04:06:08 | false |
3 | 2021-10-07 07:47:43 | true |
3 | 2021-10-07 07:49:56 | false |
3 | 2021-10-07 07:51:35 | false |
8 | 2021-10-06 15:36:46 | false |
8 | 2021-10-06 15:37:12 | false |
9 | 2021-10-07 07:13:27 | false |
9 | 2021-10-07 07:15:07 | true |
9 | 2021-10-07 07:17:33 | false |
10 | 2021-10-06 14:03:57 | true |
10 | 2021-10-06 14:10:45 | false |
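In Presto you can use the `lag` window function to get the previous fact and compare it with the "current" one. Then group everything by `id` and use `min_by`/`max_by` to get the facts you need: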
```sql
-- Sample data from the question
WITH dataset(id, Timestamp, Fact) AS (
    values (1, timestamp '2021-10-25 11:21:12', false),
           (2, timestamp '2021-10-14 18:49:25', false),
           (2, timestamp '2021-11-03 12:48:47', true),
           (2, timestamp '2021-11-08 23:15:12', false),
           (2, timestamp '2021-11-08 23:15:30', true),
           (3, timestamp '2021-10-07 04:06:08', false),
           (3, timestamp '2021-10-07 07:47:43', true),
           (3, timestamp '2021-10-07 07:49:56', false),
           (3, timestamp '2021-10-07 07:51:35', false),
           (8, timestamp '2021-10-06 15:36:46', false),
           (8, timestamp '2021-10-06 15:37:12', false),
           (9, timestamp '2021-10-07 07:13:27', false),
           (9, timestamp '2021-10-07 07:15:07', true),
           (9, timestamp '2021-10-07 07:17:33', false),
           (10, timestamp '2021-10-06 14:03:57', true),
           (10, timestamp '2021-10-06 14:10:45', false)
)
SELECT id,
       min_by(fact, Timestamp) first_fact,  -- fact at the earliest timestamp
       min(Timestamp) first_iter,
       max_by(fact, Timestamp) last_fact,   -- fact at the latest timestamp
       max(Timestamp) last_iter,
       sum(changed) change_in_fact,         -- number of fact flips per id
       count(*) as iter
FROM (
    SELECT id,
           Timestamp,
           Fact,
           -- 1 when the fact differs from the previous row's fact. For the first
           -- row of each id prev_fact is NULL, the comparison yields NULL,
           -- and the CASE falls through to 0.
           case
               when prev_fact != fact then 1 else 0
           end as changed
    FROM (
        SELECT *,
               lag(fact) over (
                   partition by id
                   order by timestamp
               ) as prev_fact
        FROM dataset
    )
)
GROUP BY id
ORDER BY id
```
Output:
id | first_fact | first_iter | last_fact | last_iter | change_in_fact | iter |
---|---|---|---|---|---|---|
1 | false | 2021-10-25 11:21:12.000 | false | 2021-10-25 11:21:12.000 | 0 | 1 |
2 | false | 2021-10-14 18:49:25.000 | true | 2021-11-08 23:15:30.000 | 3 | 4 |
3 | false | 2021-10-07 04:06:08.000 | false | 2021-10-07 07:51:35.000 | 2 | 4 |
8 | false | 2021-10-06 15:36:46.000 | false | 2021-10-06 15:37:12.000 | 0 | 2 |
9 | false | 2021-10-07 07:13:27.000 | false | 2021-10-07 07:17:33.000 | 2 | 3 |
10 | true | 2021-10-06 14:03:57.000 | false | 2021-10-06 14:10:45.000 | 1 | 2 |
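If you want to check the lag / compare / group-by logic without a Presto instance, here is a minimal plain-Python sketch of the same steps. It does not run Presto. The names `ROWS` and `summarize` are hypothetical. The sketch assumes timestamps are unique within each id, because ties would make `min_by`/`max_by` ambiguous.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (id, timestamp, fact)
ROWS = [
    (1, "2021-10-25 11:21:12", False),
    (2, "2021-10-14 18:49:25", False),
    (2, "2021-11-03 12:48:47", True),
    (2, "2021-11-08 23:15:12", False),
    (2, "2021-11-08 23:15:30", True),
    (3, "2021-10-07 04:06:08", False),
    (3, "2021-10-07 07:47:43", True),
    (3, "2021-10-07 07:49:56", False),
    (3, "2021-10-07 07:51:35", False),
    (8, "2021-10-06 15:36:46", False),
    (8, "2021-10-06 15:37:12", False),
    (9, "2021-10-07 07:13:27", False),
    (9, "2021-10-07 07:15:07", True),
    (9, "2021-10-07 07:17:33", False),
    (10, "2021-10-06 14:03:57", True),
    (10, "2021-10-06 14:10:45", False),
]


def summarize(rows):
    """Hypothetical helper mirroring the SQL: per id, the first/last fact by
    timestamp, the number of fact changes, and the row count."""
    parsed = [(i, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), f) for i, ts, f in rows]
    # Equivalent of PARTITION BY id ORDER BY timestamp
    parsed.sort(key=lambda r: (r[0], r[1]))
    result = {}
    for id_, group in groupby(parsed, key=lambda r: r[0]):
        group = list(group)
        facts = [f for _, _, f in group]
        # lag(fact): compare each fact with its predecessor; the first row has
        # no predecessor and contributes 0, like the SQL CASE
        changes = sum(1 for prev, cur in zip(facts, facts[1:]) if prev != cur)
        result[id_] = {
            "first_fact": group[0][2],   # min_by(fact, timestamp)
            "first_iter": group[0][1],   # min(timestamp)
            "last_fact": group[-1][2],   # max_by(fact, timestamp)
            "last_iter": group[-1][1],   # max(timestamp)
            "change_in_fact": changes,   # sum(changed)
            "iter": len(group),          # count(*)
        }
    return result


if __name__ == "__main__":
    for id_, summary in summarize(ROWS).items():
        print(id_, summary)
```

Counting flips over adjacent pairs with `zip(facts, facts[1:])` is the in-memory analogue of `lag(fact)`: each row is compared with the one before it in timestamp order.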