SQL AWS Athena Group by Without a Column



I have this dataset:

patient_id   doctor_id   status   created_at
1            1           A        2020-10-01 10:00:00
1            1           P        2020-10-01 10:30:00
1            1           U        2020-10-01 10:35:00
1            2           A        2020-10-01 10:40:00
...

I want to group it by patient_id and doctor_id, but not by status, so the result would look like this:

patient_id   doctor_id   status   created_at
1            1           U        2020-10-01 10:35:00
1            2           A        2020-10-01 10:40:00
...

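AWS Athena requires every selected column that isn't aggregated to appear in the GROUP BY, but I need the last (most recent) status for each patient_id/doctor_id pair.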

In Athena/Presto you can use the max_by function:

SELECT
patient_id,
doctor_id,
MAX_BY(status, created_at) AS last_status
FROM the_table
GROUP BY 1, 2

The max_by(x, y) function returns the value of x from the row in the group that has the maximum value of y.
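As a sketch, assuming Athena's Presto/Trino SQL dialect, the query below inlines the four sample rows from the question with a VALUES clause so it runs without a real table (the names the_table and t are placeholders). It also adds MAX(created_at), which is the timestamp of the same latest row, so the output matches the desired result shown in the question.

```sql
-- Inline the sample rows so the query is self-contained
WITH the_table AS (
  SELECT *
  FROM (
    VALUES
      (1, 1, 'A', TIMESTAMP '2020-10-01 10:00:00'),
      (1, 1, 'P', TIMESTAMP '2020-10-01 10:30:00'),
      (1, 1, 'U', TIMESTAMP '2020-10-01 10:35:00'),
      (1, 2, 'A', TIMESTAMP '2020-10-01 10:40:00')
  ) AS t (patient_id, doctor_id, status, created_at)
)
SELECT
  patient_id,
  doctor_id,
  -- status taken from the row with the latest created_at in each group
  MAX_BY(status, created_at) AS status,
  -- the latest created_at itself, i.e. the timestamp of that same row
  MAX(created_at) AS created_at
FROM the_table
GROUP BY 1, 2
ORDER BY 1, 2;
```

With the sample rows this should return (1, 1, U, 2020-10-01 10:35:00) and (1, 2, A, 2020-10-01 10:40:00). If two rows in a group share the same maximum created_at, max_by returns the status of one of them, and which one is not specified.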

ROW_NUMBER provides another option here:

WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY patient_id, doctor_id ORDER BY created_at DESC) rn
FROM yourTable
)
SELECT patient_id, doctor_id, status, created_at
FROM cte
WHERE rn = 1
ORDER BY patient_id, doctor_id;
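For comparison, here is the ROW_NUMBER approach run against the same sample rows, again a sketch in the Presto/Trino dialect with yourTable defined inline via VALUES instead of being a real table. Because it filters whole rows rather than aggregating, any additional columns come along without being listed in a GROUP BY.

```sql
-- Inline the sample rows in place of the real yourTable
WITH yourTable AS (
  SELECT *
  FROM (
    VALUES
      (1, 1, 'A', TIMESTAMP '2020-10-01 10:00:00'),
      (1, 1, 'P', TIMESTAMP '2020-10-01 10:30:00'),
      (1, 1, 'U', TIMESTAMP '2020-10-01 10:35:00'),
      (1, 2, 'A', TIMESTAMP '2020-10-01 10:40:00')
  ) AS t (patient_id, doctor_id, status, created_at)
),
cte AS (
  SELECT *,
         -- number the rows of each (patient_id, doctor_id) pair, newest first;
         -- rows tied on created_at are numbered in an arbitrary order
         ROW_NUMBER() OVER (PARTITION BY patient_id, doctor_id ORDER BY created_at DESC) AS rn
  FROM yourTable
)
-- keep only the newest row of each pair
SELECT patient_id, doctor_id, status, created_at
FROM cte
WHERE rn = 1
ORDER BY patient_id, doctor_id;
```

Using RANK() instead of ROW_NUMBER() keeps every row tied for the latest created_at, so a pair can then appear more than once in the result.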
