This query fails once a month. How can I refactor it?

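This query fails once a month because the BETWEEN condition becomes invalid. With value BETWEEN min AND max, the range is empty whenever min is greater than max. On March 1 my query will fail again, because the day condition evaluates to partition_2 BETWEEN '28' AND '01', which matches no rows. How can I make this query more reliable while still reading only the partitions it needs?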

WITH recent_tasks AS
(SELECT task_id, state, timestamp, partition_0, partition_1, partition_2,
  row_number() OVER (PARTITION BY task_id
               ORDER BY timestamp DESC) AS rn
FROM firehose
WHERE
 "partition_0" BETWEEN to_char(current_date - interval '1' day, 'yyyy') AND to_char(current_date, 'yyyy')
 and "partition_1" BETWEEN to_char(current_date - interval '1' day, 'mm') AND to_char(current_date, 'mm')
 and "partition_2" BETWEEN to_char(current_date - interval '1' day, 'dd') AND to_char(current_date, 'dd')
ORDER BY rn)
SELECT * FROM recent_tasks
WHERE rn=1
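To make the failure concrete, here is a minimal Python model of the WHERE clause above (Python rather than Athena SQL, so it can be run locally). It assumes the partition values are zero-padded strings, as produced by to_char(..., 'yyyy' / 'mm' / 'dd'); the helper names partition and original_predicate are hypothetical. Each column is tested on its own, so when yesterday and today fall in different months the day range runs backwards and nothing matches.

```python
from datetime import date, timedelta
from typing import Tuple


def partition(d: date) -> Tuple[str, str, str]:
    """Partition values (partition_0, partition_1, partition_2) for one day
    under the s3/yyyy/mm/dd layout, zero-padded like to_char()."""
    return d.strftime("%Y"), d.strftime("%m"), d.strftime("%d")


def original_predicate(p0: str, p1: str, p2: str, today: date) -> bool:
    """Mirror of the question's WHERE clause: each partition column gets its
    own BETWEEN on strings, independent of the other two columns."""
    y0, m0, d0 = partition(today - timedelta(days=1))  # yesterday
    y1, m1, d1 = partition(today)
    return y0 <= p0 <= y1 and m0 <= p1 <= m1 and d0 <= p2 <= d1


if __name__ == "__main__":
    for today in (date(2023, 2, 11), date(2023, 3, 1)):
        window = [today - timedelta(days=1), today]
        hits = [d.isoformat() for d in window if original_predicate(*partition(d), today)]
        # 2023-02-11: both days match.
        # 2023-03-01: nothing matches, because '28' <= partition_2 <= '01' is empty.
        print(today.isoformat(), hits)
```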

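A few notes:

  • The partitions are char (string) values, not integers
  • partition_2 is the day-of-month partition
  • The purpose of the query is to find the latest state for each task_id
  • I'm using AWS Athena
  • The data is stored in S3 as s3/yyyy/mm/dd, with a new partition every day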
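The rn = 1 filter keeps, for each task_id, the row ranked first by timestamp DESC. As a rough Python equivalent, assuming rows arrive as (task_id, state, timestamp) tuples with ISO-8601 timestamp strings (so string order equals time order), the hypothetical latest_state function below keeps the newest row per task. Like row_number(), it returns exactly one row per task_id; on a timestamp tie it keeps the first row seen, whereas row_number() may pick either.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (task_id, state, timestamp), timestamp as ISO-8601 text.
Row = Tuple[str, str, str]


def latest_state(rows: Iterable[Row]) -> Dict[str, Row]:
    """Keep the newest row per task_id, like filtering on
    row_number() OVER (PARTITION BY task_id ORDER BY timestamp DESC) = 1."""
    best: Dict[str, Row] = {}
    for row in rows:
        task_id, _state, ts = row
        # Strictly greater: on a timestamp tie the first row seen is kept.
        if task_id not in best or ts > best[task_id][2]:
            best[task_id] = row
    return best


if __name__ == "__main__":
    rows = [
        ("t1", "queued", "2023-02-28T10:00:00"),
        ("t1", "done", "2023-03-01T09:00:00"),
        ("t2", "running", "2023-02-28T23:59:00"),
    ]
    for task_id, row in sorted(latest_state(rows).items()):
        print(task_id, row[1])  # t1 done, then t2 running
```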

Ideally, my query would handle month transitions correctly:

BETWEEN FEB 10 AND FEB 11 (works with above)
BETWEEN FEB 28 AND MAR 1  (fails with above)
BETWEEN MAR 1 AND MAR 2   (works with above)
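One way to get this behavior is to stop comparing year, month, and day independently and instead enumerate the exact calendar days in the window, letting date arithmetic handle month and year roll-over. The sketch below is written in Python as a stand-in for whatever client builds the Athena query; it assumes a window of today plus days_back previous days, and window_partitions and where_clause are hypothetical helpers, not part of any AWS library.

```python
from datetime import date, timedelta
from typing import Set, Tuple

# (partition_0, partition_1, partition_2) = (yyyy, mm, dd), zero-padded strings.
Partition = Tuple[str, str, str]


def window_partitions(today: date, days_back: int = 1) -> Set[Partition]:
    """Every daily partition from today - days_back through today.

    Stepping back one calendar day at a time lets the date library handle
    month and year roll-over instead of comparing each component separately.
    """
    days = (today - timedelta(days=i) for i in range(days_back + 1))
    return {(d.strftime("%Y"), d.strftime("%m"), d.strftime("%d")) for d in days}


def where_clause(today: date, days_back: int = 1) -> str:
    """Render the window as an OR of exact partition matches (hypothetical helper)."""
    terms = [
        f"(partition_0 = '{y}' AND partition_1 = '{m}' AND partition_2 = '{d}')"
        for y, m, d in sorted(window_partitions(today, days_back))
    ]
    return " OR ".join(terms)


if __name__ == "__main__":
    # The three cases from the question, plus a year boundary.
    for today in (date(2023, 2, 11), date(2023, 3, 1), date(2023, 3, 2), date(2024, 1, 1)):
        print(today.isoformat(), sorted(window_partitions(today)))
    print(where_clause(date(2023, 3, 1)))
```

The string returned by where_clause could stand in for the three BETWEEN lines. Because every term is an equality test on partition columns, Athena should be able to limit the scan to a handful of partitions, but that is worth confirming through the amount of data scanned.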

If you want the lower bound to be zero instead of 28:

cast(to_char(current_date, 'dd') as integer) - 1

So on 03/01, to_char(current_date, 'dd') returns 01, which casts to 1, and subtracting 1 gives you zero:

and "partition_2" BETWEEN cast(cast(to_char(current_date, 'dd') as integer) - 1 as varchar) AND to_char(current_date, 'dd')
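Because partition values are strings, BETWEEN compares them lexicographically rather than numerically. To check which day partitions this condition admits, here is a small Python model of it (not Athena SQL). It assumes the decremented day is converted back to a string without zero padding; answer_bounds and matching_days are hypothetical names.

```python
from datetime import date
from typing import List, Tuple


def answer_bounds(today: date) -> Tuple[str, str]:
    """Bounds from the answer's partition_2 condition.

    Lower: day of month as an integer, minus 1, turned back into a string
    (assumed unpadded, e.g. 9 -> '9'). Upper: to_char(current_date, 'dd').
    """
    upper = today.strftime("%d")
    lower = str(int(upper) - 1)
    return lower, upper


def matching_days(today: date) -> List[str]:
    """Zero-padded day partitions that pass lower <= p2 <= upper as strings."""
    lower, upper = answer_bounds(today)
    return [p2 for p2 in (f"{d:02d}" for d in range(1, 32)) if lower <= p2 <= upper]


if __name__ == "__main__":
    for today in (date(2023, 2, 10), date(2023, 2, 11), date(2023, 3, 1)):
        print(today.isoformat(), answer_bounds(today), matching_days(today))
    # 2023-02-10: ('9', '10') admits nothing, since '9' sorts after '10'.
    # 2023-02-11: ('10', '11') admits ['10', '11'].
    # 2023-03-01: ('0', '01') admits only ['01'], so day '28' is not selected.
```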
