根据与日期的距离在Snowflake中划分和选择值



我有一个10亿行的数据集,它不断地用更多关于客户的重复数据构建。

ID   creation_date          report_date             status
001  2021-01-20T00:22:06Z   2021-02-02T00:22:06Z    ACTIVE
002  2021-01-30T00:22:06Z   2021-02-02T00:22:06Z    ACTIVE
003  2021-02-01T00:22:06Z   2021-02-02T00:22:06Z    ACTIVE
001  2021-01-20T00:22:06Z   2021-02-02T00:23:06Z    ACTIVE
002  2021-01-30T00:22:06Z   2021-02-02T00:23:06Z    ACTIVE
003  2021-02-01T00:22:06Z   2021-02-02T00:23:06Z    ACTIVE
001  2021-01-20T00:22:06Z   2021-02-19T00:22:06Z    ACTIVE
002  2021-01-30T00:22:06Z   2021-02-19T00:22:06Z    ACTIVE
003  2021-02-01T00:22:06Z   2021-02-19T00:22:06Z    ACTIVE
001  2021-01-20T00:22:06Z   2021-02-20T00:22:06Z    ACTIVE
002  2021-01-30T00:22:06Z   2021-02-20T00:22:06Z    EXPIRED
003  2021-02-01T00:22:06Z   2021-02-20T00:22:06Z    EXPIRED
001  2021-01-20T00:22:06Z   2021-02-21T00:22:06Z    ACTIVE
002  2021-01-30T00:22:06Z   2021-02-21T00:22:06Z    EXPIRED
003  2021-02-01T00:22:06Z   2021-02-21T00:22:06Z    EXPIRED
001  2021-01-20T00:22:06Z   2021-02-30T00:22:06Z    ACTIVE
002  2021-01-30T00:22:06Z   2021-02-30T00:22:06Z    EXPIRED
003  2021-02-01T00:22:06Z   2021-02-30T00:22:06Z    EXPIRED
001  2021-01-20T00:22:06Z   2021-03-01T00:22:06Z    ACTIVE
002  2021-01-30T00:22:06Z   2021-03-01T00:22:06Z    EXPIRED
003  2021-02-01T00:22:06Z   2021-03-01T00:22:06Z    ACTIVE
001  2021-01-20T00:22:06Z   2021-03-22T00:22:06Z    EXPIRED
002  2021-01-30T00:22:06Z   2021-03-22T00:22:06Z    EXPIRED
003  2021-02-01T00:22:06Z   2021-03-22T00:22:06Z    EXPIRED

每个report_date表示将所有记录更新为当前状态的日期。就像脉搏检查一样。

我想要的只是用户在创建日期后一个月的一周内(第5周(的最后状态

例如:ID=001。

在这里,我们看到它们的创建日期是2021-01-20,意味着从这个日期起一个月是2021-02-20。我想知道:

  • 2021-02-202021-02-27之间的报告日期期间,此用户的最终状态是什么

您可以在上面的数据中看到,Active在2021-02-202021-02-27(列出的(之间的所有报告中都保持活动状态

简单地说,我们只想知道这个时间段内状态的最后一次变化。注意,在ID=003中,它们在2021-02-22', so though they wereEXPIRED`前一天切换到ACTIVE,在边界内切换回活动状态。

一个月后一周内的任何事情(5周后的任何事情(都无关紧要。

你也可能会注意到,从2021-01-30开始的一个月是2021-02-30,这没有意义。在这些情况下,请使用当月的最终日期或2021-02-28

最终输出:

ID    week_5_status
001          ACTIVE   
002         EXPIRED
003          ACTIVE

首先,将文本值(大概(转换为有效的日期时间值。然后,过滤行,使report_datetime在creation_datetime之后少于6周。取过滤列表的最大值,然后连接回原始数据,以获得具有最大值的行的状态。

CREATE TABLE t (id int, creation_date VARCHAR(19), report_date VARCHAR(19), status text);
INSERT INTO t (id,creation_date,report_date,status) VALUES 
(1,'2021-01-20T00:22:06','2021-02-02T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-02T00:22:06','ACTIVE'),
(3,'2021-02-01T00:22:06','2021-02-02T00:22:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-02-02T00:23:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-02T00:23:06','ACTIVE'),
(3,'2021-02-01T00:22:06','2021-02-02T00:23:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-02-19T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-19T00:22:06','ACTIVE'),
(3,'2021-02-01T00:22:06','2021-02-19T00:22:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-02-20T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-20T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-02-20T00:22:06','EXPIRED'),
(1,'2021-01-20T00:22:06','2021-02-21T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-21T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-02-21T00:22:06','EXPIRED'),
(1,'2021-01-20T00:22:06','2021-02-30T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-30T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-02-30T00:22:06','EXPIRED'),
(1,'2021-01-20T00:22:06','2021-03-01T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-03-01T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-03-01T00:22:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-03-22T00:22:06','EXPIRED'),
(2,'2021-01-30T00:22:06','2021-03-22T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-03-22T00:22:06','EXPIRED');

WITH dat
AS
(
SELECT id
, CAST(creation_date AS datetime) AS creation_datetime
, CAST(REPLACE(report_date,'02-30','02-28') AS datetime) AS report_datetime
, status
FROM t
),
dat2
AS
(
SELECT id
,MAX(report_datetime) AS max_report_datetime
FROM dat
WHERE DATEDIFF(week,creation_datetime,report_datetime) < 6
GROUP BY id
)
SELECT dat.*
FROM dat
INNER JOIN dat2 
ON dat.id = dat2.id 
AND dat.report_datetime = dat2.max_report_datetime;

dbfiddle.uk

QUALIFY和ROW_NUMBER似乎是您想要选择的"窗口内的最后状态";

因此,使用data的CTE,我将几个无效的report_date更改为真实日期。也许是错误的方向,但它不会影响SQL。

WITH data(id,creation_date,report_date,status)AS (
SELECT column1
,to_date(column2, 'YYYY-MM-DDThh:mi:ss')
,to_date(column3, 'YYYY-MM-DDThh:mi:ss')
,column4 
FROM VALUES 
(1,'2021-01-20T00:22:06','2021-02-02T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-02T00:22:06','ACTIVE'),
(3,'2021-02-01T00:22:06','2021-02-02T00:22:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-02-02T00:23:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-02T00:23:06','ACTIVE'),
(3,'2021-02-01T00:22:06','2021-02-02T00:23:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-02-19T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-19T00:22:06','ACTIVE'),
(3,'2021-02-01T00:22:06','2021-02-19T00:22:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-02-20T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-20T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-02-20T00:22:06','EXPIRED'),
(1,'2021-01-20T00:22:06','2021-02-21T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-21T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-02-21T00:22:06','EXPIRED'),
(1,'2021-01-20T00:22:06','2021-02-28T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-02-28T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-02-28T00:22:06','EXPIRED'),
(1,'2021-01-20T00:22:06','2021-03-01T00:22:06','ACTIVE'),
(2,'2021-01-30T00:22:06','2021-03-01T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-03-01T00:22:06','ACTIVE'),
(1,'2021-01-20T00:22:06','2021-03-22T00:22:06','EXPIRED'),
(2,'2021-01-30T00:22:06','2021-03-22T00:22:06','EXPIRED'),
(3,'2021-02-01T00:22:06','2021-03-22T00:22:06','EXPIRED')
)

主SQL变为:

SELECT d.id
,d.creation_date
,d.report_date
,d.status
FROM data AS d
WHERE dateadd(week,5,d.creation_date) >= d.report_date
QUALIFY row_number() over (partition by id order by report_date desc) = 1 ;

给出:

状态
ID创建日期报告日期
12021-01-202021:02-21活动
22021-01-302021-03-01到期
32021-02-012022-03-01活动

最新更新