Presto SQL查找当前交易前一年的交易数量



我有一个客户和订单日期的(简化的(交易表。对于每一行/订单,我想找到当前订单前一年的订单数量。我可以通过自联接来实现这一点,但当我的事务表大得多时,它就会变得效率低下。我想我真的想在日期字段上使用一个介于之间的窗口函数,但这还没有在Presto中实现。有什么想法可以让我更有效地做到这一点吗?

with
transactions as (
select
1 as customer,
date '2020-01-01' as order_date
union all
select
1 as customer,
date '2020-01-26' as order_date
union all
select
1 as customer,
date '2020-02-01' as order_date
union all
select
1 as customer,
date '2020-02-02' as order_date
)
select
t1.*,
count(case when t2.order_date between date_add('day', -14, t1.order_date) and date_add('day', -1, t1.order_date) then t2.order_date else null end) as orders_14_days_before
from
transactions t1
left join
transactions t2 on t1.customer = t2.customer
group by
t1.customer,
t1.order_date

结果:

customer    order_date  orders_14_days_before
1   2020-01-01  0
1   2020-01-26  0
1   2020-02-01  1
1   2020-02-02  2

Presto似乎并不完全支持range窗口规范。所以你可以用另一种方式。由始至终:

with cd as (
select customer, order_date as dte, 1 as inc
from transactions
union all
select customer, order_date + interval '1' year, -1 inc
from transactions
)
select t.*, cd.one_year_count
from (select customer, dte,
sum(sum(inc)) over (partition by customer order by dte) as one_year_count
from cd
group by customer, date
) cd join
transactions t
on cd.dte = t.order_date;

你应该发现这要快得多。

多亏了Gordon Linoff上面的答案,我对它进行了调整,得到了正确的答案(至少在Athena中是这样(。你不需要sum(sum()) over ...,只要sum() over ...就足够了。

with
transactions as (
select
1 as customer,
date '2020-01-01' as order_date
union all
select
1 as customer,
date '2020-01-26' as order_date
union all
select
1 as customer,
date '2020-02-01' as order_date
union all
select
1 as customer,
date '2020-02-02' as order_date
),
cd as (
select
customer,
order_date as dte,
1 as inc
from
transactions
union all
select
customer,
order_date + interval '13' day,
-1 inc
from
transactions
),
cd2 as (
select
customer,
dte,
inc,
sum(inc) over (partition by customer order by dte rows between unbounded preceding and 1 preceding) as one_year_count
from
cd
)
select
t.*,
coalesce(cd2.one_year_count, 0) as one_year_count
from
cd2
inner join
transactions t
on cd2.dte = t.order_date
where
cd2.inc = 1
order by
2 asc

最新更新