SQL查询可查找2020年1月连续3天购物的客户数量



我在下表中有一个名为订单的订单,它有客户id和订单日期(注意:一天内可能有来自同一客户的多个订单(

create table orders (Id char, order_dt date)
insert into orders values
('A','1/1/2020'),
('B','1/1/2020'),
('C','1/1/2020'),
('D','1/1/2020'),
('A','1/1/2020'),
('B','1/1/2020'),
('A','2/1/2020'),
('B','2/1/2020'),
('C','2/1/2020'),
('B','2/1/2020'),
('A','3/1/2020'),
('B','3/1/2020')

我正试图编写一个SQL查询,以查找2020年1月连续3天购物的客户数量

基于上述订单值,输出应为:2

我提到了其他类似的问题,但仍然无法找到准确的解决方案

这是我的解决方案,即使一天内有很多客户的订单,它也能很好地工作;

构建测试环境的一些脚本:

create table orders (Id varchar2(1), order_dt date);
insert into orders values('A',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('C',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('D',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('C',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('03/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('03/01/2020','dd/mm/yyyy'));

select distinct id,  count_days from (
select id,
order_dt,
count(*) over(partition by id order by order_dt range between 1  preceding  and 1 following  )  count_days
from orders group by  id, order_dt
)
where count_days = 3;
--  Insert for test more days than 3 consecutive
insert into orders values('A',to_date('04/01/2020','dd/mm/yyyy'));

您可以使用两个窗口函数来计算连续日期和具有ROWS偏移量的滑动窗口之间的差异,以计算不同的连续天数。此处示例:

with gen as (
select 1 as cust_id, (date '2020-01-10') + 1 as q from dual union all
select 1, (date '2020-01-10') + 2 as q from dual union all
select 1, (date '2020-01-10') + 3 as q from dual union all
select 1, (date '2020-01-10') + 3 as q from dual union all
select 1, (date '2020-01-10') + 5 as q from dual union all
select 1, (date '2020-01-10') + 7 as q from dual union all
select 1, (date '2020-01-10') + 8 as q from dual union all
select 1, (date '2020-01-10') + 9 as q from dual
)
, diff as (
select gen.*
, q - lag(q) over(partition by cust_id, trunc(q, 'mm') order by q asc) as datediff
from gen
)
, window as (
select diff.*
, sum(decode(datediff, 1, 1, 0)) over(partition by cust_id, trunc(q, 'mm') order by q asc range between 2 preceding and current row) as cnt
from diff
)
select sum(count(distinct q)) as cnt
from window
where cnt = 2
group by cust_id

为什么不在接下来的两天里加入两次。只要您有客户ID和日期的索引,就应该优化联接。因为联接需要在相同的开始日期基础上进行匹配,所以它要么找到,要么不找到。如果没有,它将被排除在结果集中。

select distinct 
o1.id
from
orders o1
JOIN orders o2
on o1.id = o2.id
AND o1.order_dt = o2.order_dt - interval '1' day
JOIN orders o3
on o1.id = o3.id
AND o1.order_dt = o3.order_dt - interval '2' day

嗯。一种方法是使用CCD_ 2/CCD_。假设你在一天内没有重复,那么:

select distinct id
from (select o.*,
lag(order_dt) over (partition by id order by order_dt) as prev_order_dt,
lag(order_dt, 2) over (partition by id order by order_dt) as prev_order_dt2
from orders o
where order_dt >= date '2020-01-01' and 
order_dt < date '2020-02-01'
) o
where prev_order_dt = order_dt - interval '1' day and
prev_order_dt2 = order_dt - interval '2' day;

编辑:

如果表中有重复的记录,上面的内容很容易调整:

select distinct id
from (select o.*,
lag(order_dt) over (partition by id order by order_dt) as prev_order_dt,
lag(order_dt, 2) over (partition by id order by order_dt) as prev_order_dt2
from (select distinct o.id, trunc(order_dt) as order_dt
from orders o
where order_dt >= date '2020-01-01' and 
order_dt < date '2020-02-01'
) o
) o
where prev_order_dt = order_dt - interval '1' day and
prev_order_dt2 = order_dt - interval '2' day;

最新更新