如何根据事件在SQL中计算客户保留率



我正在尝试创建一个SQL语句,以找出哪个客户没有连续参加三次事件

表1-客户:客户ID、客户名称

+-------------+---------------+
| Customer ID | Customer Name |
+-------------+---------------+
|          01 | Customer 01   |
|          02 | Customer 02   |
|          03 | Customer 03   |
+-------------+---------------+

表2-事件事件ID、事件日期、事件名称

+----------------------------------+
| Event ID  Event Date  Event Name |
+----------------------------------+
| 01        01/01/2020  Event 01   |
| 02        01/15/2020  Event 02   |
| 03        02/15/2020  Event 03   |
| 04        03/13/2020  Event 04   |
| 05        05/17/2020  Event 05   |
| 06        06/20/2020  Event 06   |
+----------------------------------+

表3-事件活动事件ID、客户ID

+----------+-------------+----+
| Event ID | Customer ID |    |
+----------+-------------+----+
|       01 |             | 01 |
|       01 |             | 02 |
|       01 |             | 03 |
|       02 |             | 01 |
|       03 |             | 01 |
|       03 |             | 02 |
|       04 |             | 01 |
|       05 |             | 01 |
|       06 |             | 01 |
|       06 |             | 03 |
+----------+-------------+----+

现在,我正在努力寻找那些没有连续参加3场活动的客户。

因此,在给定的示例中,将是客户2和客户3。

我采纳了史蒂夫的建议。这里是更新的SQL语句:

drop table if exists dbo.customer;
create table dbo.customer(
CustID        int not null,
CustName      varchar(20) not null);
insert dbo.customer(CustID, CustName) values
(1,'Cust 1'),
(2,'Cust 2'),
(3,'Cust 3'),
(4,'Cust 4'),
(5,'Cust 5')
;

drop table if exists dbo.events;
create table dbo.events(
EventID       int not null,
EventDate     date not null,
EventName     varchar(20) not null);
insert dbo.events(EventId, EventDate, EventName) values
(1,'2020-01-01','Event 1'),
(2,'2020-01-15','Event 2'),
(3,'2020-02-15','Event 3'),
(4,'2020-03-13','Event 4'),
(5,'2020-05-17','Event 5'),
(6,'2020-06-20','Event 6');

drop table if exists dbo.eventactivity;
create table dbo.eventactivity(
EventID       int not null,
CustID        int not null);
insert dbo.eventactivity(EventID, CustID) values
(1,1),
(1,2),
(1,3),
(1,4),
(1,5),
(2,1),
(2,2),
(2,4),
(2,5),
(3,1),
(3,5),
(4,1),
(4,5),
(5,1),
(5,2),
(5,3),
(5,5),
(6,1),
(6,2),
(6,3);
(6,5);

这里:

;with
events_sorted as (
select e.*, row_number() over (order by EventDate) seq from dbo.events e),
activity_lag as 
(
select
a.*, e.seq,
lag(e.seq, 1, 0) over (partition by CustId order by e.seq) lag_seq,
iif(lag(e.seq, 1, 0) over (partition by CustId order by e.seq)=0, 1, 
iif((e.seq-lag(e.seq, 1, 0) over (partition by CustId order by e.seq))>1, 1, 0)) seq_break
from dbo.eventactivity a
join events_sorted e on a.EventID=e.EventID
),
activity_lag_sum as (
select
alag.*, sum(seq_break) over (partition by CustId order by alag.seq) seq_grp
from
activity_lag alag
),
three_in_a_row_cte as (
select distinct CustId
from activity_lag_sum
group by CustID, seq_grp
having count(*)>=3
)
select * 
from customer c
where not exists(select 1
from three_in_a_row_cte r
where c.CustID=r.CustID);

问题是,这会返回客户2、客户3、客户4,而客户2确实参加了2个活动,跳过了2个,参加了2次,所以客户2不应该在列表中。

有什么建议吗?

以下查询返回CustId,这些CustId已:1(跳过3个或更多事件,或2(总共参加的事件少于3个。

;with
events_sorted as (
select e.*, row_number() over (order by EventDate) seq from #events e),
activity_lag as 
(
select
a.*, e.seq,
lag(e.seq, 1, 0) over (partition by CustId order by e.seq) lag_seq,
iif(lag(e.seq, 1, 0) over (partition by CustId order by e.seq)=0, 1, 
iif((e.seq-lag(e.seq, 1, 0) over (partition by CustId order by e.seq))>1, 1, 0)) seq_break
from #eventactivity a
join events_sorted e on a.EventID=e.EventID
)
select distinct CustId
from activity_lag
where seq-lag_seq>3
union all
select CustId
from activity_lag
group by CustId
having count(*)<3;

结果

CustId
3
4

您只需要连续跳过3个或更多事件的客户,您可以通过从事件活动表本身查询来获得这些客户。请在下面找到查询和查询结果:-

创建表格

create table event_activity ("event_id" varchar(2),"customer_id" varchar(2))
insert into event_activity
values ('01','01'),('01','02'),('01','03'),('02','01'),('02','03'),('03','01'),
('03','02'),('04','01'),('04','02'),('05','02'),('06','01'),('06','03'), 
('07','03'),('08','04'),('12','04'),('13','05')

以上查询将导致下表:-

event_id | customerid 
---------------------    
01   |   01
01   |   02
01   |   03
02   |   01
02   |   03
03   |   01
03   |   02
04   |   01
04   |   02
05   |   02
06   |   01
06   |   03
07   |   03
08   |   04
12   |   04
13   |   05

从上表中,我们可以观察到除了客户4和5之外的所有客户都连续跳过了少于3个事件。根据你的问题,我们只需要4和5,因为4已经连续跳过了3个活动,但5只参加了1个活动。

PS:-在这里你可以发现客户3也跳过了3个事件,但在此之前他已经跳过了参加了一些活动没有跳过任何一个,所以它必须被淘汰

最终查询

select c.customer_id
from
(
select customer_id, 
skipped_count, 
lag(skipped_count,1) over (partition by customer_id order by event_id) 
as ref
from 
( 
select customer_id, 
event_id,
LAG(event_id,1) over (partition by customer_id order by event_id) 
as previous_event,(event_id - LAG(event_id,1) over (partition by 
customer_id order by event_id)-1) as skipped_count
from 
(
select CONVERT(int,event_id) as event_id, 
CONVERT(int, customer_id) as customer_id 
from event_activity
)a
)b
)c
join
(
select convert(int,customer_id) as customer_id,
count(event_id) as count_event
from event_activity
group by customer_id
)d
on c.customer_id=d.customer_id
where (skipped_count >=3 and ref is null)
or count_event = 1
or (skipped_count >=3 and ref > 2)

输出

4
5

相关内容

  • 没有找到相关文章

最新更新