优步 CRM 案例研究 SQL 中的客户保留指标



背景:我想在 每周滚动一次,以便在骑手 28 天未骑行时可以采取必要的干预。

友情链接 : 问题详情

以下是我尝试在单个查询中实现的指标列表

输出查询(单个查询(中列的定义:

日期
  1. :计算以下指标的日期。

  2. city_id:城市的 ID

  3. dau:在该日期完成至少一次行程的不同乘客的数量。

  4. wau:在过去 7 天内完成至少一次行程的不同乘客的数量(与"日期"列中的日期相关

    (
  5. new_rider:在过去 7 天内就"日期"列中的日期进行过first trip的不同乘客的数量

  6. previous_mau:在过去 56 天到过去 29 天内(相对于"日期"列中的日期(至少进行过一次行程的不同乘客数量

  7. mau_28:在过去 28 天内完成至少一次行程(相对于"日期"列中的日期(的不同乘客数量

  8. 保留:previous_mau和mau_28阶段不同骑手的交集

  9. 复活:在previous_mau阶段不活跃但在mau_28阶段活跃的不同骑手的数量。

  10. 流失:在previous_mau阶段处于活跃状态但在 mau_28阶段。

有效 : 如果乘客在相应时间段内至少完成了一次行程 非活跃 : 如果乘客在相应时间段内未进行过一次行程

以下是我尝试过的:

create table Tripdata
(
[date] date,
rider_id int,
trip_id int,
city_id int,
status varchar(100)
)
go

用于插入值的查询

INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO

以下是我到目前为止得到/获得的:(归功于LukStorms(

SELECT *
FROM 
(
SELECT  [datee], city_id, 
COUNT(DISTINCT rider_id) AS [dau]
FROM [dbo].[Tripdata]
GROUP BY [datee], city_id
) t
OUTER APPLY
(
SELECT
COUNT(rider_id) AS [wau],
COUNT(CASE WHEN [rides]=1 THEN rider_id END) AS [new_rider]
FROM
(
SELECT t2.city_id, t2.rider_id,
COUNT(*) AS [rides]
FROM [dbo].[Tripdata] t2
WHERE t2.city_id = t.city_id
AND t2.[datee] <= t.[datee]
AND t2.[datee]>=dateadd(day,-7,t.[datee])
GROUP BY t2.city_id, t2.rider_id
) q
GROUP BY city_id
) last7
OUTER APPLY
(
SELECT 
COUNT(DISTINCT t2.rider_id) AS [previous_mau]
FROM [dbo].[Tripdata] t2
WHERE t2.city_id = t.city_id
AND t2.[datee] <= dateadd(day,-29,t.[datee])
AND t2.[datee] >= dateadd(day,-56,t.[datee])
) prev29
ORDER BY t.[datee], t.city_id;

如何在单个查询中实现上述查询的所有结果?以及如何编写查询以在单个查询输出中回答上述 7、8、9、10 个问题?

此外,它们是将骑手映射到特定城市的一些特殊注意事项

乘客可能会从多个城市出发,这可能会导致系统统计活跃的乘客 或在多个城市不活跃。因此,要解决此问题,需要将骑手映射到 只有一个城市。骑手应映射到他们从中获取的城市 仅考虑最近 20 次旅行的最大行程。 2. 对于与城市相关的所有计算,重要的是要考虑映射到骑手的城市 而不是旅行发生的城市。 3.我们的数据库系统没有标准模式功能,因此骑手城市地图 需要派生。

这是我的做法 ->

SQL 小提琴

MS SQL Server 2017 架构设置

create table TripData
(
[date] date,
rider_id int,
trip_id int,
city_id int,
status varchar(100)
)
go
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')

查询 1

;WITH AdddedIndicators AS
(
/*For every record, calculate the key metrics needed to aggerate up.
days_back_last_ride can make use of LAG() by rider and city ordered by date or null if no value,
ISNULL the result to bring it back to 0 meaning no days back(first ride).
Days_back_last_ride=0 could be used to determine first ride date, however, that would ot fit the between 1..7 rule
so we need a first_ride_date. Again, using a window function by rider and city, grab the min date*/
SELECT 
td.date, rider_id, city_id,
days_back_last_ride = ISNULL(DATEDIFF(DAY,LAG(date) OVER(PARTITION BY rider_id,city_id ORDER BY date),td.date),0),
first_ride_date =  MIN(date) OVER (PARTITION BY rider_id, city_id)
FROM    
TripData td
)
,Normalized AS
(
/*The need metrics have been calculated above  for the bulk of your calcs with the data, query it and 
build up, aggregates up, flags for each rider/city/date so we can ultimatley formulate this for each rider/city
since this is by rider, city and date the user will allocate points to every city
vistited in a given day*/
SELECT 
date, city_id, rider_id,
dau= COUNT(DISTINCT rider_id),
wau_flag = SUM(CASE WHEN  days_back_last_ride BETWEEN 1 AND 7 THEN 1 ELSE 0 END),
new_rider_flag = SUM(CASE WHEN DATEDIFF(DAY,first_ride_date,date) <= 7 THEN 1 ELSE 0 END),
previous_mau_flag = SUM(CASE WHEN  days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END),
mau_28_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END),
retained = CASE WHEN SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END) > 1 
AND   
SUM(CASE WHEN  days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END) > 1 THEN 1 ELSE 0 END
FROM 
AdddedIndicators
GROUP BY
city_id, date, rider_id
)
SELECT 
/* Finalize the results by date and city
The flags have been made by user, city and date above. 
So gather each data piont and sum them up based on the rule set */
date, city_id,
dau = SUM(dau),
wau_flag = SUM(CASE WHEN wau_flag  >= 1 THEN 1 ELSE 0 END),
retained = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
resurrect = SUM(CASE WHEN previous_mau_flag = 0 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
churn  = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag = 0 THEN 1 ELSE 0 END)
FROM 
Normalized
GROUP BY
date, city_id   

结果

|       date | city_id | dau | wau_flag | retained | resurrect | churn |
|------------|---------|-----|----------|----------|-----------|-------|
| 2019-06-01 |       1 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       2 |   2 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       4 |   3 |        0 |        0 |         0 |     0 |
| 2019-01-28 |       5 |   2 |        0 |        0 |         0 |     0 |
| 2019-03-28 |       5 |   2 |        0 |        0 |         0 |     0 |
| 2019-05-01 |       5 |   3 |        0 |        0 |         0 |     2 |
| 2019-05-15 |       5 |   1 |        0 |        0 |         0 |     0 |
| 2019-05-28 |       5 |   2 |        0 |        0 |         2 |     0 |
| 2019-06-01 |       5 |   8 |        2 |        0 |         3 |     1 |
| 2019-06-01 |       6 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       7 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       8 |   4 |        0 |        0 |         0 |     0 |
| 2019-06-01 |       9 |   2 |        0 |        0 |         0 |     0 |
| 2019-05-01 |      10 |   2 |        0 |        0 |         0 |     0 |
| 2019-05-15 |      10 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |      10 |   3 |        0 |        0 |         1 |     2 |
| 2019-06-01 |      11 |   2 |        0 |        0 |         0 |     0 |
| 2019-06-01 |      12 |   1 |        0 |        0 |         0 |     0 |
| 2019-06-01 |      13 |   1 |        0 |        0 |         0 |     0 |

试试这个:

select [date], city_id
,(select count(distinct [rider_id])  from #Tripdata b where b.[date] = a.[date] and a.city_id = b.city_id) as [dau]
,(select count(distinct [rider_id])  from #Tripdata b where b.[date] between dateadd(day, -7, a.[date]) and a.[date] and a.city_id = b.city_id) as [wau]
,(select count(distinct [rider_id])  from #Tripdata b where a.city_id = b.city_id 
and b.[rider_id] NOT IN  (Select c.[rider_id] from #Tripdata c where c.[date] < dateadd(day, -7, a.[date])) 
and b.[rider_id] NOT IN  (Select c.[rider_id] from #Tripdata c where c.[date] > a.[date]) 
) as [new_rider]
,(select count(distinct [rider_id])  from #Tripdata b where a.city_id = b.city_id 
and b.[rider_id] NOT IN  (Select c.[rider_id] from #Tripdata c where c.[date] < dateadd(day, -56, a.[date])) 
and b.[rider_id] NOT IN  (Select c.[rider_id] from #Tripdata c where c.[date] > dateadd(day, -29, a.[date])) 
) as [previous_mau]
from #Tripdata a
group by  [date], city_id

相关内容

最新更新