问题:
查询优化 - 当前解决方案需要 10+ 分钟才能完成,这对于报告目的来说太长了。
目标:
按天报告活跃客户。活跃客户是指在过去一年中进行过购买的人。
示例数据集:
Customer_Id |
Order_Date1 |2014
/1/11 |2014
/1/22 |2014
/1/13 |2014
/1/13 |2014
/1/3
所需的结果集:
Snapshot_Date |
Active_Customers2014/1/1 |
32014/1/2 |
32014/1/3 |
3...
2015/1/1 |
32015/1/2 |阿拉伯数字
2015/1/3 |
1
当前查询:
--Date Dimension table
CREATE TABLE #Snapshot ( Snapshot DATE )
INSERT INTO #Snapshot
( Snapshot )
VALUES ( '1/1/2014' ) ,
( '1/2/2014' ) ,
( '1/3/2014' ) ,
...
( '1/1/2015' ) ,
( '1/2/2015' ) ,
( '1/3/2015' )
--Orders Table
CREATE TABLE #Orders ( Customer_Id INT , Order_Date DATE )
INSERT INTO #Orders ( Customer_Id , Order_Date )
VALUES ( 1, '1/1/2014' ),
( 1, '1/2/2014' ),
( 2, '1/1/2014' ),
( 3, '1/1/2014' ),
( 3, '1/3/2014' )
--Query
SELECT a.Snapshot ,
SUM(CASE WHEN DATEDIFF(DAY, a.Last_Order, a.Snapshot) <= 365 THEN 1 ELSE 0 END) AS Active_Customers
FROM ( SELECT a.Snapshot ,
b.Customer_Id ,
MAX(b.Order_Date) AS Last_Order
FROM #Snapshot a
JOIN #Orders b ON b.Order_Date <= a.Snapshot
GROUP BY a.Snapshot ,
b.Customer_Id
) a
GROUP BY a.Snapshot
问题:
由于子选择中的联接,查询陷入困境。 如果 orders 表有 100 万行,而日期维度表有 1,000 行,则联接将创建 ~ 10 亿行。
有没有办法在消除子选择中的内部联接的同时创建相同的结果集以防止多对多关系?
我会推荐一种不同的方法,一种使用累积总和而不是每天单独计数的方法——这要快得多。
这个想法是每次出现新客户或一年未看到订单时添加一个新客户。 同样,在 365 天后减去客户 - 除非出现新订单。 其余的就是累计总和。 SQL Server 2012 支持所有这些功能。
下面将给出每天有订单的报告:
WITH o as (
SELECT o.*,
LAG(Order_Date) OVER (PARTITION BY Customer_Id ORDER BY Order_Date) as prev_OrderDate,
LEAD(Order_Date) OVER (PARTITION BY Customer_Id ORDER BY Order_Date) as next_OrderDate
FROM Orders o
)
SELECT thedate, SUM(inc), SUM(SUM(inc)) OVER (ORDER BY thedate)
FROM ((SELECT o.OrderDate as thedate, 1 as inc
FROM o
WHERE prev_OrderDate IS NULL OR DATEADD(year, 1, prev_OrderDate) < Order_Date
) UNION ALL
(SELECT o.OrderDate + 365, -1 as inc
FROM o
WHERE next_OrderDate IS NULL OR DATEADD(year, -1, next_OrderDate > Order_Date
)
) d
GROUP BY thedate