我有一个关于客户、用户和收入的表格,类似于下面(实际上有数千条记录):
Customer User Revenue
001 James 500
002 James 750
003 James 450
004 Sarah 100
005 Sarah 500
006 Sarah 150
007 Sarah 600
008 James 150
009 James 100
我想做的是只返回支出最高的客户,这些客户占用户总收入的80%。
要手动做到这一点,我会根据James的客户的收入对他们进行排序,计算出总额的百分比和运行总额的百分比,然后只返回记录,直到运行总额达到80%:
Customer User Revenue % of total Running Total %
002 James 750 0.38 0.38
001 James 500 0.26 0.64
003 James 450 0.23 0.87 <- Greater than 80%, last record
008 James 150 0.08 0.95
009 James 100 0.05 1.00
我试过使用CTE,但到目前为止都是空白的。有没有任何方法可以通过单个查询而不是在Excel工作表中手动完成?
仅限SQL Server 2012+
您可以使用带窗口的SUM
:
WITH cte AS
(
SELECT *,
1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY [User]) AS percentile,
1.0 * SUM(Revenue) OVER(PARTITION BY [User] ORDER BY [Revenue] DESC)
/SUM(Revenue) OVER(PARTITION BY [User]) AS running_percentile
FROM tab
)
SELECT *
FROM cte
WHERE running_percentile <= 0.8;
LiveDemo
SQL Server 2008:
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
FROM t
), cte2 AS
(
SELECT c.Customer, c.[User], c.[Revenue]
,percentile = 1.0 * Revenue / NULLIF(c3.s,0)
,running_percentile = 1.0 * c2.s / NULLIF(c3.s,0)
FROM cte c
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM cte c2
WHERE c.[User] = c2.[User]
AND c2.rn <= c.rn) c2
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM cte c2
WHERE c.[User] = c2.[User]) AS c3
)
SELECT *
FROM cte2
WHERE running_percentile <= 0.8;
LiveDemo2
输出:
╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User ║ Revenue ║ percentile ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║ 2 ║ James ║ 750 ║ 0,384615384615 ║ 0,384615384615 ║
║ 1 ║ James ║ 500 ║ 0,256410256410 ║ 0,641025641025 ║
║ 7 ║ Sarah ║ 600 ║ 0,444444444444 ║ 0,444444444444 ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝
编辑2:
看起来差不多了,唯一的问题是它缺少最后一行,詹姆斯的第三排超过了0.80,但需要包括在内
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
FROM t
), cte2 AS
(
SELECT c.Customer, c.[User], c.[Revenue]
,percentile = 1.0 * Revenue / NULLIF(c3.s,0)
,running_percentile = 1.0 * c2.s / NULLIF(c3.s,0)
FROM cte c
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM cte c2
WHERE c.[User] = c2.[User]
AND c2.rn <= c.rn) c2
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM cte c2
WHERE c.[User] = c2.[User]) AS c3
)
SELECT a.*
FROM cte2 a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
FROM cte2
WHERE running_percentile >= 0.8
AND cte2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp;
LiveDemo3
输出:
╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗
║ Customer ║ User ║ Revenue ║ percentile ║ running_percentile ║
╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣
║ 2 ║ James ║ 750 ║ 0,384615384615 ║ 0,384615384615 ║
║ 1 ║ James ║ 500 ║ 0,256410256410 ║ 0,641025641025 ║
║ 3 ║ James ║ 450 ║ 0,230769230769 ║ 0,871794871794 ║
║ 7 ║ Sarah ║ 600 ║ 0,444444444444 ║ 0,444444444444 ║
║ 5 ║ Sarah ║ 500 ║ 0,370370370370 ║ 0,814814814814 ║
╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝
看起来很完美,翻译到我的大桌子上并返回我需要的东西,花了整整5分钟的时间来完成它,仍然无法遵循你所做的
SQL Server 2008
不支持OVER()
子句中的所有内容,但ROW_NUMBER
支持。
首先cte只计算组内的位置:
╔═══════════╦════════╦══════════╦════╗
║ Customer ║ User ║ Revenue ║ rn ║
╠═══════════╬════════╬══════════╬════╣
║ 2 ║ James ║ 750 ║ 1 ║
║ 1 ║ James ║ 500 ║ 2 ║
║ 3 ║ James ║ 450 ║ 3 ║
║ 8 ║ James ║ 150 ║ 4 ║
║ 9 ║ James ║ 100 ║ 5 ║
║ 7 ║ Sarah ║ 600 ║ 1 ║
║ 5 ║ Sarah ║ 500 ║ 2 ║
║ 6 ║ Sarah ║ 150 ║ 3 ║
║ 4 ║ Sarah ║ 100 ║ 4 ║
╚═══════════╩════════╩══════════╩════╝
第二次cte:
c2
子查询根据ROW_NUMBER
的排名计算运行总数c3
计算每个用户的全额
在最终查询中,s
子查询发现最低的running
总数超过80%。
编辑3:
使用ROW_NUMBER
实际上是多余的。
WITH cte AS
(
SELECT c.Customer, c.[User], c.[Revenue]
,percentile = 1.0 * Revenue / NULLIF(c3.s,0)
,running_percentile = 1.0 * c2.s / NULLIF(c3.s,0)
FROM t c
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c.[User] = c2.[User]
AND c2.Revenue >= c.Revenue) c2
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c.[User] = c2.[User]) AS c3
)
SELECT a.*
FROM cte a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
FROM cte c2
WHERE running_percentile >= 0.8
AND c2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp
ORDER BY [User], Revenue DESC;
LiveDemo4
在SQL Server 2012+中,您可以使用累积和,这样效率会高得多。在SQLServer2008中,可以使用相关的子查询或cross apply
:来执行此操作
select t.*,
sum(t.Revenue*1.0) / sum(t.Revenue) over (partition by user) as [% of Total],
sum(RunningRevenue*1.0) / sum(t.Revenue) over (partition by user) as [Running Total %]
from t cross apply
(select sum(Revenue) as RunningRevenue
from t t2
where t2.Revenue >= t.Revenue and t2.user = t.user
) t2;
注意:*1.0
只是为了防止Revenue
被存储为整数。SQL Server执行整数除法,这将为几乎所有行的两列返回0
。
编辑:
如果只想得到James的结果,请添加where user = 'James'
。