I am trying to calculate a cumulative percentage for each row in Presto/Athena. For example, suppose I have data like this:
AccountID | UserID | HolidaysTaken
ABC | A | 4
ABC | B | 6
ABC | B | 3
ABC | K | 2
ABC | K | 3
ABC | X | 1
Running the query below gives me the result that follows it:
SELECT AccountID, UserID, sum(HolidaysTaken) AS HolidaysTaken FROM table
WHERE AccountID = 'ABC'
GROUP BY AccountID, UserID
ORDER BY HolidaysTaken DESC
AccountID | UserID | HolidaysTaken
ABC | B | 9
ABC | K | 5
ABC | A | 4
ABC | X | 1
Total holiday taken by all users = 19
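But I want to add two more columns:
EachUserPercentage: each user's holidays as a percentage of the total holidays taken.
CumulativePercentage: the running sum of EachUserPercentage. This I can do using post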
AccountID | UserID | HolidaysTaken | EachUserPercentage | CumulativePercentage
ABC | B | 9 | 47.36 | 47.36
ABC | K | 5 | 26.31 | 73.67
ABC | A | 4 | 21.05 | 94.72
ABC | X | 1 | 5.26 | 100
I tried different window functions, percent_rank(), cume_dist() and ntile(), but could not get a correct EachUserPercentage out of any of them.
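You can use one window function to compute each user's percentage of the AccountID total, then a second window function to add those percentages up over an unbounded-preceding row frame, ordered by the holidays each UserID has taken. Something like this: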
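WITH totalUser AS (
    SELECT AccountID
          ,UserID
          ,SUM(HolidaysTaken) AS HolidaysTaken
          -- Share of the account total. Presto/Athena spell the fixed-point type DECIMAL rather than NUMERIC.
          -- 100.00 (not 100.0): where the literal is treated as DECIMAL, the quotient may keep only the literal's scale.
          ,CAST(100.00 * SUM(HolidaysTaken) / SUM(SUM(HolidaysTaken)) OVER (PARTITION BY AccountID) AS DECIMAL(5, 2)) AS EachUserPercentage
    FROM table  -- placeholder name; substitute your own table
    WHERE AccountID = 'ABC'
    GROUP BY AccountID
            ,UserID
)
SELECT totalUser.AccountID
      ,totalUser.UserID
      ,totalUser.HolidaysTaken
      ,totalUser.EachUserPercentage
      -- Running total of the percentages, largest share first.
      ,SUM(totalUser.EachUserPercentage) OVER (PARTITION BY totalUser.AccountID
                                               ORDER BY totalUser.EachUserPercentage DESC
                                               ROWS UNBOUNDED PRECEDING) AS CumulativePercentage
FROM totalUser
ORDER BY totalUser.HolidaysTaken DESC;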
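As a way to check what the query above should return, here is a minimal Python sketch of the same two steps on the sample rows from the question. It is not Presto code, and the function name `cumulative_report` and the helper `q2` are made up for this illustration. It computes the running total two ways: by summing the already-rounded percentages (what the query does) and by dividing the running day count by the total. The two can differ in the last digit, assuming half-up rounding to two places.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Sample rows from the question: (AccountID, UserID, HolidaysTaken)
rows = [
    ("ABC", "A", 4), ("ABC", "B", 6), ("ABC", "B", 3),
    ("ABC", "K", 2), ("ABC", "K", 3), ("ABC", "X", 1),
]


def q2(value):
    """Round a Decimal to two places, half up (an assumed rounding mode)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def cumulative_report(rows, account_id):
    # Step 1: the GROUP BY, summing days per user within one account.
    per_user = defaultdict(int)
    for account, user, days in rows:
        if account == account_id:
            per_user[user] += days
    total = sum(per_user.values())

    # Most days first; ties broken by user id so the order is stable.
    ordered = sorted(per_user.items(), key=lambda kv: (-kv[1], kv[0]))

    report = []
    running_days = 0
    running_rounded = Decimal("0.00")
    for user, days in ordered:
        running_days += days
        share = q2(Decimal(100 * days) / total)          # EachUserPercentage
        running_rounded += share                         # sum of rounded shares
        exact = q2(Decimal(100 * running_days) / total)  # rounded once, at the end
        report.append((user, days, share, running_rounded, exact))
    return report


for line in cumulative_report(rows, "ABC"):
    print(*line, sep=" | ")
```

For user K the summed-rounded column shows 73.69 while the round-once column shows 73.68. If that last-digit drift matters, one option is to take the window SUM over HolidaysTaken and divide by the account total afterwards, instead of summing EachUserPercentage.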
Hi, assuming your grouping is on AccountID, you can get EachUserPercentage simply from the query below.
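SELECT table.AccountID
      ,UserID
      ,SUM(table.HolidaysTaken) AS HolidaysTaken
      -- Account-wide total, repeated on every row (TotalHolidays is an alias added here for readability).
      ,MAX(CAST(all_sum.HolidaysTaken AS DECIMAL(12,2))) AS TotalHolidays
      -- Multiply by 100 before dividing, so a scale-2 DECIMAL quotient is not rounded to 0.47 first.
      ,100 * SUM(CAST(table.HolidaysTaken AS DECIMAL(12,2))) / MAX(CAST(all_sum.HolidaysTaken AS DECIMAL(12,2))) AS EachUserPercentage
FROM table  -- placeholder name; substitute your own table
LEFT OUTER JOIN (SELECT SUM(HolidaysTaken) AS HolidaysTaken, AccountID
                 FROM table
                 GROUP BY AccountID) all_sum
    ON all_sum.AccountID = table.AccountID
WHERE table.AccountID = 'ABC'
GROUP BY table.AccountID, UserID
ORDER BY HolidaysTaken DESC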
It works on my side.
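One thing worth checking before relying on this in Presto/Athena: if the engine evaluates the division of two scale-2 decimals at scale 2 (an assumption here, so verify it on your own version), then dividing first and multiplying by 100 afterwards loses the fractional part of the percentage. The Python sketch below only emulates that fixed-scale behaviour with the `decimal` module. The helper `div_scale2` is hypothetical and is not a model of any particular engine.

```python
from decimal import Decimal, ROUND_HALF_UP

SCALE_2 = Decimal("0.01")


def div_scale2(numerator, denominator):
    """Divide and keep two decimal places, emulating a scale-2 quotient."""
    return (numerator / denominator).quantize(SCALE_2, rounding=ROUND_HALF_UP)


user_days = Decimal("9.00")    # user B in the sample data
total_days = Decimal("19.00")  # all users of account ABC

# (a / b) * 100: the quotient is cut to 0.47 before scaling up.
divide_first = div_scale2(user_days, total_days) * 100

# (100 * a) / b: the two kept decimals belong to the percentage itself.
multiply_first = div_scale2(user_days * 100, total_days)

print(divide_first, multiply_first)
```

Under that assumption the first form yields 47.00 and the second 47.37. Casting to DOUBLE before dividing would be another way to sidestep the issue.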