我对用户日志记录系统有复杂的计算要求。我需要根据用户在 180 天内的登录次数来定位最频繁的活跃用户。一旦两个登录日期相隔 181 天,它们不计入总数,但在与其他日期分组时可以计入总数。
例如,以下是 Jim 的登录历史记录:
Jim 2018-01-01
Jim 2018-04-01
Jim 2018-05-01
Jim 2018-06-01
Jim 2018-07-01
Jim 2018-08-01
Jim 2018-09-01
Jim 2018-12-01
为了简单起见,使用 6 个月而不是 180 天,并且只在一个方向上看 6 个月,Jim 得出了以下总数:
Logins: 5 (2018-01-01 + 6 months)
Logins: 6 (2018-04-01 + 6 months)
Logins: 5 (2018-05-01 + 6 months)
Logins: 5 (2018-06-01 + 6 months)
Logins: 4 (2018-07-01 + 6 months)
Logins: 3 (2018-08-01 + 6 months)
Logins: 2 (2018-09-01 + 6 months)
Logins: 1 (2018-12-01 + 6 months)
所以我的系统会报告6,因为它只想要最大总数。
除了蛮力计算,我对如何构建这个系统感到迷茫。是的,我可以在任何程度上对数据进行非规范化,速度是最重要的。
试试这个:
declare @tbl table(name char(3), dt date);
insert into @tbl values
('Jim', '2018-01-01'),
('Jim', '2018-04-01'),
('Jim', '2018-05-01'),
('Jim', '2018-06-01'),
('Jim', '2018-07-01'),
('Jim', '2018-08-01'),
('Jim', '2018-09-01'),
('Jim', '2018-12-01');
;with cte as (
select name, dt, DATEADD(day, 181, dt) upperDt from @tbl
), cte2 as (
select name,
(select COUNT(*) from cte where dt between c.dt and c.upperDt and name = c.name) cnt
from cte c
)
select name, MAX(cnt) [max]
from cte2
group by name
试试这个,使用公用表表达式计算结束日期窗口和交叉应用来计算登录总数
DECLARE @t TABLE (UserName NVARCHAR(10), LoginDate DATETIME)
INSERT INTO @t
(UserName,LoginDate) VALUES
('Jim','2018-01-01'),
('Jim','2018-04-01'),
('Jim','2018-05-01'),
('Jim','2018-06-01'),
('Jim','2018-07-01'),
('Jim','2018-08-01'),
('Jim','2018-09-01'),
('Jim','2018-12-01')
; WITH CteDateRange
AS(
SELECT
T.UserName
,T.LoginDate
--,EndDateRange = DATEADD(DAY, 181, LoginDate)
,EndDateRange = DATEADD(MONTH, 6, LoginDate)
FROM @t T
)
SELECT
DR.UserName
,DR.LoginDate
,DR.EndDateRange
,T.Total
FROM CteDateRange DR
CROSS APPLY ( SELECT Total = COUNT(D.LoginDate)
FROM CteDateRange D
WHERE D.LoginDate >= DR.LoginDate
AND D.LoginDate <= DR.EndDateRange
AND D.UserName = DR.UserName
) T
输出
UserName LoginDate EndDateRange Total
Jim 2018-01-01 00:00:00.000 2018-07-01 00:00:00.000 5
Jim 2018-04-01 00:00:00.000 2018-10-01 00:00:00.000 6
Jim 2018-05-01 00:00:00.000 2018-11-01 00:00:00.000 5
Jim 2018-06-01 00:00:00.000 2018-12-01 00:00:00.000 5
Jim 2018-07-01 00:00:00.000 2019-01-01 00:00:00.000 4
Jim 2018-08-01 00:00:00.000 2019-02-01 00:00:00.000 3
Jim 2018-09-01 00:00:00.000 2019-03-01 00:00:00.000 2
Jim 2018-12-01 00:00:00.000 2019-06-01 00:00:00.000 1
一个基本的解决方案使用join
:
select l.*
from (select l.name, count(*) as cnt,
row_number() over (partition by name order by count(*) desc) as seqnum
from logins l join
logins l2
on l.name = l2.name and
l2.date >= l.date and l2.date < dateadd(day, 181, l.date)
group by l.name
) l
where seqnum = 1;
这可能具有可接受的性能,索引在logins(name, date)
上。