Consecutive attendance at a series of events



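I'm trying to find a SQL-only solution to a problem that involves calculating consecutive event attendance. The events take place on varying days, so I can't use any sequential-date approach to determine consecutive attendance. To calculate a person's consecutive attendance count, I start with the most recent event and work back in time. I count each event the person attended, and when I hit an event the person did not attend, I stop. This gives me the number of most recent events the person attended in a row. Currently, all the data is hosted in SQL tables; below is a sample schema with data: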

USERS
ID  UserName    MinutesWatched
--- --------    --------------
1   jdoe        30
2   ssmith      400
3   bbaker      350
4   tduke       285

EVENTS
ID  Name        StartDate
--  ----------- ----------
1   1st Event   07/15/2018
2   2nd Event   07/16/2018
3   3rd Event   07/18/2018
4   4th Event   07/20/2018

ATTENDANCE
ID  User_ID  Event_ID
--  -------  --------
1   1        1
2   1        2
3   1        3
4   1        4
5   2        4
6   2        3
7   3        4
8   3        2
9   3        1
10  4        4
11  4        3
12  4        2

This is the output I'm trying to get:

OUTPUT
User_ID Consecutive   WatchedMinutes
------- -----------   --------------
1            4            30
2            2            400
3            1            350
4            3            285
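
For reference, the counting rule described above (start at the newest event, walk back, stop at the first miss) can be sketched in a few lines of Python. This is only an illustration on the sample data, not the production code; the names `consecutive_attendance`, `events` and `attendance` are made up for the sketch.

```python
from datetime import date

# Sample data copied from the EVENTS and ATTENDANCE tables above.
events = {  # event ID -> StartDate
    1: date(2018, 7, 15),
    2: date(2018, 7, 16),
    3: date(2018, 7, 18),
    4: date(2018, 7, 20),
}
attendance = {  # user ID -> set of attended event IDs
    1: {1, 2, 3, 4},
    2: {3, 4},
    3: {1, 2, 4},
    4: {2, 3, 4},
}


def consecutive_attendance(attended, events):
    """Count events attended in a row, walking back from the newest event."""
    streak = 0
    # Newest event first; stop at the first event the user missed.
    for event_id in sorted(events, key=events.get, reverse=True):
        if event_id not in attended:
            break
        streak += 1
    return streak


streaks = {user: consecutive_attendance(ids, events)
           for user, ids in attendance.items()}
print(streaks)  # {1: 4, 2: 2, 3: 1, 4: 3}
```

The result matches the Consecutive column of the OUTPUT table; a user who missed the newest event gets 0 under this rule.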

I've built C# code that does this iteratively, but it is slow when I'm dealing with 300,000+ users and hundreds of events. I'd love to see a SQL version of this.

Here is the method that calculates the top event viewers, as requested by Dan. The output is really just a string listing the top X event viewers.

public string GetUsersTopWatchedConsecutiveStreams(int topUserCount)
{
    string results = "Top " + topUserCount + " consecutive viewers - ";
    Dictionary<ChatUser, int> userinfo = new Dictionary<ChatUser, int>();
    using (StorageModelContext db = new StorageModelContext())
    {
        IQueryable<ChatUser> allUsers = null;
        if (mainViewModel.CurrentStream != null)
            allUsers = db.ViewerHistory.Include("Stream").Include("User").Where(x => x.Stream.Id == mainViewModel.CurrentStream.Id).Select(x => x.User);
        else
            allUsers = db.ViewerHistory.Include("Stream").Include("User").Where(x => x.Stream.Id == (db.StreamHistory.OrderByDescending(s => s.StreamEnd).FirstOrDefault().Id)).Select(x => x.User);

        foreach (var u in allUsers)
        {
            int totalStreams = 0;
            var user = db.Users.Include("History").Where(x => x.UserName == u.UserName).FirstOrDefault();
            if (user != null)
            {
                var streams = user.History;
                if (streams != null)
                {
                    var allStreams = db.StreamHistory.OrderByDescending(x => x.StreamStart);
                    foreach (var s in allStreams)
                    {
                        var vs = streams.Where(x => x.Stream == s);
                        if (vs.Count() > 0)
                            totalStreams++;
                        else
                            break;
                    }
                }
            }
            userinfo.Add(u, totalStreams);
            totalStreams = 0;
        }
        var top = userinfo.OrderByDescending(x => x.Value).ThenByDescending(x => x.Key.MinutesWatched).Take(topUserCount);
        int cnt = 1;
        foreach (var t in top)
        {
            results += "#" + cnt + ": " + t.Key + "(" + t.Value.ToString() + "), ";
            cnt++;
        }
        if (cnt > 1)
            results = results.Substring(0, results.Length - 2);
    }
    return results;
}

mainViewModel.CurrentStream is null when no event is currently running. While a live event is in progress, it holds an object with information about that live stream event.

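Maybe you want to try this:

Events get a row number in descending order of StartDate, and each user's attendances get a number in descending order of StartDate as well. For consecutive attendances, the difference between the event number and the attendance number is the same. I use this difference for grouping, count the attendances in each group, and return the group with the lowest difference (per user):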

WITH
  evt (ID, StartDate, evt_no) AS (
    SELECT ID, StartDate,
      ROW_NUMBER() OVER (ORDER BY StartDate DESC)
    FROM EVENTS
  ),
  att ([User_ID], grp_no) AS (
    SELECT [User_ID], evt_no -
      ROW_NUMBER() OVER (PARTITION BY [User_ID] ORDER BY StartDate DESC)
    FROM ATTENDANCE a
      INNER JOIN evt ON a.Event_ID = evt.ID
  ),
  con ([User_ID], Consecutive, rn) AS (
    SELECT [User_ID], COUNT(*),
      ROW_NUMBER() OVER (PARTITION BY [User_ID] ORDER BY grp_no)
    FROM att
    GROUP BY [User_ID], grp_no
  )
SELECT u.ID AS [User_ID], u.UserName, u.MinutesWatched, con.Consecutive
FROM con
  INNER JOIN USERS u ON con.[User_ID] = u.ID
WHERE con.rn = 1;
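
To see why the difference of the two row numbers stays constant within a run, here is a small Python sketch that mimics the three CTEs on the sample data. It is an illustration only, not a replacement for the query; `latest_streaks`, `evt_no` and `events_newest_first` are names invented for the sketch.

```python
from collections import Counter

# Event IDs ordered by StartDate descending, as in the sample EVENTS table.
events_newest_first = [4, 3, 2, 1]
attendance = [  # (User_ID, Event_ID) rows from the sample ATTENDANCE table
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 4), (2, 3),
    (3, 4), (3, 2), (3, 1),
    (4, 4), (4, 3), (4, 2),
]

# evt_no: ROW_NUMBER() over all events, newest first (1-based).
evt_no = {event_id: n for n, event_id in enumerate(events_newest_first, start=1)}


def latest_streaks(rows):
    """Return {user: size of the attendance group with the smallest difference}."""
    by_user = {}
    for user_id, event_id in rows:
        by_user.setdefault(user_id, []).append(evt_no[event_id])
    result = {}
    for user_id, numbers in by_user.items():
        # att_no: ROW_NUMBER() over this user's attendances, newest first.
        groups = Counter(
            n - att_no for att_no, n in enumerate(sorted(numbers), start=1)
        )
        # The smallest difference marks the user's most recent run.
        result[user_id] = groups[min(groups)]
    return result


streaks = latest_streaks(attendance)
print(streaks)  # {1: 4, 2: 2, 3: 1, 4: 3}
```

A difference of 0 means the run includes the newest event. As far as I can tell, a user who skipped the newest event still gets their last run back (with a positive difference) rather than 0, and users with no attendance rows are not returned at all, so filter on the difference if you need the strict "stop at the first miss" rule.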

I'd be interested to know how long this query takes to run on your system.

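You seem to want the largest event id that a person did not attend that is less than the largest id that the person attended. Then you want to count the events attended after it.

The following approach handles this by:

  • Combining each user with all events up to that user's maximum event
  • Getting the maximum event that doesn't match
  • Keeping the attended events after that one and counting them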

So, this gives the largest non-matching event for each user:

select u.user_id, e.id as event_id,
       -- NULL when the user attended every event up to their latest one
       max(case when a.event_id is null then e.id end) over (partition by u.user_id) as max_nonmatch_event_id
from (select user_id, max(event_id) as max_event_id
      from attendance
      group by user_id
     ) u join
     events e
     on e.id <= u.max_event_id left join
     attendance a
     on a.user_id = u.user_id and a.event_id = e.id
order by u.user_id, e.id;

And a subquery does the rest:

select ue.user_id, count(*) as num_consecutive
from (select u.user_id, a.event_id,
             max(case when a.event_id is null then e.id end) over (partition by u.user_id) as max_nonmatch_event_id
      from (select user_id, max(event_id) as max_event_id
            from attendance
            group by user_id
           ) u join
           events e
           on e.id <= u.max_event_id left join
           attendance a
           on a.user_id = u.user_id and a.event_id = e.id
     ) ue
where ue.max_nonmatch_event_id is null          -- the user never missed an event
   or ue.event_id > ue.max_nonmatch_event_id    -- attended after the last miss
group by ue.user_id;
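
The same idea, sketched in Python on the sample data in case it helps to check the logic. The function name `streak_up_to_last_attended` is made up, and the sketch assumes, like the query, that event IDs increase with StartDate and that the run is measured back from the user's own latest attended event (not from the newest event overall).

```python
# Sample data from the question: all event IDs and each user's attended IDs.
all_event_ids = [1, 2, 3, 4]
attendance = {
    1: {1, 2, 3, 4},
    2: {3, 4},
    3: {1, 2, 4},
    4: {2, 3, 4},
}


def streak_up_to_last_attended(attended, all_event_ids):
    """Count attended events after the largest missed event ID."""
    last_attended = max(attended)
    # Events up to the user's latest attended one that have no match.
    missed = [e for e in all_event_ids
              if e <= last_attended and e not in attended]
    cutoff = max(missed) if missed else None  # like max_nonmatch_event_id
    return sum(1 for e in attended if cutoff is None or e > cutoff)


streaks = {user: streak_up_to_last_attended(ids, all_event_ids)
           for user, ids in attendance.items()}
print(streaks)  # {1: 4, 2: 2, 3: 1, 4: 3}
```

On the sample data every user attended event 4, so this agrees with the expected output. A user whose latest attended event is older than the newest event would still get a non-zero count here.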
