Suppose I have a table like this:
UID Name Datetime Users
4 Room 4 2012-08-03 14:00:00 3
2 Room 2 2012-08-03 14:00:00 3
3 Room 3 2012-08-03 14:00:00 1
1 Room 1 2012-08-03 14:00:00 2
3 Room 3 2012-08-03 14:15:00 1
2 Room 2 2012-08-03 14:15:00 4
1 Room 1 2012-08-03 14:15:00 3
1 Room 1 2012-08-03 14:30:00 6
1 Room 1 2012-08-03 14:45:00 3
2 Room 2 2012-08-03 14:45:00 7
3 Room 3 2012-08-03 14:45:00 8
4 Room 4 2012-08-03 14:45:00 4
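I would like to get the average number of users for each room (1, 2, 3, 4) from 2 PM to 3 PM. The problem is that a room may not "check in" at every 15-minute interval, so the last known user count before that interval must be assumed to still be valid.
For example, Room 4 never checked in at 2012-08-03 14:15:00, so it must be assumed that Room 4 had 3 users at 2012-08-03 14:15:00, because that is what it had at 2012-08-03 14:00:00.
This carries on down the line, so the average user counts I am looking for are as follows: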
Room 1: (2 + 3 + 6 + 3) / 4 = 3.5
Room 2: (3 + 4 + #4 + 7) / 4 = 4.5
Room 3: (1 + 1 + #1 + 8) / 4 = 2.75
Room 4: (3 + #3 + #3 + 4) / 4 = 3.25
where # marks an assumed number based on the previous known check-in.
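I am wondering whether this can be done with SQL alone. If not, I am curious about a clever PHP solution that is more than brute-force math, such as my quick, inaccurate pseudo-code: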
foreach ($rooms_id_array as $room_id) {
    // query() is a placeholder for whatever database helper is in use
    $SQL = "SELECT * FROM `table` WHERE `UID` = $room_id AND `Datetime` >= '2012-08-03 14:00:00' AND `Datetime` <= '2012-08-03 15:00:00'";
    $result = query($SQL);
    if (count($result) < 4) {
        // go through each date and find what is missing, and then go to previous date and use that instead
    } else {
        $sum = 0;
        foreach ($result as $row) {
            $sum += $row['Users'];
        }
        $avg = $sum / 4;
    }
}
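Your difficulty (and the most expensive step) will be filling in the blanks. If you cannot "fill in the blanks" in the source data, you probably want a template to join against, and then a correlated sub-query to find the data associated with each entry in that template.
This is usually best done with real tables, but here is an example that uses hard-coded inline views instead...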
SELECT
`room`.`uid` `uid` ,
AVG(`data`.`users`) `average_users`
FROM
(SELECT 1 `UID` UNION ALL
SELECT 2 `UID` UNION ALL
SELECT 3 `UID` UNION ALL
SELECT 4 `UID`) `room`
CROSS JOIN
(SELECT '2012-08-03 14:00:00' `datetime` UNION ALL
SELECT '2012-08-03 14:15:00' `datetime` UNION ALL
SELECT '2012-08-03 14:30:00' `datetime` UNION ALL
SELECT '2012-08-03 14:45:00' `datetime`) `checkin`
LEFT JOIN
data
ON `data`.`uid` = `room`.`uid`
AND `data`.`datetime` = (SELECT MAX(`datetime`)
FROM `data`
WHERE `uid` = `room`.`uid`
AND `datetime` <= `checkin`.`datetime`)
GROUP BY
`room`.`uid`
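- The CROSS JOIN creates the template, ensuring that you always have a record for every check-in slot for every room.
- The correlated sub-query searches back through time to find the most recent check-in for that room at that time.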
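The same "template plus latest record at or before the slot" idea can be sketched outside SQL. The following Python is only an illustration under stated assumptions: the names CHECKINS, SLOTS and average_users are invented here, times are written as zero-padded HH:MM strings so that they sort chronologically, and slots before a room's first check-in are skipped (much as AVG ignores the NULLs produced by the LEFT JOIN).

```python
from bisect import bisect_right

# Sample check-ins from the question: room UID -> {check-in time: user count}.
# (Hypothetical in-memory layout, not the real table.)
CHECKINS = {
    1: {"14:00": 2, "14:15": 3, "14:30": 6, "14:45": 3},
    2: {"14:00": 3, "14:15": 4, "14:45": 7},
    3: {"14:00": 1, "14:15": 1, "14:45": 8},
    4: {"14:00": 3, "14:45": 4},
}

# The "template": every check-in slot we expect for every room.
SLOTS = ["14:00", "14:15", "14:30", "14:45"]


def average_users(checkins, slots):
    """Average users per room, carrying the last known count forward."""
    averages = {}
    for room, by_time in checkins.items():
        times = sorted(by_time)  # zero-padded HH:MM strings sort chronologically
        filled = []
        for slot in slots:
            # Number of check-ins at or before this slot.
            idx = bisect_right(times, slot)
            if idx:  # skip slots before the room's first check-in
                filled.append(by_time[times[idx - 1]])
        averages[room] = sum(filled) / len(filled) if filled else None
    return averages


print(average_users(CHECKINS, SLOTS))
```

For the sample data this gives 3.5, 4.5, 2.75 and 3.25 for rooms 1 to 4, matching the averages the question asks for.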
You can use this solution:
SELECT b.Name,
AVG(b.Users) avg_users
FROM (
SELECT a.UID,
MAX(c.Datetime) last_date
FROM (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
SELECT '14:00:00' intrvl UNION ALL
SELECT '14:15:00' UNION ALL
SELECT '14:30:00' UNION ALL
SELECT '14:45:00'
) b
JOIN tbl c ON a.UID = c.UID
AND TIME(b.intrvl) >= TIME(c.Datetime)
GROUP BY a.UID,
b.intrvl
) a
JOIN tbl b ON a.UID = b.UID
AND a.last_date = b.Datetime
GROUP BY b.UID,
b.Name
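Query breakdown:
Step 1:
The first thing we need to do is associate each room with each time interval. For instance, in your example data, Room 4 has no association with the intervals 14:15:00 and 14:30:00, but we still need to represent those associations somehow.
We accomplish this by creating a Cartesian product of each distinct room with the relevant time intervals: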
SELECT a.UID,
b.intrvl
FROM (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
SELECT '14:00:00' intrvl UNION ALL
SELECT '14:15:00' UNION ALL
SELECT '14:30:00' UNION ALL
SELECT '14:45:00'
) b
ORDER BY b.intrvl, a.UID DESC -- Ordering for display purposes
Renders:
UID | intrvl
--------------
4 | 14:00:00
3 | 14:00:00
2 | 14:00:00
1 | 14:00:00
4 | 14:15:00
3 | 14:15:00
2 | 14:15:00
1 | 14:15:00
4 | 14:30:00
3 | 14:30:00
2 | 14:30:00
1 | 14:30:00
4 | 14:45:00
3 | 14:45:00
2 | 14:45:00
1 | 14:45:00
SQLFiddle Demo
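For readers who think more easily in code than in joins, the Cartesian product from Step 1 can be pictured with a short Python sketch. The lists rooms and intervals are hard-coded here purely for illustration, and the iteration order is chosen to mimic the ORDER BY used for display above.

```python
from itertools import product

rooms = [4, 3, 2, 1]  # distinct UIDs, in descending order for display
intervals = ["14:00:00", "14:15:00", "14:30:00", "14:45:00"]

# Every room is paired with every interval, whether or not a check-in exists.
pairs = [(uid, intrvl) for intrvl, uid in product(intervals, rooms)]

for uid, intrvl in pairs:
    print(uid, "|", intrvl)
```

The pair (4, "14:15:00") is present even though Room 4 never checked in at that time, which is exactly what the template is for.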
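Step 2:
Then, once we have those associations, we join the result back onto the main table (tbl) on the condition that the time part of the main table's Datetime field is less than or equal to the Cartesian-joined time, for each UID. For each UID -> intrvl association, this shows every entry that occurred at or before the intrvl time.
For example, since Room 3 has no entry for the 14:30:00 intrvl, only two entries are joined to that intrvl: the ones at 14:15:00 and 14:00:00, because both occurred at or before the intrvl time.
You can now see where we are going with this. The result of this step gives us access to the most recent entry for each intrvl.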
SELECT a.UID,
b.intrvl,
c.*
FROM (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
SELECT '14:00:00' intrvl UNION ALL
SELECT '14:15:00' UNION ALL
SELECT '14:30:00' UNION ALL
SELECT '14:45:00'
) b
JOIN tbl c ON a.UID = c.UID
AND TIME(b.intrvl) >= TIME(c.Datetime)
ORDER BY b.intrvl, a.UID DESC, c.Datetime -- Ordering for display purposes
Renders (excluding the Name column):
UID | intrvl | Datetime | Users
---------------- --------------------------------
4 | 14:00:00 | 2012-08-03 14:00:00 | 3 <-- Most recent entry up until 14:00:00
3 | 14:00:00 | 2012-08-03 14:00:00 | 1 <-- Most recent entry up until 14:00:00
2 | 14:00:00 | 2012-08-03 14:00:00 | 3 <-- Most recent entry up until 14:00:00
1 | 14:00:00 | 2012-08-03 14:00:00 | 2 <-- Most recent entry up until 14:00:00
4 | 14:15:00 | 2012-08-03 14:00:00 | 3 <-- Most recent entry up until 14:15:00
3 | 14:15:00 | 2012-08-03 14:00:00 | 1
3 | 14:15:00 | 2012-08-03 14:15:00 | 1 <-- Most recent entry up until 14:15:00
2 | 14:15:00 | 2012-08-03 14:00:00 | 3
2 | 14:15:00 | 2012-08-03 14:15:00 | 4 <-- Most recent entry up until 14:15:00
1 | 14:15:00 | 2012-08-03 14:00:00 | 2
1 | 14:15:00 | 2012-08-03 14:15:00 | 3 <-- Most recent entry up until 14:15:00
4 | 14:30:00 | 2012-08-03 14:00:00 | 3 <-- Most recent entry up until 14:30:00
3 | 14:30:00 | 2012-08-03 14:00:00 | 1
3 | 14:30:00 | 2012-08-03 14:15:00 | 1 <-- Most recent entry up until 14:30:00
2 | 14:30:00 | 2012-08-03 14:00:00 | 3
2 | 14:30:00 | 2012-08-03 14:15:00 | 4 <-- Most recent entry up until 14:30:00
1 | 14:30:00 | 2012-08-03 14:00:00 | 2
1 | 14:30:00 | 2012-08-03 14:15:00 | 3
1 | 14:30:00 | 2012-08-03 14:30:00 | 6 <-- Most recent entry up until 14:30:00
4 | 14:45:00 | 2012-08-03 14:00:00 | 3
4 | 14:45:00 | 2012-08-03 14:45:00 | 4 <-- Most recent entry up until 14:45:00
3 | 14:45:00 | 2012-08-03 14:00:00 | 1
3 | 14:45:00 | 2012-08-03 14:15:00 | 1
3 | 14:45:00 | 2012-08-03 14:45:00 | 8 <-- Most recent entry up until 14:45:00
2 | 14:45:00 | 2012-08-03 14:00:00 | 3
2 | 14:45:00 | 2012-08-03 14:15:00 | 4
2 | 14:45:00 | 2012-08-03 14:45:00 | 7 <-- Most recent entry up until 14:45:00
1 | 14:45:00 | 2012-08-03 14:00:00 | 2
1 | 14:45:00 | 2012-08-03 14:15:00 | 3
1 | 14:45:00 | 2012-08-03 14:30:00 | 6
1 | 14:45:00 | 2012-08-03 14:45:00 | 3 <-- Most recent entry up until 14:45:00
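SQLFiddle Demo
Step 3:
Our next step is to take the result set above and pull only the most recent joined Datetime for each intrvl. We can do this by using GROUP BY together with the MAX() aggregate function.
Unfortunately, because of how GROUP BY behaves, we cannot also correctly pull the value of Users along with each selected Datetime.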
SELECT a.UID,
b.intrvl,
MAX(c.Datetime) last_date
FROM (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
SELECT '14:00:00' intrvl UNION ALL
SELECT '14:15:00' UNION ALL
SELECT '14:30:00' UNION ALL
SELECT '14:45:00'
) b
JOIN tbl c ON a.UID = c.UID
AND TIME(b.intrvl) >= TIME(c.Datetime)
GROUP BY a.UID,
b.intrvl
ORDER BY b.intrvl, a.UID DESC -- Again, for display purposes
Renders:
UID | intrvl | last_date
---------------------------------------
4 | 14:00:00 | 2012-08-03 14:00:00
3 | 14:00:00 | 2012-08-03 14:00:00
2 | 14:00:00 | 2012-08-03 14:00:00
1 | 14:00:00 | 2012-08-03 14:00:00
4 | 14:15:00 | 2012-08-03 14:00:00
3 | 14:15:00 | 2012-08-03 14:15:00
2 | 14:15:00 | 2012-08-03 14:15:00
1 | 14:15:00 | 2012-08-03 14:15:00
4 | 14:30:00 | 2012-08-03 14:00:00
3 | 14:30:00 | 2012-08-03 14:15:00
2 | 14:30:00 | 2012-08-03 14:15:00
1 | 14:30:00 | 2012-08-03 14:30:00
4 | 14:45:00 | 2012-08-03 14:45:00
3 | 14:45:00 | 2012-08-03 14:45:00
2 | 14:45:00 | 2012-08-03 14:45:00
1 | 14:45:00 | 2012-08-03 14:45:00
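SQLFiddle Demo
Step 4:
Now we have to grab the Users value for each last_date so that we can average those values. We do this by wrapping the query from the last step as a subselect in the FROM clause and joining back onto the main table once more, on the condition that for each matching UID -> last_date association, we grab the value of Users.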
SELECT a.UID,
a.last_date,
b.Users
FROM (
SELECT a.UID,
MAX(c.Datetime) last_date
FROM (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
SELECT '14:00:00' intrvl UNION ALL
SELECT '14:15:00' UNION ALL
SELECT '14:30:00' UNION ALL
SELECT '14:45:00'
) b
JOIN tbl c ON a.UID = c.UID
AND TIME(b.intrvl) >= TIME(c.Datetime)
GROUP BY a.UID,
b.intrvl
) a
JOIN tbl b ON a.UID = b.UID
AND a.last_date = b.Datetime
ORDER BY a.UID DESC -- Display purposes again
Renders:
UID | last_date | Users
---------------------------------
4 | 2012-08-03 14:00:00 | 3
4 | 2012-08-03 14:00:00 | 3
4 | 2012-08-03 14:00:00 | 3
4 | 2012-08-03 14:45:00 | 4
3 | 2012-08-03 14:00:00 | 1
3 | 2012-08-03 14:15:00 | 1
3 | 2012-08-03 14:15:00 | 1
3 | 2012-08-03 14:45:00 | 8
2 | 2012-08-03 14:00:00 | 3
2 | 2012-08-03 14:15:00 | 4
2 | 2012-08-03 14:15:00 | 4
2 | 2012-08-03 14:45:00 | 7
1 | 2012-08-03 14:00:00 | 2
1 | 2012-08-03 14:15:00 | 3
1 | 2012-08-03 14:30:00 | 6
1 | 2012-08-03 14:45:00 | 3
SQLFiddle Demo
Step 5:
Now it is just a simple matter of grouping by each room and averaging the Users column:
SELECT b.Name,
AVG(b.Users) avg_users
FROM (
SELECT a.UID,
MAX(c.Datetime) last_date
FROM (SELECT DISTINCT UID FROM tbl) a
CROSS JOIN (
SELECT '14:00:00' intrvl UNION ALL
SELECT '14:15:00' UNION ALL
SELECT '14:30:00' UNION ALL
SELECT '14:45:00'
) b
JOIN tbl c ON a.UID = c.UID
AND TIME(b.intrvl) >= TIME(c.Datetime)
GROUP BY a.UID,
b.intrvl
) a
JOIN tbl b ON a.UID = b.UID
AND a.last_date = b.Datetime
GROUP BY b.UID,
b.Name
Renders:
Name | avg_users
------------------
Room 1 | 3.5
Room 2 | 4.5
Room 3 | 2.75
Room 4 | 3.25
SQLFiddle Demo of the Final Result
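As a quick sanity check that needs no MySQL server, the final query can be run against an in-memory SQLite database from Python. This is only a sketch: it assumes that SQLite's TIME() function and text-typed Datetime values behave like MySQL's for this sample data, and the names ROWS, QUERY and room_averages are invented for the illustration.

```python
import sqlite3

# The sample rows from the question: (UID, Name, Datetime, Users).
ROWS = [
    (4, "Room 4", "2012-08-03 14:00:00", 3),
    (2, "Room 2", "2012-08-03 14:00:00", 3),
    (3, "Room 3", "2012-08-03 14:00:00", 1),
    (1, "Room 1", "2012-08-03 14:00:00", 2),
    (3, "Room 3", "2012-08-03 14:15:00", 1),
    (2, "Room 2", "2012-08-03 14:15:00", 4),
    (1, "Room 1", "2012-08-03 14:15:00", 3),
    (1, "Room 1", "2012-08-03 14:30:00", 6),
    (1, "Room 1", "2012-08-03 14:45:00", 3),
    (2, "Room 2", "2012-08-03 14:45:00", 7),
    (3, "Room 3", "2012-08-03 14:45:00", 8),
    (4, "Room 4", "2012-08-03 14:45:00", 4),
]

# The Step 5 query, with explicit AS keywords added for readability.
QUERY = """
SELECT b.Name, AVG(b.Users) AS avg_users
FROM (
    SELECT a.UID, MAX(c.Datetime) AS last_date
    FROM (SELECT DISTINCT UID FROM tbl) a
    CROSS JOIN (
        SELECT '14:00:00' AS intrvl UNION ALL
        SELECT '14:15:00' UNION ALL
        SELECT '14:30:00' UNION ALL
        SELECT '14:45:00'
    ) b
    JOIN tbl c ON a.UID = c.UID
              AND TIME(b.intrvl) >= TIME(c.Datetime)
    GROUP BY a.UID, b.intrvl
) a
JOIN tbl b ON a.UID = b.UID
          AND a.last_date = b.Datetime
GROUP BY b.UID, b.Name
"""


def room_averages():
    """Load the sample rows into SQLite and return {Name: avg_users}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (UID INTEGER, Name TEXT, Datetime TEXT, Users INTEGER)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", ROWS)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


print(room_averages())
```

On the sample data this reproduces the table above: 3.5, 4.5, 2.75 and 3.25 for Room 1 through Room 4.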
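I just played around a little with MySQL variables and came up with the following idea:
Simply calculate the (discrete) integral of the users over time and then divide by the total time.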
SET @avgSum := @lastValue := @lastTime := @firstTime := 0;
SELECT
*,
@firstTime := IF(@firstTime = 0, UNIX_TIMESTAMP(`DateTime`), @firstTime),
@avgSum := @avgSum + (UNIX_TIMESTAMP(`DateTime`) - @lastTime) * @lastValue,
@lastValue,
@lastTime,
@lastValue := `Users`,
@lastTime := UNIX_TIMESTAMP(`DateTime`),
@avgSum / (UNIX_TIMESTAMP(`DateTime`) - @firstTime) AS `average`
FROM
`table`
WHERE
`UID` = 1 AND
UNIX_TIMESTAMP(`DateTime`) >= … AND
UNIX_TIMESTAMP(`DateTime`) < …
ORDER BY
UNIX_TIMESTAMP(`DateTime`) ASC;
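@firstTime is the timestamp of the first user record, and @avgSum is the sum (the integral) of the users over time. @lastValue and @lastTime are the value and time of the previous record. The column average is the total sum of users divided by the whole interval (never mind the NULL caused by the division by zero for the first record).
Two restrictions remain: the first and the last record for the given interval must be present. If they are not, the average "ends" at the last available record.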
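The running-integral idea can also be sketched in plain Python, which makes the second restriction easy to see. This is an illustration only: running_average and ROOM_1 are names invented here, and the function takes the already-sorted records of a single room rather than reading from the table.

```python
from datetime import datetime


def running_average(records):
    """Time-weighted average of one room's user counts.

    records: list of (datetime, users) tuples sorted by time.
    Each count is weighted by how long it stayed valid, i.e. until the
    next record. The average therefore "ends" at the last record.
    """
    first_time = records[0][0]
    last_time, last_value = records[0]
    integral = 0.0  # accumulated user-seconds
    for when, users in records[1:]:
        integral += (when - last_time).total_seconds() * last_value
        last_time, last_value = when, users
    span = (last_time - first_time).total_seconds()
    # A single record spans zero seconds, so no average is defined.
    return integral / span if span else None


# Room 1 from the sample data.
ROOM_1 = [
    (datetime(2012, 8, 3, 14, 0), 2),
    (datetime(2012, 8, 3, 14, 15), 3),
    (datetime(2012, 8, 3, 14, 30), 6),
    (datetime(2012, 8, 3, 14, 45), 3),
]

print(running_average(ROOM_1))
```

For Room 1 the integral covers 14:00 to 14:45 only, giving (2 + 3 + 6) * 15 / 45, roughly 3.67, rather than the 3.5 the question expects, because the final count of 3 is never carried on to 15:00.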
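I think this does a good job of accommodating any time range, even when the check-in intervals are uneven. Also, I think there is a mistake in your example: in a weighted average, the last value for Room 2 is "4" rather than "7".
Setup: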
if object_id(N'avgTbl', N'U') is not null
drop table avgTbl;
create table avgTbl (
UserId int not null,
RoomName nvarchar(10) not null,
CheckInTime datetime not null,
UserCount int not null,
constraint pk_avgTbl primary key (UserId, RoomName, CheckInTime)
);
insert into avgTbl (UserId, RoomName, CheckInTime, UserCount) values
(4, 'Room 4', '2012-08-03 14:00:00', 3),
(2, 'Room 2', '2012-08-03 14:00:00', 3),
(3, 'Room 3', '2012-08-03 14:00:00', 1),
(1, 'Room 1', '2012-08-03 14:00:00', 2),
(3, 'Room 3', '2012-08-03 14:15:00', 1),
(2, 'Room 2', '2012-08-03 14:15:00', 4),
(1, 'Room 1', '2012-08-03 14:15:00', 3),
(1, 'Room 1', '2012-08-03 14:30:00', 6),
(1, 'Room 1', '2012-08-03 14:45:00', 3),
(2, 'Room 2', '2012-08-03 14:45:00', 7),
(3, 'Room 3', '2012-08-03 14:45:00', 8),
(4, 'Room 4', '2012-08-03 14:45:00', 4);
Query:
/*
* You just need to enter the start and end times below.
* They can be any intervals, as long as the start time is
* before the end time.
*/
declare
@startTime datetime = '2012-08-03 14:00:00',
@endTime datetime = '2012-08-03 15:00:00';
declare
@totalTime numeric(18,1) = datediff(MINUTE, @startTime, @endTime);
/*
* This orders the observations, and assigns a sequential number so we can
* join on it later.
*/
with diffs as (
select
row_number() over (order by RoomName, CheckInTime) as RowNum,
CheckInTime,
UserCount,
RoomName
from avgTbl
),
/*
* Get the time periods,
* calc the number of minutes,
* divide by the total minutes in the period,
* multiply by the UserCount to get the weighted value,
* sum the weighted values to get the weighted avg.
*/
mins as (
select
cur.RoomName,
/*
* If we do not have an observation for a given room, use "0" instead
* of "null", so it does not affect calculations later.
*/
case
when prv.UserCount is null then 0
else prv.UserCount
end as UserCount,
/* The current observation time. */
cur.CheckInTime as CurrentT,
/* The prior observation time. */
prv.CheckInTime as PrevT,
/*
* The difference in minutes between the current, and previous observation
* times. If it is the first observation, then use the @startTime as the
* previous observation time. If the current time is null, then use the
* end time.
*/
datediff(MINUTE,
case
when prv.CheckInTime is null then @startTime
else prv.CheckInTime
end,
case
when cur.CheckInTime is null then @endTime
else cur.CheckInTime
end) as Mins
from diffs as cur
/*
* Join the observations based on the row numbers. This gets the current,
* and previous observations together in the same record, so we can
* perform our calculations.
*/
left outer join diffs as prv on cur.RowNum = prv.RowNum + 1
and cur.RoomName = prv.RoomName
union
/*
* Add the end date as a period end, assume that the user count is the same
* as the last observation.
*/
select
d.RoomName,
d.UserCount,
@endTime,
d.CheckInTime, -- The last recorded observation time.
datediff(MINUTE, d.CheckInTime, @endTime) as Mins
from diffs as d
where d.RowNum in (
select MAX(d2.RowNum)
from diffs as d2
where d2.RoomName = d.RoomName
)
group by d.RoomName, d.CheckInTime, d.UserCount
)
/* Now we just need to get our weighted average calculations. */
select
m.RoomName,
count(1) - 1 as NumOfObservations,
/*
* m.Mins = minutes during which "UserCount" is the active number.
* @totalTime = total minutes between start and end.
* m.Mins / @totalTime = the % of the total time.
* (m.Mins / @totalTime) * UserCount = The weighted value.
* sum(..above..) = The total weighted average across the observations.
*/
sum((m.Mins/@totalTime) * m.UserCount) as WgtAvg
from mins as m
group by m.RoomName
order by m.RoomName;
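The weighting logic of the query above can be checked by hand with a small Python sketch. It is not a translation of the T-SQL: weighted_average and OBSERVATIONS are hypothetical names, only observations inside the window are considered (an assumption the query itself does not enforce), time before a room's first observation counts as zero users, and the last observation is assumed to hold until the end time.

```python
from datetime import datetime


def weighted_average(observations, start, end):
    """Time-weighted average user count for one room over [start, end).

    observations: list of (datetime, user_count) tuples.
    """
    obs = sorted(o for o in observations if start <= o[0] < end)
    total = (end - start).total_seconds()
    weighted = 0.0
    for i, (when, count) in enumerate(obs):
        # Each count stays active until the next observation, or until
        # the end of the window for the last observation.
        until = obs[i + 1][0] if i + 1 < len(obs) else end
        weighted += (until - when).total_seconds() * count
    return weighted / total


def at(hour, minute):
    """Shorthand for a time on 2012-08-03."""
    return datetime(2012, 8, 3, hour, minute)


# Sample data from the question, keyed by room name.
OBSERVATIONS = {
    "Room 1": [(at(14, 0), 2), (at(14, 15), 3), (at(14, 30), 6), (at(14, 45), 3)],
    "Room 2": [(at(14, 0), 3), (at(14, 15), 4), (at(14, 45), 7)],
    "Room 3": [(at(14, 0), 1), (at(14, 15), 1), (at(14, 45), 8)],
    "Room 4": [(at(14, 0), 3), (at(14, 45), 4)],
}

for room, obs in OBSERVATIONS.items():
    print(room, weighted_average(obs, at(14, 0), at(15, 0)))
```

With a window of 14:00 to 15:00 this sketch yields 3.5, 4.5, 2.75 and 3.25. Unlike the fixed 15-minute template, it keeps working when check-ins fall at uneven times.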