目前我有这样构建的表格
DeviceID Timestamp Value
----------------------------------------
Device1 1.1.2011 10:00:00 3
Device1 1.1.2011 10:00:01 4
Device1 1.1.2011 10:00:02 4
Device1 1.1.2011 10:00:04 3
Device1 1.1.2011 10:00:05 4
Device1 1.1.2011 14:23:14 8
Device1 1.1.2011 14:23:15 7
Device1 1.1.2011 14:23:17 4
Device1 1.1.2011 14:23:18 2
如您所见,具有给定时间戳的设备(列类型为日期时间)会有一些值。
问题是设备可以在任何时候启动和停止,并且数据中没有发生启动或停止的直接信息。但是从给定时间戳列表中,很容易判断何时开始和停止发生,因为只要两行的时间戳在五秒内,它们就属于同一测量值。
现在我想从这些数据中得到一个这样的列表:
DeviceID Started Ended
Device1 1.1.2011 10:00:00 1.1.2011 10:00:05
Device1 1.1.2011 14:23:14 1.1.2011 14:23:18
那么任何想法如何快速做到这一点?我所能想到的就是使用某种光标并手动比较每个日期时间对。但我认为这会变得非常慢,因为我们必须检查每行中的每个值。
那么有没有更好的SQL解决方案不适用于游标呢?
更新
目前,我测试了所有给定的答案。通过阅读,它们看起来都不错,并且有一些有趣的方法。不幸的是,所有这些(到目前为止)在真实数据上都失败了。最大的问题似乎是数据量(目前表中大约有350万个条目)。仅对一小部分执行给定查询会导致预期的结果,但将查询滚动到整个表上只会导致性能非常差。
我必须进一步测试和检查我是否可以将数据分块,并且只将部分数据传递给这些给定算法之一来滚动。但也许你们中的一个人有另一个聪明的想法,可以更快地获得结果。
更新(有关结构的更多信息)
好的,这些信息也可能有所帮助:目前,该表中大约有350万个条目。以下是给定的列类型和独立:
- _ID
- 国际
- 主键
- 分组索引
- 在我的示例中没有提到此列,因为此查询不需要它
- 设备标识
- 国际
- 不为空
- 指数
- 时间戳
- 日期时间
- 不为空
- 指数
- 价值
- 几个不同类型的未索引列(整数、实数、小英特)
- 所有都可以为空
也许这有助于改进您针对给定问题的现有(或新)解决方案。
-- Table var to store the gaps
declare @T table
(
DeviceID varchar(10),
PrevPeriodEnd datetime,
NextPeriodStart datetime
)
-- Get the gaps
;with cte as
(
select *,
row_number() over(partition by DeviceID order by Timestamp) as rn
from data
)
insert into @T
select
C1.DeviceID,
C1.Timestamp as PrevPeriodEnd,
C2.Timestamp as NextPeriodStart
from cte as C1
inner join cte as C2
on C1.rn = C2.rn-1 and
C1.DeviceID = C2.DeviceID and
datediff(s, C1.Timestamp, C2.Timestamp) > 5
-- Build islands from gaps in @T
;with cte1 as
(
-- Add first and last timestamp to gaps
select DeviceID, PrevPeriodEnd, NextPeriodStart
from @T
union all
select DeviceID, max(TimeStamp) as PrevPeriodEnd, null as NextPeriodStart
from data
group by DeviceID
union all
select DeviceID, null as PrevPeriodEnd, min(TimeStamp) as PrevPeriodEnd
from data
group by DeviceID
),
cte2 as
(
select *,
row_number() over(partition by DeviceID order by PrevPeriodEnd) as rn
from cte1
)
select
C1.DeviceID,
C1.NextPeriodStart as PeriodStart,
C2.PrevPeriodEnd as PeriodEnd
from cte2 as C1
inner join cte2 as C2
on C1.DeviceID = C2.DeviceID and
C1.rn = C2.rn-1
order by C1.DeviceID, C1.NextPeriodStart
我已经尝试了一些数据类型和名称(只是因为我可以,并且因为时间戳是一个保留字),并且可以使用您的示例数据获得您请求的结果。
示例数据:
create table Measures (
DeviceID int not null,
Occurred datetime not null,
Value int not null,
constraint PK_Measures PRIMARY KEY (DeviceID,Occurred)
)
go
insert into Measures (DeviceID,Occurred,Value)
select 1,'2011-01-01T10:00:00',3 union all
select 1,'2011-01-01T10:00:01',4 union all
select 1,'2011-01-01T10:00:02',4 union all
select 1,'2011-01-01T10:00:04',3 union all
select 1,'2011-01-01T10:00:05',4 union all
select 1,'2011-01-01T14:23:14',8 union all
select 1,'2011-01-01T14:23:15',7 union all
select 1,'2011-01-01T14:23:17',4 union all
select 1,'2011-01-01T14:23:18',2
现在查询:
;with StartPeriods as (
select m1.DeviceID,m1.Occurred as Started
from Measures m1 left join Measures m2 on m1.DeviceID = m2.DeviceID and m2.Occurred < m1.Occurred and DATEDIFF(second,m2.Occurred,m1.Occurred) < 6
where m2.DeviceID is null
), ExtendPeriods as (
select DeviceID,Started,Started as Ended from StartPeriods
union all
select
ep.DeviceID,ep.Started,m2.Occurred
from
ExtendPeriods ep
inner join
Measures m2
on
ep.DeviceID = m2.DeviceID and
ep.Ended < m2.Occurred and
DATEDIFF(SECOND,ep.Ended,m2.Occurred) < 6
)
select DeviceID,Started,MAX(Ended) from ExtendPeriods group by DeviceID,Started
StartPeriods
公用表表达式 (CTE) 从度量值表中查找那些在 5 秒内没有前一行的行。然后,ExtendPeriods
CTE 通过从度量中查找新行,递归地延长这些时间段,这些行在找到的时间段的当前结束后最多 5 秒内发生。
然后,我们找到周期结束尽可能远离开始的行。
试试这个:
select DeviceID,MIN(Timestamp),MAX(Timestamp)
from @table group by DATEPART(hh,Timestamp),DeviceID
以下解决方案的基本思想是从这个答案中借来的。
WITH data (DeviceID, Timestamp, Value) AS (
SELECT 'Device1', CAST('1.1.2011 10:00:00' AS datetime), 3 UNION ALL
SELECT 'Device1', '1.1.2011 10:00:01', 4 UNION ALL
SELECT 'Device1', '1.1.2011 10:00:02', 4 UNION ALL
SELECT 'Device1', '1.1.2011 10:00:04', 3 UNION ALL
SELECT 'Device1', '1.1.2011 10:00:05', 4 UNION ALL
SELECT 'Device1', '1.1.2011 14:23:14', 8 UNION ALL
SELECT 'Device1', '1.1.2011 14:23:15', 7 UNION ALL
SELECT 'Device1', '1.1.2011 14:23:17', 4 UNION ALL
SELECT 'Device1', '1.1.2011 14:23:18', 2
),
ranked AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY DeviceID ORDER BY Timestamp)
FROM data
),
starts AS (
SELECT
r1.DeviceID,
r1.Timestamp,
rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
FROM ranked r1
LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
AND r1.rn = r2.rn + 1
AND r1.Timestamp <= DATEADD(second, 5, r2.Timestamp)
WHERE r2.DeviceID IS NULL
),
ends AS (
SELECT
r1.DeviceID,
r1.Timestamp,
rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
FROM ranked r1
LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
AND r1.rn = r2.rn - 1
AND r1.Timestamp >= DATEADD(second, -5, r2.Timestamp)
WHERE r2.DeviceID IS NULL
)
SELECT
s.DeviceID,
Started = s.Timestamp,
Ended = e.Timestamp
FROM starts s
INNER JOIN ends e ON s.DeviceID = e.DeviceID AND s.rank = e.rank
试试这个,虽然我不确定它在处理大量数据时的表现如何
SELECT a.TS AS [StartTime], (SELECT TOP 1 c.TS FROM TestTime c WHERE c.TS >= a.TS AND
NOT EXISTS(SELECT * FROM TestTime d WHERE d.TS > c.TS AND DATEDIFF(SECOND, c.TS, d.TS) <= 5) ORDER BY c.TS) AS [StopTime]
FROM TestTime a WHERE NOT EXISTS (SELECT * FROM TestTime b WHERE a.TS > b.TS AND DATEDIFF(SECOND, b.TS, a.TS) <= 5)
我的表称为 TestTime,该列称为 TS,因此请为您的表调整它。 我使用 NOT EXISTS 来检查当前记录<时间戳并在 _x0035_=" 秒内=" _x002D_=">=该时间戳的最低时间戳(如果是单个条目, 所以开始/停止一个),并且再次使用 NOT EXISTS 来检查大于它的记录并在 5 秒内 - 因此,再次显示未找到记录(仅第 1 条)。 您可能可以对其进行调整和改进,但这可能是一个很好的基础。时间戳并在>
请注意,如果它仍在运行,它将列出上次找到的时间作为上次启动事件的停止时间。
为简单起见,我没有在这里输入设备名称,因此您需要将其放在 StopTime 和 WHERE 子句中
DECLARE @t TABLE
(DeviceID VARCHAR(10),
[Timestamp] DATETIME,
VALUE INT
)
INSERT @t
SELECT 'Device1','20110101 10:00:00', 3
UNION SELECT 'Device1','20110101 10:00:01', 4
UNION SELECT 'Device1','20110101 10:00:02', 4
UNION SELECT 'Device1','20110101 10:00:04', 3
UNION SELECT 'Device1','20110101 10:00:05', 4
UNION SELECT 'Device1','20110101 14:23:14', 8
UNION SELECT 'Device1','20110101 14:23:15', 7
UNION SELECT 'Device1','20110101 14:23:17', 4
UNION SELECT 'Device1','20110101 14:23:18', 2
;WITH myCTE
AS
(
SELECT DeviceID, [Timestamp],
ROW_NUMBER() OVER (PARTITION BY DeviceID
ORDER BY [TIMESTAMP]
) AS rn
FROM @t
)
, recCTE
AS
(
SELECT DeviceID, [Timestamp], 0 as groupID, rn FROM myCTE
WHERE rn = 1
UNION ALL
SELECT r.DeviceID, g.[Timestamp], CASE WHEN DATEDIFF(ss,r.[Timestamp], g.[Timestamp]) <= 5 THEN r.groupID ELSE r.groupID + 1 END, g.rn
FROM recCTE AS r
JOIN myCTE AS g
ON g.rn = r.rn + 1
)
SELECT DeviceID, MIN([Timestamp]) AS [started], MAX([Timestamp]) AS ended
FROM recCTE
GROUP BY DeviceId, groupId
OPTION (MAXRECURSION 0);
您应该能够为此使用窗口函数(假设 15 分钟定义下面的新会话):
SELECT DeviceId,
Timestamp,
COALESCE((Timestamp - lag(Timestamp) OVER w) > interval '15 min', TRUE)
as session_begins
COALESCE((lead(Timestamp) OVER w - Timestamp) > interval '15 min', TRUE)
as session_ends
FROM YourTable
WINDOW w AS (PARTITION BY DeviceId ORDER BY Timestamp);
根据您的 where 子句,您可能希望删除合并/true 部分,因为获取的第一行/最后一行可能变得无效。
如果只需要边界,则可以在子查询中使用上述内容并group by DeviceId, session_begins, session_ends having session_begins or session_ends
。此外,如果您这样做,请不要忘记将 where 子句放在子查询中,而不是主子查询中,否则由于窗口聚合,您最终会对整个表进行 seq 扫描。