从日期时间表中获取包含开始值和结束值的列表



目前我有这样构建的表格

DeviceID      Timestamp            Value
----------------------------------------
Device1       1.1.2011 10:00:00    3
Device1       1.1.2011 10:00:01    4
Device1       1.1.2011 10:00:02    4
Device1       1.1.2011 10:00:04    3
Device1       1.1.2011 10:00:05    4
Device1       1.1.2011 14:23:14    8
Device1       1.1.2011 14:23:15    7
Device1       1.1.2011 14:23:17    4
Device1       1.1.2011 14:23:18    2

如您所见,具有给定时间戳的设备(列类型为日期时间)会有一些值。

问题是设备可以在任何时候启动和停止,并且数据中没有发生启动或停止的直接信息。但是从给定时间戳列表中,很容易判断何时开始和停止发生,因为只要两行的时间戳在五秒内,它们就属于同一测量值。

现在我想从这些数据中得到一个这样的列表:

DeviceID      Started              Ended
Device1       1.1.2011 10:00:00    1.1.2011 10:00:05
Device1       1.1.2011 14:23:14    1.1.2011 14:23:18

那么任何想法如何快速做到这一点?我所能想到的就是使用某种光标并手动比较每个日期时间对。但我认为这会变得非常慢,因为我们必须检查每行中的每个值。

那么有没有更好的SQL解决方案不适用于游标呢?

更新

目前,我测试了所有给定的答案。通过阅读,它们看起来都不错,并且有一些有趣的方法。不幸的是,所有这些(到目前为止)在真实数据上都失败了。最大的问题似乎是数据量(目前表中大约有350万个条目)。仅对一小部分执行给定查询会导致预期的结果,但将查询滚动到整个表上只会导致性能非常差。

必须进一步测试和检查我是否可以将数据分块,并且只将部分数据传递给这些给定算法之一来滚动。但也许你们中的一个人有另一个聪明的想法,可以更快地获得结果。

更新(有关结构的更多信息)

好的,这些信息也可能有所帮助:目前,该表中大约有350万个条目。以下是给定的列类型和独立:

  • _ID
    • 国际
    • 主键
    • 分组索引
    • 在我的示例中没有提到此列,因为此查询不需要它
  • 设备标识
    • 国际
    • 不为空
    • 指数
  • 时间戳
    • 日期时间
    • 不为空
    • 指数
  • 价值
    • 几个不同类型的未索引列(整数、实数、小英特)
    • 所有都可以为空

也许这有助于改进您针对给定问题的现有(或新)解决方案。

-- Table var to store the gaps
declare @T table
(
  DeviceID varchar(10),
  PrevPeriodEnd datetime,
  NextPeriodStart datetime
)
-- Get the gaps
;with cte as 
(
  select *,
    row_number() over(partition by DeviceID order by Timestamp) as rn
  from data
)
insert into @T
select
  C1.DeviceID,
  C1.Timestamp as PrevPeriodEnd,
  C2.Timestamp as NextPeriodStart
from cte as C1
  inner join cte as C2
    on C1.rn = C2.rn-1 and
       C1.DeviceID = C2.DeviceID and
       datediff(s, C1.Timestamp, C2.Timestamp) > 5
-- Build islands from gaps in @T
;with cte1 as
(
  -- Add first and last timestamp to gaps
  select DeviceID, PrevPeriodEnd, NextPeriodStart
  from @T
  union all
  select DeviceID, max(TimeStamp) as PrevPeriodEnd, null as NextPeriodStart
  from data
  group by DeviceID
  union all
  select DeviceID, null as PrevPeriodEnd, min(TimeStamp) as PrevPeriodEnd
  from data
  group by DeviceID
),
cte2 as
(
  select *,
    row_number() over(partition by DeviceID order by PrevPeriodEnd) as rn
  from cte1
)
select
  C1.DeviceID,
  C1.NextPeriodStart as PeriodStart,
  C2.PrevPeriodEnd as PeriodEnd
from cte2 as C1
  inner join cte2 as C2
    on C1.DeviceID = C2.DeviceID and
       C1.rn = C2.rn-1
order by C1.DeviceID, C1.NextPeriodStart       

我已经尝试了一些数据类型和名称(只是因为我可以,并且因为时间戳是一个保留字),并且可以使用您的示例数据获得您请求的结果。

示例数据:

create table Measures (
    DeviceID int not null,
    Occurred datetime not null,
    Value int not null,
    constraint PK_Measures PRIMARY KEY (DeviceID,Occurred)
)
go
insert into Measures (DeviceID,Occurred,Value)
select 1,'2011-01-01T10:00:00',3 union all
select 1,'2011-01-01T10:00:01',4 union all
select 1,'2011-01-01T10:00:02',4 union all
select 1,'2011-01-01T10:00:04',3 union all
select 1,'2011-01-01T10:00:05',4 union all
select 1,'2011-01-01T14:23:14',8 union all
select 1,'2011-01-01T14:23:15',7 union all
select 1,'2011-01-01T14:23:17',4 union all
select 1,'2011-01-01T14:23:18',2

现在查询:

;with StartPeriods as (
    select m1.DeviceID,m1.Occurred as Started
    from Measures m1 left join Measures m2 on m1.DeviceID = m2.DeviceID and m2.Occurred < m1.Occurred and DATEDIFF(second,m2.Occurred,m1.Occurred) < 6
    where m2.DeviceID is null
), ExtendPeriods as (
    select DeviceID,Started,Started as Ended from StartPeriods
    union all
    select
        ep.DeviceID,ep.Started,m2.Occurred
    from
        ExtendPeriods ep
            inner join
        Measures m2
            on
                ep.DeviceID = m2.DeviceID and
                ep.Ended < m2.Occurred and
                DATEDIFF(SECOND,ep.Ended,m2.Occurred) < 6
)
select DeviceID,Started,MAX(Ended) from ExtendPeriods group by DeviceID,Started

StartPeriods公用表表达式 (CTE) 从度量值表中查找那些在 5 秒内没有前一行的行。然后,ExtendPeriods CTE 通过从度量中查找新行,递归地延长这些时间段,这些行在找到的时间段的当前结束后最多 5 秒内发生。

然后,我们找到周期结束尽可能远离开始的行。

试试这个:

select DeviceID,MIN(Timestamp),MAX(Timestamp) 
          from @table group by DATEPART(hh,Timestamp),DeviceID

以下解决方案的基本思想是从这个答案中借来的。

WITH data (DeviceID, Timestamp, Value) AS (
  SELECT 'Device1', CAST('1.1.2011 10:00:00' AS datetime), 3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:01',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:02',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:04',              3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:05',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:14',              8 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:15',              7 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:17',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:18',              2
),
ranked AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY DeviceID ORDER BY Timestamp)
  FROM data
),
starts AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn + 1
      AND r1.Timestamp <= DATEADD(second, 5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
),
ends AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn - 1
      AND r1.Timestamp >= DATEADD(second, -5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
)
SELECT
  s.DeviceID,
  Started = s.Timestamp,
  Ended = e.Timestamp
FROM starts s
  INNER JOIN ends e ON s.DeviceID = e.DeviceID AND s.rank = e.rank

试试这个,虽然我不确定它在处理大量数据时的表现如何

SELECT a.TS AS [StartTime], (SELECT TOP 1 c.TS FROM TestTime c WHERE c.TS >= a.TS AND
    NOT EXISTS(SELECT * FROM TestTime d WHERE d.TS > c.TS AND DATEDIFF(SECOND, c.TS, d.TS) <= 5) ORDER BY c.TS) AS [StopTime]
FROM TestTime a WHERE NOT EXISTS (SELECT * FROM TestTime b WHERE a.TS > b.TS AND DATEDIFF(SECOND, b.TS, a.TS) <= 5)

我的表称为 TestTime,该列称为 TS,因此请为您的表调整它。 我使用 NOT EXISTS 来检查当前记录<时间戳并在 _x0035_=" 秒内=" _x002D_=">=该时间戳的最低时间戳(如果是单个条目, 所以开始/停止一个),并且再次使用 NOT EXISTS 来检查大于它的记录并在 5 秒内 - 因此,再次显示未找到记录(仅第 1 条)。 您可能可以对其进行调整和改进,但这可能是一个很好的基础。

请注意,如果它仍在运行,它将列出上次找到的时间作为上次启动事件的停止时间。

为简单起见,我没有在这里输入设备名称,因此您需要将其放在 StopTime 和 WHERE 子句中

DECLARE @t TABLE
(DeviceID      VARCHAR(10),
 [Timestamp]    DATETIME,
 VALUE          INT
)
INSERT @t
SELECT 'Device1','20110101 10:00:00',    3
UNION SELECT 'Device1','20110101 10:00:01',    4
UNION SELECT 'Device1','20110101 10:00:02',    4
UNION SELECT 'Device1','20110101 10:00:04',   3
UNION SELECT 'Device1','20110101 10:00:05',    4
UNION SELECT 'Device1','20110101 14:23:14',    8
UNION SELECT 'Device1','20110101 14:23:15',    7
UNION SELECT 'Device1','20110101 14:23:17',    4
UNION SELECT 'Device1','20110101 14:23:18',    2

;WITH myCTE
AS
(
    SELECT DeviceID, [Timestamp],
           ROW_NUMBER() OVER (PARTITION BY DeviceID
                              ORDER BY [TIMESTAMP]
                             ) AS rn
    FROM @t
)
, recCTE
AS
(
    SELECT DeviceID, [Timestamp],  0 as groupID, rn FROM myCTE
    WHERE rn = 1
    UNION ALL
    SELECT r.DeviceID, g.[Timestamp],  CASE WHEN DATEDIFF(ss,r.[Timestamp], g.[Timestamp]) <= 5 THEN r.groupID ELSE r.groupID + 1 END, g.rn 
    FROM recCTE AS r
    JOIN myCTE AS g
    ON g.rn = r.rn + 1
)
SELECT DeviceID, MIN([Timestamp]) AS [started], MAX([Timestamp]) AS ended
FROM recCTE
GROUP BY DeviceId, groupId
OPTION (MAXRECURSION 0);

您应该能够为此使用窗口函数(假设 15 分钟定义下面的新会话):

SELECT DeviceId,
       Timestamp,
       COALESCE((Timestamp - lag(Timestamp) OVER w) > interval '15 min', TRUE)
       as session_begins
       COALESCE((lead(Timestamp) OVER w - Timestamp) > interval '15 min', TRUE)
       as session_ends
FROM YourTable
WINDOW w AS (PARTITION BY DeviceId ORDER BY Timestamp);

根据您的 where 子句,您可能希望删除合并/true 部分,因为获取的第一行/最后一行可能变得无效。

如果只需要边界,则可以在子查询中使用上述内容并group by DeviceId, session_begins, session_ends having session_begins or session_ends 。此外,如果您这样做,请不要忘记将 where 子句放在子查询中,而不是主子查询中,否则由于窗口聚合,您最终会对整个表进行 seq 扫描。

相关内容

  • 没有找到相关文章

最新更新