SQL分段/平均基于不同的时间标记/时间戳和用户选择的输入(T-SQL)



我有以下问题,我想应付-我有一个SQL数据集,我想节(例如像这一个):

OldTimetag                 OldValue
2012-05-03 12:47:00        5
2012-05-03 13:00:00        1.3
2012-05-03 13:21:00        7
2012-05-03 14:56:00        5
2012-05-03 14:57:00        0.3
....                       ....

现在,我想根据用户选择的间隔将数据分割(和/或平均)到新的时间标签中,例如,每15分钟以第一个时间标签为起点,即:

NewTimetag                 NewValue
2012-05-03 12:47:00        4.507
2012-05-03 13:02:00        1.3 
....                       ....

主要约束是时间标记旁边的值总是有效的,直到下一个时间标记出现。因此,时间标签2012-05-03 12:47:00处的值5在接下来的13分钟内有效,直到13:00:00。从12:47:00开始的前15分钟的值为(13*5+2*1.3)/15 = 4.507。在接下来的15分钟,在13:02:00,这个值等于1.3…(以此类推)

我来了这么久,这是一个好主意,先做一个"人造表",然后与旧表连接。生成这个表的方法是:

DECLARE @intStart datetime, @intEnd datetime

SELECT @intStart =min(OldTimetag), @intEnd = MAX(OldTimetag)
FROM OldTable
    where OldTimetag between '2012-05-03 12:47:00' and '2012-05-03 14:57:00'
Declare @ArtificalTable table (NewTimeTag datetime, NewValue Float)
Declare @MinuteSlicer Int
Set @MinuteSlicer = 15
Insert @Hallo Select @intStart, null
While ( @intStart < @intEnd ) BEGIN
    Insert @ArtificalTable
    Select DATEADD(mi,@MinuteSlicer, @intStart), Null 
        Set  @intStart = DATEADD(mi,@MinuteSlicer,@intStart)
        If @intEnd <= DATEADD(mi,@MinuteSlicer,@intStart) 
            Break
End 

输出如下:

NewTimetag                 NewValue
2012-05-03 12:47:00        Null
2012-05-03 13:02:00        Null
....                       ....

然而,我有问题的下一步,如何正确地加入表-谁能给我一个提示?

这里有一种方法。

样本数据:

declare @data table(OldTimetag datetime2, OldValue numeric(5,2));
Insert into @data(OldTimetag, OldValue) Values
    ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3)
    , ('2012-05-03 13:21:00', 7)
    , ('2012-05-03 14:56:00', 5)
    , ('2012-05-03 14:57:00', 0.3);

您的自定义范围大小(以分钟为单位):

declare @mins int = 15;

List用于快速计算数字从0到n的有序列表,其中n <=第一个和最后一个OldTimetag之间的分钟数。

With list(n) as (
    Select top(Select 1+DATEDIFF(minute, min(OldTimetag), max(OldTimetag)) From @data) 
        ROW_NUMBER() over(order by (select 1))-1
    From (
        Select 1 From (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x1(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x2(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x3(n)
    ) as x(n)
)
Select NewTimetag = DATEADD(minute, @mins*(l.n/@mins), MIN(r.startTime)), NewValue = AVG(d.oldValue)
From list l
Cross Join (Select startTime = min(OldTimetag) From @data) as r
Cross Apply (Select maxTimetag = MAX(OldTimetag) From @data Where OldTimetag <= DATEADD(minute, n, startTime)) as mx
Inner Join @data d on d.OldTimetag = mx.maxTimetag
Group By l.n/@mins
  • Cross Join用于将有序列表中的每个数字与数据中的第一个OldTimetag混合。
  • Cross Apply用于在Cross Join创建的每分钟之前获得最近的OldTimetag
  • Inner Join然后匹配最近的OldTimetag与您的数据,以检索oldValue
  • Select只需要计算@min分钟和NewTimetag上每个范围的平均值。

它在最小和最大OldTimetag之间的1000分钟范围内工作得很好。如果您需要超出这个限制,您可以在列表中添加第4行CTE:

Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x4(n) => up to 10.000
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x5(n) => up to 100.000
...

一种方法是确定间隔(如果间隔包含至少一个时间戳,则生成间隔),用下一个时间戳增加时间表,然后通过将间隔与时间表相交来计算每个这样的间隔的平均值。

IF OBJECT_ID('tempdb..#values') IS NOT NULL DROP TABLE #values
CREATE TABLE #values (pk int identity, time datetime, value numeric(10,4))
INSERT INTO #values VALUES ('2012-05-03 12:47:00', 5)
INSERT INTO #values VALUES ('2012-05-03 13:00:00', 1.3)
INSERT INTO #values VALUES ('2012-05-03 13:21:00', 7)
INSERT INTO #values VALUES ('2012-05-03 14:56:00', 5)
INSERT INTO #values VALUES ('2012-05-03 14:57:00', 0.3)
DECLARE @timeSpanMinutes int SET @timeSpanMinutes=15
DECLARE @startTime datetime, @endTtime datetime
SELECT @startTime=MIN(time) FROM #values
SELECT @endTtime =DATEADD(MINUTE,(DATEDIFF(MINUTE,@startTime,MAX(time))
    /@timeSpanMinutes+1)*@timeSpanMinutes, @startTime) FROM #values -- MAX(time) multiple
SELECT intervals.start
  , SUM(value*(DATEDIFF(MINUTE -- minutes in intersection of [start,end] and [time,next]
     , CASE WHEN time<start THEN start ELSE time END -- Maximum(time,start)
     , CASE WHEN next<DATEADD(MINUTE,@timeSpanMinutes,intervals.start) THEN next 
            ELSE DATEADD(MINUTE,@timeSpanMinutes,intervals.start) END -- Minimum(next,end)
     )*1.0/@timeSpanMinutes)) as average
  FROM
  (SELECT DISTINCT DATEADD(MINUTE, (DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) AS start 
      FROM #values -- round start to multiple of @timeSpanMinutes
    UNION SELECT DISTINCT DATEADD(MINUTE,@timeSpanMinutes+(DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) 
      FROM #values -- union distinct with same as above but shifted with @timeSpanMinutes
  ) intervals -- intervals start time (end is calculated as start + @timeSpanMinutes)
  INNER JOIN 
  (SELECT v.*,ISNULL((SELECT MIN(time) FROM #values WHERE time>v.time),@endTtime) as next 
       FROM #values v -- add next column to #values
  ) vals
  ON vals.next>=intervals.start and vals.time<=DATEADD(MINUTE,@timeSpanMinutes,start)
  WHERE intervals.start<>@endTtime
  GROUP BY intervals.start
  ORDER BY intervals.start

最新更新