匹配具有唯一时间戳的三个DB表的最佳方法是什么



我有三个表,它们全天从三个数据源接收新信息。

Table A     Table B     Table C
5, 8:00     J, 8:00     3, 8:00
6, 8:01     K, 8:02     8, 8:04
4, 8:03
9, 8:06

在一天结束时,我想按时间顺序处理数据,因为三个时间戳中的任何一个都会发生变化,所以我需要最新的三条信息。我想要的结果是:

Table A     Table B     Table C     *Data As of*
5, 8:00     J, 8:00     3, 8:00     *8:00*
6, 8:01     J, 8:00     3, 8:00     *8:01*
6, 8:01     K, 8:02     3, 8:00     *8:02*
4, 8:03     K, 8:02     3, 8:00     *8:03*
4, 8:03     K, 8:02     8, 8:04     *8:04*
9, 8:06     K, 8:02     8, 8:04     *8:06*

我目前正在将3个查询转储到3个数据表中。然后,我通过获取三个时间戳中最早的时间戳,一次遍历所有三个。这是可行的,但有点麻烦。一张表每天有大约300万条记录,一张有200条,还有一张有少量记录。有时我一次处理20天的数据。对最佳方法的想法?

这至少是一种方法。可能需要对性能进行一些分析,但这是假设您创建了一个具有不同时间的表。如果仅在分钟级别(或表上的任何内容)上进行此操作还不够,那么在运行此操作之前,您当然可以从每个表中执行"插入#time,选择不同的时间…",但这也可能相当繁重。

select distinct
  a.id as a_id,
  a.time as a_time,
  b.id as b_id,
  b.time as b_time,
  c.id as c_id,
  c.time as c_time
from
  time t
  outer apply (
    select top 1 id, time
    from tablea a
    where a.time <= t.time
    order by a.time desc
  ) a
  outer apply (
    select top 1 id, time
    from tableb b
    where b.time <= t.time
    order by b.time desc
  ) b
  outer apply (
    select top 1 id, time
    from tablec c
    where c.time <= t.time
    order by c.time desc
  ) c
order by 
  a_time, 
  b_time, 
  c_time

SQL Fiddle:http://sqlfiddle.com/#!3/de7ae/6

尝试下一个脚本(SQL2012+):

-- Step #1: it creates a table to store all distinct TS
CREATE TABLE #AllTS (TS DATETIME NOT NULL PRIMARY KEY) -- Change type of TS column with the proper data type
-- Step #2: it inserts distinct (UNION) TS values
INSERT  #AllTS
SELECT  TS
FROM (
    SELECT TS FROM dbo.A
    UNION SELECT TS FROM dbo.B
    UNION SELECT TS FROM dbo.C
) x(TS)
-- Step #3: for every source table use bellow query to generate requested resultset
SELECT  MAX(y.Col1)OVER(PARTITION BY GroupID) AS Col1,
        MAX(y.TS)OVER(PARTITION BY GroupID) AS TS
FROM (
    SELECT  a.Col1, a.TS, SUM(CASE WHEN a.TS IS NOT NULL THEN 1 ELSE 0 END) OVER(ORDER BY x.TS) AS GroupID
    FROM    #AllTS x LEFT JOIN dbo.A a ON x.TS = a.TS
) y

注意1:您应该尝试使用在TS列上的每个源表上创建的索引来加快上述查询。示例:

CREATE INDEX IX_A_TS_#_Col1 ON dbo.A(TS) INCLUDE (Col1)

注2:另外,为了提高上次查询的性能,您可以测试不同的联合提示:

#AllTS x LEFT HASH JOIN dbo.A -- Could be useful  when source tables are "big"

#AllTS x MERGE JOIN dbo.A

Demo

最新更新