使用交叉联接填充 Postgresql 中每个起点-目的地对的时间序列



SQLfiddle : http://sqlfiddle.com/#!17/0ba48b

假设我有一张这样的桌子

CREATE TABLE trips (origin int, destination int, t TIMESTAMP);
INSERT INTO trips VALUES
    (1, 1, '2016-06-22 08:02:00'),
    (1, 3, '2016-06-22 08:10:00'),
    (1, 3, '2016-06-22 08:12:00'),
    (2, 1, '2016-06-22 08:49:00');

起点/目的地集{1,2,3} 。我想生成一个这样的表:

origin | destination |      t       |count 
--------------------------------------------
      1 |          1 |     08:00:00 | 1
      1 |          1 |     08:15:00 | 0
      1 |          1 |     08:30:00 | 0
      1 |          1 |     08:45:00 | 0
      1 |          2 |     08:00:00 | 0
     ...|            |              |
      1 |          3 |     08:00:00 | 2
      1 |          3 |     08:15:00 | 0

基本上,对于每个起点-目的地对,我想生成整个时间序列(间隔 15 分钟(,聚合行程并在需要时插入 0/null。

到目前为止我的尝试:

with 
    -- generate bins
    bins as (
    select bin
        from 
        (select generate_series(
      date_trunc('hour', 
          (SELECT min(t) 
      FROM trips 
      LIMIT 1)) , -- start at one day ago, rounded to the hour
      date_trunc('hour', (SELECT max(t) 
      FROM trips 
      LIMIT 1))+ '1 day'::interval,  -- stop at now, rounded to the hour
      '15 minutes'::interval -- one hour intervals
        ) as bin ) intervals
      where extract('ISODOW' FROM bin) < 6  -- exclude weekends
    ),
    -- bin the od table https://gis.stackexchange.com/a/127874/13965
    od as (
    SELECT COUNT(*) cnt, origin, destination,
    to_timestamp(floor((extract('epoch' from t) / 900 )) * 900)  -- this bins it to 15 minutes, but has many missing values 
    AT TIME ZONE 'UTC' as interval_alias_exit
    FROM 
    trips
    GROUP BY interval_alias_exit, origin, destination
    )
    -- merge the two
    select 
    * from 
    bins 
    left join od
    on bins.bin = od.interval_alias_exit
    where origin = '1'

但这不会为每个 OD 对创建时间序列。例如,当原点为1时,它只会返回两行。请看小提琴的例子。

您需要将所有点与所有点交叉连接以获得所有起点目的地对,然后将其与时间序列交叉连接。然后,您可以在相等的起点和目的地上加入行程,并且时间戳大于或等于系列的时间戳,但小于系列的时间戳加 15 分钟。然后按起点、目的地和序列的时间戳进行聚合,并计算行程的时间戳。

SELECT o.origin,
       d.destination,
       gs.t,
       count(t.t) count
       FROM (SELECT origin
                    FROM trips
             UNION
             SELECT destination
                    FROM trips) o
            CROSS JOIN (SELECT destination
                               FROM trips
                        UNION
                        SELECT origin
                               FROM trips) d
            CROSS JOIN generate_series((SELECT date_trunc('hour', min(t))
                                               + concat(floor(extract('minutes' FROM min(t)) / 15) * 15,
                                                        ' minutes')::interval
                                               FROM trips),
                                       (SELECT date_trunc('hour', max(t))
                                               + concat(floor(extract('minutes' FROM max(t)) / 15) * 15,
                                                        ' minutes')::interval
                                               FROM trips),
                                       '15 minutes'::interval) gs (t)
            LEFT JOIN trips t
                      ON t.origin = o.origin
                         AND t.destination = d.destination
                         AND t.t >= gs.t
                         AND t.t < gs.t + '15 minutes'::interval
       GROUP BY o.origin,
                d.destination,
                gs.t
       ORDER BY o.origin,
                d.destination,
                gs.t;

(对于示例所需结果中的时间序列边界,您似乎有另一种逻辑,而不是您发布的查询。我试图匹配前者。但我想你会想出如何改变它以满足你的实际需求。

SQL 小提琴

最新更新