I have two tables that capture sensor data at different times, and I need help writing a SQL query that joins the sensor data by time.
sensor1 data (for simplicity I made the readings 5 minutes apart, but that is not necessarily the case):
sensor1_time | sensor1_value
09:55am | A
10:00am | B
10:05am | C
10:10am | D
10:15am | E
10:20am | F
sensor2 data:
sensor2_time | Sensor2_value
9:50am | X
10:05am | Y
10:19am | Z
I want to join the sensor1 and sensor2 tables, and the output should look like this:
sensor1_time | sensor2_time | Sensor1_value | Sensor2_value
09:55am | 9:50am | A | X
10:00am | 9:50am | B | X
10:05am | 10:05am | C | Y
10:10am | 10:05am | D | Y
10:15am | 10:05am | E | Y
10:20am | 10:19am | F | Z
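You can JOIN the two tables and use a NOT EXISTS condition to pick the relevant sensor2 record, that is, the latest sensor2 reading taken at or before each sensor1 reading: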
SELECT
    s1.sensor1_time,
    s2.sensor2_time,
    s1.sensor1_value,
    s2.sensor2_value
FROM sensor1 s1
INNER JOIN sensor2 s2
    ON s1.sensor1_time >= s2.sensor2_time
    AND NOT EXISTS (
        SELECT 1
        FROM sensor2 s20
        WHERE
            s1.sensor1_time >= s20.sensor2_time
            AND s20.sensor2_time > s2.sensor2_time
    )
This DB Fiddle demo with your sample data returns:
sensor1_time | sensor2_time | sensor1_value | sensor2_value
:----------- | :----------- | :------------ | :------------
09:55 | 09:50 | A | X
10:00 | 09:50 | B | X
10:05 | 10:05 | C | Y
10:10 | 10:05 | D | Y
10:15 | 10:05 | E | Y
10:20 | 10:19 | F | Z
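To reproduce the result above without a database server, here is a minimal sketch that runs the same NOT EXISTS query against an in-memory SQLite database through Python's sqlite3 module. SQLite, the `run_demo` helper, and the choice to store times as zero-padded 24-hour `HH:MM` text (so that plain string comparison matches chronological order) are assumptions made for illustration; they are not part of the original demo. The `ORDER BY` is also added only to make the output order deterministic.

```python
import sqlite3

# Same join as above; ORDER BY is added only for a stable output order.
QUERY = """
SELECT s1.sensor1_time, s2.sensor2_time, s1.sensor1_value, s2.sensor2_value
FROM sensor1 s1
INNER JOIN sensor2 s2
    ON s1.sensor1_time >= s2.sensor2_time
    AND NOT EXISTS (
        SELECT 1
        FROM sensor2 s20
        WHERE s1.sensor1_time >= s20.sensor2_time
          AND s20.sensor2_time > s2.sensor2_time
    )
ORDER BY s1.sensor1_time
"""


def run_demo():
    """Load the sample data into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor1 (sensor1_time TEXT, sensor1_value TEXT)")
    conn.execute("CREATE TABLE sensor2 (sensor2_time TEXT, sensor2_value TEXT)")
    # Assumption: zero-padded 24-hour text, so string order == time order.
    conn.executemany(
        "INSERT INTO sensor1 VALUES (?, ?)",
        [("09:55", "A"), ("10:00", "B"), ("10:05", "C"),
         ("10:10", "D"), ("10:15", "E"), ("10:20", "F")],
    )
    conn.executemany(
        "INSERT INTO sensor2 VALUES (?, ?)",
        [("09:50", "X"), ("10:05", "Y"), ("10:19", "Z")],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


rows = run_demo()
for row in rows:
    print(row)
```

Each sensor1 row is matched with exactly one sensor2 row because the NOT EXISTS subquery rejects any candidate for which a later, still-eligible sensor2 reading exists.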
Below is a version for BigQuery Standard SQL:
#standardSQL
SELECT
    s1.sensor1_time,
    s2.sensor2_time,
    s1.sensor1_value,
    s2.sensor2_value
FROM sensor1 s1
JOIN (
    SELECT sensor2_time,
        LEAD(sensor2_time) OVER(ORDER BY sensor2_time) AS sensor2_time2,
        sensor2_value
    FROM sensor2
) s2
ON s1.sensor1_time >= sensor2_time AND IFNULL(s1.sensor1_time < sensor2_time2, TRUE)
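The above assumes your sensor_time fields are of the TIMESTAMP data type, as stated in the question title.
Note: if they are compared as strings, 09:55am is considered greater than 09:50pm, which means you need to cast/parse them to a time type.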
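The string-comparison pitfall in the note can be checked quickly outside BigQuery. The sketch below uses Python's `datetime.strptime` as a stand-in for `PARSE_TIME`; the `parse_clock` helper is a hypothetical name, and `%p` parsing depends on the locale (this assumes an English or C locale where "am"/"pm" are recognized).

```python
from datetime import datetime


def parse_clock(text):
    """Parse a 12-hour clock string such as '09:55am' into a time value."""
    return datetime.strptime(text, "%I:%M%p").time()


# Compared as strings, the minute digits decide before am/pm is even reached.
string_compare = "09:55am" > "09:50pm"

# Compared as parsed times, 09:55 (morning) is earlier than 21:50 (evening).
parsed_compare = parse_clock("09:55am") > parse_clock("09:50pm")

print(string_compare)
print(parsed_compare)
```

The two comparisons disagree, which is why the join conditions have to operate on parsed values rather than on the raw text.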
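So if these fields are actually strings formatted as in the example you provided, you first need to cast/parse them to the proper data type, as in the simplified example below: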
#standardSQL
SELECT
    s1.sensor1_time,
    s2.sensor2_time,
    s1.sensor1_value,
    s2.sensor2_value
FROM sensor1 s1
JOIN (
    SELECT sensor2_time,
        LEAD(sensor2_time) OVER(ORDER BY PARSE_TIME('%I:%M%p', sensor2_time)) AS sensor2_time2,
        sensor2_value
    FROM sensor2
) s2
ON PARSE_TIME('%I:%M%p', s1.sensor1_time) >= PARSE_TIME('%I:%M%p', sensor2_time)
AND IFNULL(PARSE_TIME('%I:%M%p', s1.sensor1_time) < PARSE_TIME('%I:%M%p', sensor2_time2), TRUE)
If the latter query is applied to the sample data from your question:
WITH `sensor1` AS (
SELECT '09:55am' sensor1_time, 'A' sensor1_value UNION ALL
SELECT '10:00am', 'B' UNION ALL
SELECT '10:05am', 'C' UNION ALL
SELECT '10:10am', 'D' UNION ALL
SELECT '10:15am', 'E' UNION ALL
SELECT '10:20am', 'F'
), `sensor2` AS (
SELECT '09:50am' sensor2_time, 'X' sensor2_value UNION ALL
SELECT '10:05am', 'Y' UNION ALL
SELECT '10:19am', 'Z'
)
the result is:
Row sensor1_time sensor2_time sensor1_value sensor2_value
1 09:55am 09:50am A X
2 10:00am 09:50am B X
3 10:05am 10:05am C Y
4 10:10am 10:05am D Y
5 10:15am 10:05am E Y
6 10:20am 10:19am F Z
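For readers who want to see the interval logic behind the LEAD-based query in isolation, here is a rough Python sketch of the same idea: each sensor2 reading is paired with the time of the next reading, and a sensor1 reading matches when it falls in the half-open interval between the two (the last sensor2 reading has no upper bound, mirroring the `IFNULL(..., TRUE)` branch). The `parse` and `interval_join` names are hypothetical, and Python's `strptime` stands in for `PARSE_TIME` under the assumption of an English or C locale.

```python
from datetime import datetime


def parse(text):
    """Parse a 12-hour clock string such as '10:05am' into a time value."""
    return datetime.strptime(text, "%I:%M%p").time()


sensor1 = [("09:55am", "A"), ("10:00am", "B"), ("10:05am", "C"),
           ("10:10am", "D"), ("10:15am", "E"), ("10:20am", "F")]
sensor2 = [("09:50am", "X"), ("10:05am", "Y"), ("10:19am", "Z")]


def interval_join(sensor1, sensor2):
    """Match each sensor1 row to the sensor2 row whose interval contains it."""
    ordered = sorted(sensor2, key=lambda row: parse(row[0]))
    # Equivalent of LEAD(sensor2_time): the next reading's time, None for the last.
    next_times = [row[0] for row in ordered[1:]] + [None]
    rows = []
    for t1, v1 in sensor1:
        for (t2, v2), t2_next in zip(ordered, next_times):
            at_or_after_start = parse(t1) >= parse(t2)
            before_next = t2_next is None or parse(t1) < parse(t2_next)
            if at_or_after_start and before_next:
                rows.append((t1, t2, v1, v2))
    return rows


result = interval_join(sensor1, sensor2)
for row in result:
    print(row)
```

Note that a sensor1 reading taken before the first sensor2 reading would match nothing here, just as it would be dropped by the inner JOIN in the query.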