我目前正在进行SQL分析。共有3个表,第一个表包含name_id;时间戳;描述第二个和第三个表只有name_id;时间戳。
我想要实现的是下面的目标表这样的表格:
- 该表的前3列本质上是table_1有序的
- 列
other_timestamp
应通过从table_1中获取name_id; timestamp
并通过name_id
搜索table_2和able_3来填充,并从table_1找到与name_id
相关的下一个按时间顺序排列的timestamp
(可以在任一表中( - 列
source_of_other_timestamp
应通过显示other_timestamp
来自哪个表来填充(table_2或table_3(
表_1
name_id | timestamp | description
name_1 | timestamp_1 | description_1
name_2 | timestamp_2 | description_3
...
name_n | timestamp_n | description_n
表2和表3
name_id | other_timestamp
name_1 | other_timestamp_1
name_2 | other_timestamp_2
...
name_n | other_timestamp_n
目标表
name_id | timestamp | description | other_timestamp | source_of_other_timestamp
name_1 | timestamp_1 | description_1 | other_timestamp_1 | table_2
name_2 | timestamp_2 | description_2 | other_timestamp_2 | table_3
...
name_n | timestamp_n | description_n | other_timestamp_n | table_2
谢谢!
这可能非常复杂。但我认为最有效的方法是union all
和窗口函数:
with t as (
select name_id, timestamp, description, 1 as source
from table1
union all
select name_id, timestamp, null, 2
from table2
select name_id, timestamp, null, 3
)
select t.*
from (select t.*,
max(source) over (partition by other_timestamp, nameid) as other_source
from (select t.*,
min(case when source in (2, 3) then timestamp end) over (partition by nameid order by timetamp desc) as other_timestamp
from t
) t
) t
where source = 1;
下面是BigQuery标准SQL
#standardSQL
WITH tables_2_3 AS (
SELECT *, 'table_2' source_of_other_timestamp FROM `project.dataset.table_2` UNION ALL
SELECT *, 'table_3' FROM `project.dataset.table_3`
)
SELECT t1.*,
ARRAY(
SELECT AS STRUCT other_timestamp, source_of_other_timestamp
FROM tables_2_3 t23
WHERE t23.name_id = t1.name_id
AND t23.other_timestamp > t1.timestamp
ORDER BY t23.other_timestamp
LIMIT 1
)[OFFSET(0)].*
FROM `project.dataset.table_1` t1
-- ORDER BY name_id
输出低于
Row name_id timestamp description other_timestamp source_of_other_timestamp
1 name_1 timestamp_11 description_1 timestamp_21 table_3
2 name_2 timestamp_12 description_3 timestamp_22 table_2
3 name_n timestamp_1n description_n timestamp_2n table_3
注意:由于输出只从表_2或表_3中获得一个最接近的条目,因此可以进一步重构/简化为
#standardSQL
WITH tables_2_3 AS (
SELECT *, 'table_2' source_of_other_timestamp FROM `project.dataset.table_2` UNION ALL
SELECT *, 'table_3' FROM `project.dataset.table_3`
)
SELECT t1.*,
(
SELECT AS STRUCT other_timestamp, source_of_other_timestamp
FROM tables_2_3 t23
WHERE t23.name_id = t1.name_id
AND t23.other_timestamp > t1.timestamp
ORDER BY t23.other_timestamp
LIMIT 1
).*
FROM `project.dataset.table_1` t1
-- ORDER BY name_id
显然具有相同的输出