在SQL中搜索多个表以查找下一个按时间顺序排列的时间戳,并显示它来自哪个表



我目前正在进行SQL分析。共有3个表,第一个表包含name_id;时间戳;描述第二个和第三个表只有name_id;时间戳。

我想要实现的是下面的目标表这样的表格:

  • 该表的前3列本质上是table_1有序的
  • other_timestamp应通过从table_1中获取name_id; timestamp并通过name_id搜索table_2able_3来填充,并从table_1找到与name_id相关的下一个按时间顺序排列的timestamp(可以在任一表中(
  • source_of_other_timestamp应通过显示other_timestamp来自哪个表来填充(table_2table_3(

表_1

name_id | timestamp   | description
name_1  | timestamp_1 | description_1
name_2  | timestamp_2 | description_3
...
name_n  | timestamp_n | description_n

表2和表3

name_id | other_timestamp   
name_1  | other_timestamp_1 
name_2  | other_timestamp_2  
...
name_n  | other_timestamp_n 

目标表

name_id | timestamp   | description   | other_timestamp   | source_of_other_timestamp
name_1  | timestamp_1 | description_1 | other_timestamp_1 | table_2
name_2  | timestamp_2 | description_2 | other_timestamp_2 | table_3
...
name_n  | timestamp_n | description_n | other_timestamp_n | table_2

谢谢!

这可能非常复杂。但我认为最有效的方法是union all和窗口函数:

with t as (
select name_id, timestamp, description, 1 as source
from table1
union all
select name_id, timestamp, null, 2
from table2
select name_id, timestamp, null, 3
)
select t.*
from (select t.*,
max(source) over (partition by other_timestamp, nameid) as other_source
from (select t.*,
min(case when source in (2, 3) then timestamp end) over (partition by nameid order by timetamp desc) as other_timestamp
from t
) t
) t
where source = 1;

下面是BigQuery标准SQL

#standardSQL
WITH tables_2_3 AS (
SELECT *, 'table_2' source_of_other_timestamp FROM `project.dataset.table_2` UNION ALL
SELECT *, 'table_3' FROM `project.dataset.table_3` 
)
SELECT t1.*,  
ARRAY(
SELECT AS STRUCT other_timestamp, source_of_other_timestamp
FROM tables_2_3 t23
WHERE t23.name_id = t1.name_id
AND t23.other_timestamp > t1.timestamp
ORDER BY t23.other_timestamp
LIMIT 1
)[OFFSET(0)].*
FROM `project.dataset.table_1` t1
-- ORDER BY name_id

输出低于

Row name_id timestamp       description     other_timestamp source_of_other_timestamp    
1   name_1  timestamp_11    description_1   timestamp_21    table_3  
2   name_2  timestamp_12    description_3   timestamp_22    table_2  
3   name_n  timestamp_1n    description_n   timestamp_2n    table_3    

注意:由于输出只从表_2或表_3中获得一个最接近的条目,因此可以进一步重构/简化为

#standardSQL
WITH tables_2_3 AS (
SELECT *, 'table_2' source_of_other_timestamp FROM `project.dataset.table_2` UNION ALL
SELECT *, 'table_3' FROM `project.dataset.table_3` 
)
SELECT t1.*,  
(
SELECT AS STRUCT other_timestamp, source_of_other_timestamp
FROM tables_2_3 t23
WHERE t23.name_id = t1.name_id
AND t23.other_timestamp > t1.timestamp
ORDER BY t23.other_timestamp
LIMIT 1
).*
FROM `project.dataset.table_1` t1
-- ORDER BY name_id

显然具有相同的输出

相关内容

  • 没有找到相关文章

最新更新