Need to filter the target table in a SQL MERGE query



I am running a MERGE query in BigQuery SQL. This is the query:

MERGE `dataset.target_table` AS Target
USING
(
select
*
from
`dataset.source_table` s_data
WHERE
trans_id is not null and user_id is not null
)
AS Source
ON Source.trans_id = Target.trans_id and Target.start_date IN 
(
select distinct start_date from `dataset.source_table`
)
WHEN NOT MATCHED BY Target THEN
INSERT (...)
VALUES (...)
WHEN MATCHED and Target.user_id is null THEN
UPDATE SET ...

I run into a problem when I use the subquery in the ON clause. The error is: In Subquery not supported by join predicate

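The reason I have this subquery is that I want to filter the Target table before the merge happens; otherwise BigQuery throws an OOM exception. The target table has 10 billion rows and the source table has 20 billion rows. I don't need the subquery in the ON clause as such; it is just a way to substantially filter the Target table before the merge takes place. Is there another approach I can take?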

I tried the approach described at https://dba.stackexchange.com/questions/30633/merge-a-subset-of-the-target-table, using:

WITH TARGET AS 
(
SELECT * 
FROM `dataset.target_table`
WHERE <filter target_table here>
)
MERGE INTO TARGET
...

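But BigQuery does not seem to support this and reports a syntax error. How can I filter the target table before the merge happens, so that the whole table does not have to be loaded into memory?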

I'm a bit confused. Can't you just take start_date from the matching row in the source?

MERGE `dataset.target_table` AS Target USING
(select *
from `dataset.source_table` s_data
WHERE  trans_id is not null and user_id is not null
) AS Source
ON Source.trans_id = Target.trans_id and
Target.start_date = Source.start_date
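
If you do still want the explicit list of dates as a filter on Target, one thing that might be worth trying is to compute the list first in a script variable and then reference it with IN UNNEST, which is not a subquery. This is only a sketch and I have not run it against your tables. It assumes start_date is a DATE column, and the column lists in INSERT and UPDATE are hypothetical stand-ins for the parts elided in the question. The variable name source_dates is made up. Whether BigQuery actually prunes Target with this filter depends on how the table is partitioned or clustered, so check the bytes processed before relying on it.

```sql
-- Assumption: start_date is a DATE column.
-- Collect the distinct dates once, up front, instead of in the ON clause.
DECLARE source_dates ARRAY<DATE> DEFAULT (
  SELECT ARRAY_AGG(DISTINCT start_date IGNORE NULLS)
  FROM `dataset.source_table`
);

MERGE `dataset.target_table` AS Target
USING (
  SELECT *
  FROM `dataset.source_table`
  WHERE trans_id IS NOT NULL AND user_id IS NOT NULL
) AS Source
ON Source.trans_id = Target.trans_id
   -- The variable replaces the unsupported IN (subquery) predicate.
   AND Target.start_date IN UNNEST(source_dates)
WHEN NOT MATCHED BY TARGET THEN
  -- Hypothetical column list; use the real one from the original query.
  INSERT (trans_id, user_id, start_date)
  VALUES (Source.trans_id, Source.user_id, Source.start_date)
WHEN MATCHED AND Target.user_id IS NULL THEN
  -- Hypothetical assignment; use the real SET list from the original query.
  UPDATE SET user_id = Source.user_id;
```

Both statements have to be run together as one script so that the variable is still in scope when the MERGE executes.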
