I am trying to update a Snowflake table from Databricks. I created a Databricks temporary view and built a query on top of it that should update the Snowflake table, but I am not sure whether this is even possible. Can anyone help me?

```
query = """MERGE INTO dw_3nf.temp_tgt target
USING
(SELECT source1.id as mergekey, 0 as deleted, source1.* FROM dw_3nf.temp_src as source1
UNION ALL
SELECT NULL as mergekey,0 as deleted, source1.*
FROM dw_3nf.temp_src source1 JOIN dw_3nf.temp_tgt target
ON source1.id = target.id
WHERE target.live_flag = 1 AND source1.name <> target.name
UNION ALL
SELECT target.id as mergekey, 1 as deleted, source.*
FROM dw_3nf.temp_tgt as target left join dw_3nf.temp_src as source
ON source.id = target.id
WHERE source.id is null and target.live_flag=1
) staged_updates
ON target.id = mergekey
WHEN MATCHED AND target.live_flag = 1 AND staged_updates.name <> target.name THEN
UPDATE SET live_flag = 0
WHEN MATCHED AND staged_updates.deleted = 1 and target.live_flag=1 THEN
UPDATE SET live_flag=2
WHEN NOT MATCHED THEN
INSERT (id, name, live_flag)
VALUES(staged_updates.id,staged_updates.name,1)"""
df.createOrReplaceTempView("source")
df.write \
    .format("snowflake") \
    .options(**options) \
    .option("query", query) \
    .save()
```
Take a step back and think about the system:

"Databricks cluster" <---> "Snowflake cluster"

You want to avoid too much traffic between the two systems, because the network is the slow part. So I would suggest one of the following:
- Copy/insert the raw data into Snowflake and do the transformation and merge there, or
- Prepare the data in Databricks, copy the result to Snowflake, and do the merge there (see the sketch after this list).
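For the second option, here is a minimal sketch, assuming the Spark Snowflake connector is attached to the cluster, `options` holds your Snowflake connection settings (sfURL, sfUser, sfPassword, sfDatabase, sfSchema, ...), and `df` is the DataFrame you prepared in Databricks. The staging table name `temp_src` and the simplified MERGE statement are placeholders for your actual names and SCD logic:

```
# Sketch only: `options`, `df`, the staging table name and the MERGE text
# are placeholders to adapt to your environment.

# 1. Copy the prepared data from Databricks into a Snowflake staging table.
df.write \
    .format("snowflake") \
    .options(**options) \
    .option("dbtable", "temp_src") \
    .mode("overwrite") \
    .save()

# 2. Run the MERGE entirely inside Snowflake, where both tables now live.
#    This statement is simplified; plug in your full SCD logic instead.
merge_query = """
MERGE INTO dw_3nf.temp_tgt AS target
USING temp_src AS staged_updates
ON target.id = staged_updates.id
WHEN MATCHED AND target.live_flag = 1 AND staged_updates.name <> target.name THEN
  UPDATE SET live_flag = 0
WHEN NOT MATCHED THEN
  INSERT (id, name, live_flag) VALUES (staged_updates.id, staged_updates.name, 1)
"""

# Utils.runQuery executes a SQL statement directly on Snowflake through the
# connector, so no data flows back to Databricks during the merge.
sc = spark.sparkContext
sc._jvm.net.snowflake.spark.snowflake.Utils.runQuery(options, merge_query)
```

This keeps the heavy lifting on the Snowflake side; the only cross-system traffic is the one-time copy of the prepared data.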
So, can you merge from a Databricks table into a Snowflake table in a single statement? I don't know. Should you? Probably not.