如何合并两个json行与重复的id和唯一值在雪花?



我必须合并两个(或两个以上)JSON行具有相同的id,但这两个行都有唯一的值。如:

create or replace table source(v variant); 
INSERT INTO source SELECT parse_json('{
"pd": {
"extraction_date": "1644471240",
"product_id": "357946",
"retailerName": "retailer",
"productName":"product"
"unique1":"value"
}
}');
INSERT INTO source SELECT parse_json('{
"pd": {
"extraction_date": "1644471242",
"product_id": "357946",
"retailerName": "retailer2",
"productName":"product2",
"unique2":"value"
}
}');

我想要的输出是:

"pd": {
"extraction_date": "1644471240",
"product_id": "357946",
"retailerName": "retailer",
"productName":"product",
"unique1":"value",
"unique2":"value"
}
}');

我看了文档和StackOverflow,但没有解决方案适用于这种情况。什么是解决方案

像这样的JavaScript UDF将合并问题中的对象:

create or replace function merge_objects(X array)
returns variant
language javascript
as $$
merged = {}
X.forEach((item) => {
keys = Object.keys(item['pd']);
for (let i = 0; i < keys.length; i++) {
merged[keys[i]] = item['pd'][keys[i]];
}
});
return {'pd': merged};
$$

然后可以在SQL中使用该UDF来执行合并:

select count(*)
, merge_objects(array_agg(v) 
within group (order by v:pd.extraction_date desc))
from source
group by v:pd.product_id;

最新更新