Merging VARIANT rows in Snowflake



I have a table in Snowflake with a VARIANT column, structured as shown below. As you can see, a single ID has multiple VARIANT objects.

+-----+--------------------------+
| ID  | STATE_INFO               |
|-----+--------------------------|
| IND | {                        |
|     |   "population": "1000k", |
|     |   "state": "KA"          |
|     | }                        |
| IND | {                        |
|     |   "population": "2000k", |
|     |   "state": "AP"          |
|     | }                        |
| IND | {                        |
|     |   "population": "3000K", |
|     |   "state": "TN"          |
|     | }                        |
| US  | {                        |
|     |   "population": "100k",  |
|     |   "state": "Texas"       |
|     | }                        |
| US  | {                        |
|     |   "population": "200k",  |
|     |   "state": "Florida"     |
|     | }                        |
| US  | {                        |
|     |   "population": "300K",  |
|     |   "state": "Iowa"        |
|     | }                        |
+-----+--------------------------+

I want to combine these VARIANT objects into a single value per ID by merging the rows into an array or dictionary object, as shown below:

+-----+---------------------------+
| ID  | STATE_INFO                |
|-----+---------------------------|
| IND | [{                        |
|     |   "population": "1000k",  |
|     |   "state": "KA"           |
|     | },                        |
|     | {                         |
|     |   "population": "2000k",  |
|     |   "state": "AP"           |
|     | },                        |
|     | {                         |
|     |   "population": "3000K",  |
|     |   "state": "TN"           |
|     | }]                        |
| US  | [{                        |
|     |   "population": "100k",   |
|     |   "state": "Texas"        |
|     | },                        |
|     | {                         |
|     |   "population": "200k",   |
|     |   "state": "Florida"      |
|     | },                        |
|     | {                         |
|     |   "population": "300K",   |
|     |   "state": "Iowa"         |
|     | }]                        |
+-----+---------------------------+

In SQL terms, I would like to be able to write something like the statement below:
Select id,merge(STATE_INFO) from table group by id;

As Mike said, the ARRAY_AGG function is what you need, and it works on a VARIANT column:

select id, array_agg(STATE_INFO) within group (order by id) STATE_INFO
from table
group by 1
order by 1
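
The question also allows a dictionary instead of an array. A minimal sketch of that variant, assuming each state name is unique within an ID, could use OBJECT_AGG keyed by the state name. The CTE name `data` and the alias `states_by_name` are placeholders chosen for illustration, not names from the original table:

```sql
-- Sample rows standing in for the real table (the name "data" is an assumption).
with data(id, state_info) as (
    select column1, parse_json(column2)
    from values
        ('IND', '{ "population": "1000k", "state": "KA" }'),
        ('IND', '{ "population": "2000k", "state": "AP" }'),
        ('US',  '{ "population": "100k",  "state": "Texas" }'),
        ('US',  '{ "population": "200k",  "state": "Florida" }')
)
-- OBJECT_AGG(key, value) builds one OBJECT per group.
-- The key must be a string, so the state name is cast explicitly.
select id,
       object_agg(state_info:state::string, state_info) as states_by_name
from data
group by id
order by id;
```

OBJECT_AGG raises an error if the same key appears twice within a group, so this only fits when the chosen key is unique per ID. Otherwise ARRAY_AGG is the safer choice.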

Using a CTE for the data:

With data(id, state_info) as (
select column1, parse_json(column2)
from values
('IND', '{ "population": "1000k", "state": "KA" }'),                        
('IND', '{ "population": "2000k", "state": "AP" }'),                        
('IND', '{ "population": "3000K", "state": "TN" }'),                                
('US', '{ "population": "100k", "state": "Texas" }'),                         
('US', '{ "population": "200k", "state": "Florida" }'),                         
('US', '{ "population": "300K", "state": "Iowa" }')                     
)

This code is almost identical to demircioglu's answer, but it does not sort the array contents.

select id, array_agg(state_info) as stateinfo
from data
group by 1;

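Because of the order of the input, the result still appears ordered. The order is not actually guaranteed, though, so whether to sort depends on whether you need the data sorted: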

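+-----+--------------------------------------------+
| ID  | STATEINFO                                  |
|-----+--------------------------------------------|
| US  | [{"population":"100k","state":"Texas"},    |
|     |  {"population":"200k","state":"Florida"},  |
|     |  {"population":"300K","state":"Iowa"}]     |
| IND | [{"population":"1000k","state":"KA"},      |
|     |  {"population":"2000k","state":"AP"},      |
|     |  {"population":"3000K","state":"TN"}]      |
+-----+--------------------------------------------+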
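
If the order inside each array does matter, one option is to sort on a field of the VARIANT rather than on the grouping column. The sketch below assumes that sorting by state name is the order you want. The alias `state_list` is a placeholder, and the sample rows repeat the CTE above so the statement runs on its own:

```sql
with data(id, state_info) as (
    select column1, parse_json(column2)
    from values
        ('IND', '{ "population": "1000k", "state": "KA" }'),
        ('IND', '{ "population": "2000k", "state": "AP" }'),
        ('IND', '{ "population": "3000K", "state": "TN" }'),
        ('US',  '{ "population": "100k",  "state": "Texas" }'),
        ('US',  '{ "population": "200k",  "state": "Florida" }'),
        ('US',  '{ "population": "300K",  "state": "Iowa" }')
)
-- WITHIN GROUP (ORDER BY ...) fixes the element order inside each array.
-- The sort key is extracted from the VARIANT and cast to a string.
select id,
       array_agg(state_info) within group (order by state_info:state::string) as state_list
from data
group by id
order by id;
```

Ordering by `id` inside WITHIN GROUP has no visible effect when the query also groups by `id`, because every element in a group shares the same value. A field from inside the object is the more useful sort key.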