Create a JSON column in BigQuery that uses actual column values as keys



Is there a way to create JSON in BigQuery whose keys are taken from column values?

I have three columns in a table:

user_id (string) | category (string) | info (struct)
user_1, cat_A, info_1A
user_1, cat_B, info_1B
user_1, cat_C, info_1C
user_2, cat_A, info_2A
user_3, cat_Z, info_3Z
user_3, cat_B, info_3B

The values of the "info" column are abbreviated here; assume each one is a struct such as {'f': 2, 'c': 3, ...}.

I want this output, where the keys in the "features" column are the actual values of the "category" column:

user_id (string) | features (struct/JSON)
user_1, {cat_A: info_1A, cat_B: info_1B, cat_C: info_1C, ...}
user_2, {cat_A: info_2A}
user_3, {cat_Z: info_3Z, cat_B: info_3B}

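However, so far I have only managed to get the following format (I formatted the output JSON for clarity), where *key* is the predefined name you use when creating the STRUCT, i.e. STRUCT(...) AS *key*: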

[
  {
    "user_id": "user_1",
    "features": [
      {
        "category": "cat_A",
        "features": {
          "f": 2,
          "c": 3
        }
      },
      {
        "category": "cat_B",
        "features": {
          "x": 7,
          "z": 10
        }
      },
      ...
    ]
  },
  ...
]

I get it with the following query:

SELECT
  user_id,
  ARRAY_AGG(
    STRUCT(
      category,
      STRUCT(f, c, x, z) AS features -- the different features for each category
    )
  ) AS features
FROM ...
GROUP BY user_id

Below is for BigQuery Standard SQL

#standardSQL
SELECT user_id, '{' || STRING_AGG(category || ': ' || info, ', ') || '}' features
FROM `project.dataset.table`
GROUP BY user_id   

You can test this with the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
SELECT 'user_1' user_id, 'cat_A' category, 'info_1A' info UNION ALL
SELECT 'user_1', 'cat_B', 'info_1B' UNION ALL
SELECT 'user_1', 'cat_C', 'info_1C' UNION ALL
SELECT 'user_2', 'cat_A', 'info_2A' UNION ALL
SELECT 'user_3', 'cat_Z', 'info_3Z' UNION ALL
SELECT 'user_3', 'cat_B', 'info_3B' 
)
SELECT user_id, '{' || STRING_AGG(category || ': ' || info, ', ') || '}' features
FROM `project.dataset.table`
GROUP BY user_id

with output

Row user_id features     
1   user_1  {cat_A: info_1A, cat_B: info_1B, cat_C: info_1C}     
2   user_2  {cat_A: info_2A}     
3   user_3  {cat_Z: info_3Z, cat_B: info_3B}
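
Note that the query above concatenates plain strings, so the result only looks like JSON (the keys are unquoted) and it treats `info` as a STRING. If `info` is really a STRUCT, as described in the question, and valid JSON text is needed, one possible variation is to serialize both the key and the value with TO_JSON_STRING. This is only a sketch under assumptions: the sample rows, the struct fields `f` and `c`, and their values are made up for illustration, and the ORDER BY inside STRING_AGG is there only to make the key order deterministic.

```sql
#standardSQL
-- Sketch only: the sample rows and the struct fields (f, c) are assumed for illustration.
WITH `project.dataset.table` AS (
  SELECT 'user_1' user_id, 'cat_A' category, STRUCT(2 AS f, 3 AS c) info UNION ALL
  SELECT 'user_1', 'cat_B', STRUCT(7 AS f, 10 AS c) UNION ALL
  SELECT 'user_2', 'cat_A', STRUCT(1 AS f, 5 AS c)
)
SELECT
  user_id,
  -- TO_JSON_STRING quotes and escapes the key and serializes the struct value as a JSON object
  '{' || STRING_AGG(
    TO_JSON_STRING(category) || ': ' || TO_JSON_STRING(info),
    ', ' ORDER BY category
  ) || '}' AS features
FROM `project.dataset.table`
GROUP BY user_id
```

The `features` column is still a STRING here. If a JSON-typed column is wanted, the whole expression could probably be wrapped in PARSE_JSON(...). If the same category can appear more than once for a user, the result would contain duplicate keys, so it may be worth deduplicating first.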