如何在bigquery中使用Records、Struct和Array而不将它们压扁



我在Big查询中有一个非常嵌套的表。这只是其中的一部分:

[
{
"mode": "NULLABLE",
"name": "id",
"type": "INTEGER"
},

{
"fields": [
{
"fields": [
{
"mode": "NULLABLE",
"name": "Type",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "Term",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "TenderID",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "CardType",
"type": "STRING"
],
"mode": "REPEATED",
"name": "EftInfo",
"type": "RECORD"
},
],
"mode": "REPEATED",
"name": "Trx",
"type": "RECORD"
}
],
"mode": "NULLABLE",
"name": "transactions",
"type": "RECORD"
}
]

我只想在每个字段中强制转换数据,但保留结构。我得到的是这个,但它使我的表变平了,或者更确切地说,RECORDS变得可以为null,而不是重复,我不知道如何将它们排列回来:

select 
"id",
ARRAY( SELECT AS STRUCT CAST(Trx.Type AS  INT64) Type,
CAST(Trx.Term AS  INT64) Term,
CAST(Trx.TrxNum AS  INT64) TrxNum,
ARRAY(SELECT AS STRUCT CAST(EftInfo.TenderID AS INT64 ) TenderID,
CAST(EftInfo.CardType AS INT64 ) CardType
)EftInfo) Trx ) transactions
from 
dataset.table, UNNEST(transactions.Trx)Trx, UNNEST(Trx.EftInfo) EftInfo

所以在尝试了以上所有组合后,我得到了正确的结果,我可以解释。满怀希望,这将为某人节省时间:

SELECT id,
---here just (), because this is structure which cannot have multiple values
(SELECT AS STRUCT 
CAST(transactions.Location AS  INT64) Location,
CAST(transactions.StoreID AS  INT64) StoreID,
---Here array, because this structure can have more entries
ARRAY(SELECT AS STRUCT 
CAST(TenderID AS INT64 ) TenderID,
CAST(CardType AS INT64 ) CardType
FROM UNNEST(Trx.EftInfo) EftInfo)  EftInfo)
FROM UNNEST(transactions.Trx)Trx)Trx
) transactions
from 
my_ds.my_table

通过这种方式,可以在bq允许的范围内深入构建其结构。控制正在发生的事情的一个好方法是json,可以在查询执行后看到它。

最新更新