BigQuery:转换重复记录中重复的记录



我有一个由这个JSON表示的BigQuery表(重复的记录(

{
"createdBy": [
"foo",
"foo"
],
"fileName": [
"bar1",
"bar2"
]
}

我需要转换为重复记录

[
{
"createdBy": "foo",
"fileName": "bar1"
},
{
"createdBy": "foo",
"fileName": "bar2"
}
]

若要进行此转换,请对每个数组使用索引0,并创建第一个对象,对第二个对象使用索引1。。。

我使用UDF执行了这种转换,但问题是由于BigQuery的限制,我无法保存执行这种转换的VIEW:

No support for CREATE TEMPORARY FUNCTION statements inside views

按照完整的语句生成示例表和函数

CREATE TEMP FUNCTION filesObjectArrayToArrayObject(filesJson STRING)
RETURNS ARRAY<STRUCT<createdBy STRING, fileName STRING>>
LANGUAGE js AS """
function filesObjectArrayToArrayObject_execute(files) {   
var createdBy = files["createdBy"];
var fileName = files["fileName"];
var output = [];
for(var i=0 ; i<createdBy.length ; i++) {
output.push({ "createdBy" : createdBy[i], "fileName" : fileName[i] });
}
return output;
}
return filesObjectArrayToArrayObject_execute(JSON.parse(filesJson));
""";
WITH sample_table AS (
SELECT STRUCT<createdBy ARRAY<STRING>, fileName ARRAY<STRING>>(
["foo", "foo"],
["bar1", "bar2"]
) AS files
)
SELECT
files AS filesOriginal,
filesObjectArrayToArrayObject(TO_JSON_STRING(files)) AS filesConverted
FROM sample_table

有没有一种方法可以使用本机BigQuery语句执行相同类型的任务?

请注意:

  • 真实数据有2个以上的键,但这些键在名称中是固定的
  • 数组的长度不是固定的,可以是0、1、10、20

下面是BigQuery标准SQL

#standardSQL
WITH sample_table AS (
SELECT STRUCT<createdBy ARRAY<STRING>, fileName ARRAY<STRING>>(
["foo", "foo"],
["bar1", "bar2"]
) AS files
)
SELECT 
ARRAY(
SELECT STRUCT(createdBy, fileName) 
FROM t.files.createdBy AS createdBy WITH OFFSET
JOIN t.files.fileName AS fileName WITH OFFSET
USING(OFFSET)
) files
FROM `sample_table` t  

带输出

Row files.createdBy files.fileName   
1   foo             bar1     
foo             bar2       

相关内容

  • 没有找到相关文章