我有一个由这个JSON表示的BigQuery表(重复的记录(
{
"createdBy": [
"foo",
"foo"
],
"fileName": [
"bar1",
"bar2"
]
}
我需要转换为重复记录
[
{
"createdBy": "foo",
"fileName": "bar1"
},
{
"createdBy": "foo",
"fileName": "bar2"
}
]
若要进行此转换,请对每个数组使用索引0,并创建第一个对象,对第二个对象使用索引1。。。
我使用UDF执行了这种转换,但问题是由于BigQuery的限制,我无法保存执行这种转换的VIEW:
No support for CREATE TEMPORARY FUNCTION statements inside views
按照完整的语句生成示例表和函数
CREATE TEMP FUNCTION filesObjectArrayToArrayObject(filesJson STRING)
RETURNS ARRAY<STRUCT<createdBy STRING, fileName STRING>>
LANGUAGE js AS """
function filesObjectArrayToArrayObject_execute(files) {
var createdBy = files["createdBy"];
var fileName = files["fileName"];
var output = [];
for(var i=0 ; i<createdBy.length ; i++) {
output.push({ "createdBy" : createdBy[i], "fileName" : fileName[i] });
}
return output;
}
return filesObjectArrayToArrayObject_execute(JSON.parse(filesJson));
""";
WITH sample_table AS (
SELECT STRUCT<createdBy ARRAY<STRING>, fileName ARRAY<STRING>>(
["foo", "foo"],
["bar1", "bar2"]
) AS files
)
SELECT
files AS filesOriginal,
filesObjectArrayToArrayObject(TO_JSON_STRING(files)) AS filesConverted
FROM sample_table
有没有一种方法可以使用本机BigQuery语句执行相同类型的任务?
请注意:
- 真实数据有2个以上的键,但这些键在名称中是固定的
- 数组的长度不是固定的,可以是0、1、10、20
下面是BigQuery标准SQL
#standardSQL
WITH sample_table AS (
SELECT STRUCT<createdBy ARRAY<STRING>, fileName ARRAY<STRING>>(
["foo", "foo"],
["bar1", "bar2"]
) AS files
)
SELECT
ARRAY(
SELECT STRUCT(createdBy, fileName)
FROM t.files.createdBy AS createdBy WITH OFFSET
JOIN t.files.fileName AS fileName WITH OFFSET
USING(OFFSET)
) files
FROM `sample_table` t
带输出
Row files.createdBy files.fileName
1 foo bar1
foo bar2