BigQuery:将重复字段与普通字段一起过滤



我有下表:

row | cust_id | name
1   | 34      | word1 word2 someworkd  
|         | something word1
|         | blabla
2   | 35      | word1 word2 word3
|         | word4 word5

select*result JSON

[
{
"cust_id": "34",
"name": [
"word1 word2 someworkd",
"something word1",
"blabla"
]
},
{
"cust_id": "35",
"name": [
"word1 word2 word3",
"word4 word5",
"word1word5"
]
}
]

架构JSON:

[
{
"mode": "NULLABLE",
"name": "cust_id",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "name",
"type": "STRING"
}
]

我想根据cust_id和它们值上的重复字段进行筛选,所以使用以下查询:

SELECT * FROM `mytable` WHERE  cust_id='34' and name like'%word1%'

查询的预期输出:

row | cust_id | name
1   | 34      | word1 word2 someworkd  
|         | something word1

JSON格式的查询预期输出:

{
"cust_id": "34",
"name": [
"word1 word2 someworkd",
"something word1"
]
}
]

受此问题启发,BigQuery:使用标准SQL过滤重复字段我尝试这个查询:

SELECT cust_id, name   FROM `mytable`, UNNEST(name) AS name WITH OFFSET o
WHERE name like '%word1%' and cust_id='34'

哪个输出:

row | cust_id | name
1   | 34      | word1 word2 someworkd  
| 34      | something word1

输出JSON:

[
{
"cust_id": "34",
"name": "word1 word2 someworkd"
},
{
"cust_id": "34",
"name": "something word1"
}
]

输出结果和我想要的不一样,请问我该如何实现?

我会发布我所做的解决方案,它可以根据我的需要工作,可能对其他有用

SELECT
cust_id,
ARRAY(
SELECT
name
FROM
`mytable`,
UNNEST(name) AS name
WHERE
name LIKE '%word1%'
AND cust_id='34') AS name
FROM
`mytable`
WHERE
cust_id="34"

结果:

row | cust_id | name
1   | 34      | word1 word2 someworkd  
|         | something word1

JSON:

[
{
"cust_id": "34",
"name": [
"word1 word2 someworkd",
"something word1"
]
}
]

下面是BigQuery标准SQL

#standardSQL
SELECT * REPLACE(
ARRAY(SELECT * FROM t.name WHERE name LIKE'%word1%') 
AS name)
FROM `project.dataset.table` t
WHERE cust_id = 34

最新更新