BigQuery:使用交叉引用查询重复字段



我继承了一个具有某种特殊模式的 BigQuery 表:

[
{ "name":"hardware_id", "type":"STRING", "mode":"NULLABLE" },
{ "name":"manufacturer", "type":"STRING", "mode":"NULLABLE" },
{ "name":"model", "type":"STRING", "mode":"NULLABLE" },
{ "fields":[
{ "name":"brand", "type":"STRING", "mode":"REPEATED" },
{ "name":"model_name", "type":"STRING", "mode":"NULLABLE" }
], "name":"components", "type":"RECORD", "mode":"REPEATED" },
{ "name":"ram", "type":"INTEGER", "mode":"NULLABLE" },
{ "name":"hdd", "type":"INTEGER", "mode":"NULLABLE" }
]

数据的结构如下:

hw_id | manufacturer | model | components.type | components.model_name | ram |  hdd
------+--------------+-------+-----------------+-----------------------+-----+-----
1 |       Lenovo |   ABX |             GPU |           Radeon 5500 |  16 | 1000
|              |       |             CPU |               Core i7 |     |
|              |       | SCSI Controller |          Adaptec 2940 |     |
------+--------------+-------+-----------------+-----------------------+-----+-----
2 |         Dell |   ZXV |             CPU |               Core i7 |   4 |  500
|              |       |             GPU |               GeForce |     |
|              |       |           Sound |          SoundBlaster |     |
------+--------------+-------+-----------------+-----------------------+-----+-----
3 |          IBM |  PS/2 |             CPU |                  i386 |   1 |  100
|              |       |           Sound |          SoundBlaster |     |
|              |       |             GPU |               GeForce |     |

我想一次查询多个组件,例如查找具有Core i7 CPU和SoundBlaster声卡的所有硬件。不幸的是,"组件"字段的顺序不一致,"model_name"可能不明确,所以我也需要查询相应的"品牌"字段。

我能够为单个组件创建查询,但还不能同时为多个组件创建查询。你能提示我正确的方向吗?

下面是 BigQuery Standard SQL

#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE 2 = 
(SELECT COUNT(1) 
FROM UNNEST(components) 
WHERE (type, model_name) IN (
('Sound', 'SoundBlaster'), ('CPU', 'Core i7')
)
)

您可以使用问题中的虚拟数据进行测试,玩上面,如下所示

#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 hw_id, 'Lenovo' manufacturer, 'ABX' model, [STRUCT<type STRING, model_name STRING>('GPU', 'Radeon 5500'), ('CPU', 'Core i7'), ('SCSI Controller', 'Adaptec 2940')] components, 16 ram, 1000 hdd UNION ALL
SELECT 2, 'Dell', 'ZXV', [('CPU', 'Core i7'), ('GPU', 'GeForce'), ('Sound', 'SoundBlaster')], 4, 500 UNION ALL
SELECT 3, 'IBM', 'PS/2', [('CPU', 'i386'), ('Sound', 'SoundBlaster'), ('GPU', 'GeForce')], 1, 100 
)
SELECT *
FROM `project.dataset.table`
WHERE 2 = 
(SELECT COUNT(1) 
FROM UNNEST(components) 
WHERE (type, model_name) IN (
('Sound', 'SoundBlaster'), ('CPU', 'Core i7')
)
)

所以结果将是

hw_id | manufacturer | model | components.type | components.model_name | ram |  hdd
------+--------------+-------+-----------------+-----------------------+-----+-----
2 |         Dell |   ZXV |             CPU |               Core i7 |   4 |  500
|              |       |             GPU |               GeForce |     |
|              |       |           Sound |          SoundBlaster |     |

最新更新