按条件查询BigQuery中的数组列



我在Bigquery中有一个表,格式如下:

+------------+-----------------+------------+-----------------+---------------------------------+
| event_date | event_timestamp | event_name | event_params.key| event_params.value.string_value |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110   | 2929929292      | my_event   | previous_page   | /some-page                      |
+------------+-----------------+------------+-----------------+---------------------------------+
|                                           | layer           | /some-page/layer                |
|                                           +-----------------+---------------------------------+
|                                           | session_id      | 99292                           |
|                                           +-----------------+---------------------------------+
|                                           | user._id        | 2929292                         |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110   | 2882829292      | my_event   | previous_page   | /some-page                      |
+------------+-----------------+------------+-----------------+---------------------------------+
|                                           | layer           | /some-page/layer                |
|                                           +-----------------+---------------------------------+
|                                           | session_id      | 29292                           |
|                                           +-----------------+---------------------------------+
|                                           | user_id         | 229292                          |
+-------------------------------------------+-----------------+---------------------------------+

我想执行一个查询来获取event_params.value.string_value包含正则表达式/layer的所有行。

我已经试过了:

SELECT 
"event_params.value.string_value",  
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100

但是我得到这个输出:

+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+

知道我做错了什么吗?

您正在选择一个字符串-您应该选择一个列。

另一个问题是,你交叉连接表和它的数组-有效地膨胀了表。

您的解决方案是在WHERE子句中使用子查询:

SELECT 
*  -- Not sure what you actually need from the table ...
FROM `my_project.my_dataset.my_events_20210110`
WHERE 
-- COUNT(*)>0 means "if you find more than zero" then return TRUE
(SELECT COUNT(*)>0 FROM UNNEST(event_params) AS event_param 
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
)
LIMIT 100

如果你真的想要数组中的值,你的快速解决方案是删除引号:

SELECT 
event_params.value.string_value  
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100

相关内容

  • 没有找到相关文章

最新更新