我在Bigquery中有一个表,格式如下:
+------------+-----------------+------------+-----------------+---------------------------------+
| event_date | event_timestamp | event_name | event_params.key| event_params.value.string_value |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110 | 2929929292 | my_event | previous_page | /some-page |
+------------+-----------------+------------+-----------------+---------------------------------+
| | layer | /some-page/layer |
| +-----------------+---------------------------------+
| | session_id | 99292 |
| +-----------------+---------------------------------+
| | user._id | 2929292 |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110 | 2882829292 | my_event | previous_page | /some-page |
+------------+-----------------+------------+-----------------+---------------------------------+
| | layer | /some-page/layer |
| +-----------------+---------------------------------+
| | session_id | 29292 |
| +-----------------+---------------------------------+
| | user_id | 229292 |
+-------------------------------------------+-----------------+---------------------------------+
我想执行一个查询来获取event_params.value.string_value
包含正则表达式/layer
的所有行。
我已经试过了:
SELECT
"event_params.value.string_value",
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100
但是我得到这个输出:
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
知道我做错了什么吗?
您正在选择一个字符串-您应该选择一个列。
另一个问题是,你交叉连接表和它的数组-有效地膨胀了表。
您的解决方案是在WHERE
子句中使用子查询:
SELECT
* -- Not sure what you actually need from the table ...
FROM `my_project.my_dataset.my_events_20210110`
WHERE
-- COUNT(*)>0 means "if you find more than zero" then return TRUE
(SELECT COUNT(*)>0 FROM UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
)
LIMIT 100
如果你真的想要数组中的值,你的快速解决方案是删除引号:
SELECT
event_params.value.string_value
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100