Using REGEXP_EXTRACT to extract one value from duplicates, but not the other



I'm having trouble extracting a specific variable from a large text log.

A normal log looks like this:

metadata {
unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
webhook_response_time: 155
intent_name: "Dogs are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}

I only want to extract the intent_name variable, so I use this regular expression:

SELECT REGEXP_EXTRACT(textPayload, r"intent_name:(.+)") AS intent_name FROM table1

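This pulls out the value "Dogs are the best". However, some logs contain two different sections that include the phrase "intent_name", so this regex no longer does what I need: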

metadata {
intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
webhook_response_time: 105
intent_name: "Dogs are the best"
execution_sequence {
intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
intent_name: "Cats are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}

How can I write an expression that extracts only the first intent_name value, "Dogs are the best", and not the one inside the execution_sequence braces?

This would be much easier if the values were JSON. But for the second log format, you can do the following:

select regexp_extract(textPayload, r"""intent_name: ("[^"]+")[\s\S]*execution_sequence""")
from (select '''metadata {
unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
webhook_response_time: 155
intent_name: "Dogs are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}''' as textPayload union all
SELECT '''metadata {
intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
webhook_response_time: 105
intent_name: "Dogs are the best"
execution_sequence {
intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
intent_name: "Cats are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}'''
) x

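This does not work for the first format (it returns NULL there, because that log has no execution_sequence), but if you need to support both formats you can use a CASE expression.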
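
A minimal sketch of such a CASE expression, assuming the table1 and textPayload names from the question and BigQuery Standard SQL. The REGEXP_CONTAINS check used to tell the two formats apart is my assumption, not something from the original answer. As with the pattern above, the captured value keeps its surrounding double quotes.

```sql
-- Sketch only: table1 and textPayload are taken from the question.
SELECT
  CASE
    -- Second format: the log contains an execution_sequence block, so
    -- require the captured intent_name to appear before that block.
    WHEN REGEXP_CONTAINS(textPayload, r'execution_sequence')
      THEN REGEXP_EXTRACT(textPayload, r'''intent_name: ("[^"]+")[\s\S]*execution_sequence''')
    -- First format: there is only one intent_name, so match it directly.
    ELSE REGEXP_EXTRACT(textPayload, r'''intent_name: ("[^"]+")''')
  END AS intent_name
FROM table1
```

The patterns use r'''...''' rather than r"""...""" so that a pattern ending in a double quote is not confused with the closing delimiter of the string literal.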
