I am trying to extract the id that starts with srsa from the table structure below:
id reason_text_field
34394 {"initial_customer":"sda_WWyfr4AXY1fIAS", "customer_result":"srsa_CAkAaAvNKL2OSD"}
so that I get the following output:
id srsa_id
34394 srsa_CAkAaAvNKL2OSD
But when I use the following SparkSQL function
REGEXP_EXTRACT(reason_text_field, 'srsa[^"]*') as srsa_id
I get this error:
java.lang.IndexOutOfBoundsException: No group
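You need to specify which group to capture. REGEXP_EXTRACT defaults to group index 1, and your pattern contains no capture group, hence the error. Either wrap the part you want in parentheses and ask for group 1, or ask for group 0, which is the whole match. Try this: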
SELECT id,
REGEXP_EXTRACT(reason_text_field, '"(srsa[^"]*)"', 1) as srsa_id
-- or REGEXP_EXTRACT(reason_text_field, 'srsa[^"]*', 0) as srsa_id
FROM tb
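The group numbering can be checked outside Spark. The following is a minimal sketch in plain Python, not Spark SQL: it uses Python's re module as a stand-in for the Java regex engine behind REGEXP_EXTRACT, and the variable names are only illustrative. It assumes the sample value is validly quoted JSON text.

```python
import re

# Sample value modelled on the question (assumed to be validly quoted).
reason_text_field = (
    '{"initial_customer":"sda_WWyfr4AXY1fIAS",'
    '"customer_result":"srsa_CAkAaAvNKL2OSD"}'
)

# Pattern without a capture group: only group 0 (the whole match) exists.
no_group = re.search(r'srsa[^"]*', reason_text_field)
whole_match = no_group.group(0)

# Asking for group 1 fails here, analogous to Spark's "No group" error.
try:
    no_group.group(1)
    group_1_failed = False
except IndexError:
    group_1_failed = True

# Pattern with one capture group: group 1 is the text inside the parentheses,
# so the surrounding double quotes are matched but not returned.
with_group = re.search(r'"(srsa[^"]*)"', reason_text_field)
captured = with_group.group(1)

print(whole_match)
print(group_1_failed)
print(captured)
```

Both working variants return the same id here; the capture-group form is slightly stricter because it only matches an srsa value that is enclosed in double quotes.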
Note that you can also use from_json to convert the text column reason_text_field into a map or struct, and then extract the field customer_result:
SELECT id,
from_json(reason_text_field, 'map<string,string>')['customer_result'] as srsa_id
FROM tb
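The idea behind the from_json variant is to parse the text once and then look the value up by key, instead of pattern matching. A rough plain-Python analogue is sketched below, using the standard json module rather than Spark; the variable names are illustrative, and it assumes the column holds valid JSON.

```python
import json

# Sample value modelled on the question (assumed to be valid JSON).
reason_text_field = (
    '{"initial_customer":"sda_WWyfr4AXY1fIAS",'
    '"customer_result":"srsa_CAkAaAvNKL2OSD"}'
)

# Parse the text into a dict, comparable to a map<string,string> in Spark.
fields = json.loads(reason_text_field)

# Look the value up by key; .get returns None when the key is absent.
srsa_id = fields.get("customer_result")
missing = fields.get("no_such_key")

print(srsa_id)
print(missing)
```

A lookup by key picks the field by name rather than by the shape of its value, so it would still work if another field happened to contain a value starting with srsa. The regex approach, on the other hand, does not require the text to be well-formed JSON.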