如何在Hive中提取JSON值



我有一个JSON字符串,存储在数据库中对应于父ID的单个单元格

{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2019-03-24T15:19:52.639Z","profileType":"ADULT","id":"abc","signupDeviceId":"1"}||{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2021-09-05T07:47:00.245Z","imageId":"19","profileType":"KIDS","name":"Kids","id":"efg","signupDeviceId":"1"}

现在我想使用上面的JSON从中提取id。假设我们有这样的数据

Parent ID  |  Profile JSON
1          |  {profile_json} (see above string)

我希望输出像这样

Parent ID  |  ID
1          |  abc
1          |  efg

现在,我尝试了几次迭代来解决这个

第一种方法:

select
get_json_object(p.profile, '$$.id') as id,
test.parent_id

from (
select split(
regexp_replace(
regexp_extract(profiles, '^\[(.+)\]$$',1),
'\}\,\{', '\}\|\|\{'),
'\|\|') as profile_list,
parent_id ,

from source_table) test
lateral view explode(test.profile_list) p as profile
)

但是这返回id列为NULL值。我是不是漏掉了什么?

第二种方法:

with profiles as(
select        regexp_replace(
regexp_extract(profiles, '^\[(.+)\]$$',1),
'\}\,\{', '\}\|\|\{') as profile_list,
parent_id

from source_table
)      
SELECT
get_json_object (t1.profile_list,'$.id')
FROM profiles t1

第二种方法是根据上面的JSON字符串只返回第一个id (abc)。

我尝试在apache hive v4中复制这个

数据
+----------------------------------------------------+------------------+
|                    data                    | parent_id  |
+----------------------------------------------------+------------------+
| {"profileState":"ACTIVE","isDefault":"true","joinedOn":"2019-03-24T15:19:52.639Z","profileType":"ADULT","id":"abc","signupDeviceId":"1"}||{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2021-09-05T07:47:00.245Z","imageId":"19","profileType":"KIDS","name":"Kids","id":"efg","signupDeviceId":"1"} | 1.0              |
+----------------------------------------------------+------------------+

Sql

select pid,get_json_object(expl_jid,'$.id') json_id from 
(select  parent_id pid,split(data,'\|\|') jid  from tabl1)a 
lateral view explode(jid) exp_tab as expl_jid;
+------+----------+
| pid  | json_id  |
+------+----------+
| 1.0  | abc      |
| 1.0  | efg      |
+------+----------+

解决这个问题。在第一种方法

中使用提取$
select
get_json_object(p.profile, '$.id') as id,
test.parent_id
from (
select split(
regexp_replace(
regexp_extract(profiles, '^\[(.+)\]$$',1),
'\}\,\{', '\}\|\|\{'),
'\|\|') as profile_list,
parent_id ,

from source_table) test
lateral view explode(test.profile_list) p as profile

)

最新更新