如何解析snowflake中的json以获取json中基于某个字段的计数


[
{
"type": "ENTITLEMENT",
},
{
"type": "ENTITLEMENT",
},
{
"type": "ROLE"
},
{
"type": "ACCESS_PROFILE"
}
]

我在snowflake中有一个具有上述json值的列。我试图得到一个特定类型的计数。例如,我想要type=ENTITLEMENT的计数。这应该是2.

我可以把这些数据摊平。但是这将为列产生大量重复的数据(我有一个包含大量列的表)。

寻找解析json的方法。

我尝试过的事情。

  1. 编写一个java udf来解析的内容示例
REATE OR REPLACE FUNCTION IDN_DATA.entitlement_counter(access array)
RETURNS NUMBER
LANGUAGE java handler = 'JsonCounterWithFilter.entitlement_counter'
as
$$
public class JsonCounterWithFilter {
public int entitlement_counter(String[] access) {
int counter = 0;
for(String acc :access) {
if(acc.contains("ENTITLEMENT")) {
counter++;
}
}
return counter;
}
}
$$;

这不起作用,因为它会在任何地方寻找entitlement这个词,而不仅仅是在type字段中。我不能访问jackson库来解析这个json。

一个简短的JS UDF可以做这个计数:

create or replace function count_object_in_array(A array, T string, V string)
returns string
language javascript
as
$$
return A.reduce((count, x) => count + (x[T] == V?1:0), 0)
$$;

例如:

create or replace temp table stst as
select parse_json('[
{
"type": "ENTITLEMENT",
},
{
"type": "ENTITLEMENT",
},
{
"type": "ROLE"
},
{
"type": "ACCESS_PROFILE"
}
]') a;

select *, count_object_in_array(a, 'type', 'ENTITLEMENT')
from stst;

这是一个纯SQL方法:

create or replace temp table T1 as
select parse_json($$

[
{
"type": "ENTITLEMENT",
},
{
"type": "ENTITLEMENT",
},
{
"type": "ROLE"
},
{
"type": "ACCESS_PROFILE"
}
]

$$) COL1;

select  VALUE:type::string as TYPE
,count(*) as CT
from t1, table(flatten(t1.col1))
group by TYPE

横向扁平化不应该产生dup(至少不是基于您的输入)。您可以在本机json命令中进行聚合。看看这是否适用于您的数据

WITH JSON_DATA AS(
SELECT 
PARSE_JSON('[
{
"type": "ENTITLEMENT",
},
{
"type": "ENTITLEMENT",
},
{
"type": "ROLE"
},
{
"type": "ACCESS_PROFILE"
}
]' ) as json)
SELECT  value:type::string as type, count(*) FROM JSON_DATA, lateral flatten(input => json) 
GROUP BY type;

结果:

<表类>类型COUNT (*)tbody><<tr>权利21ACCESS_PROFILE1

所以如果我们有&;more&;您所暗示的数据比单个演示行要多:

with data as (
select column1 as id, parse_json(column2) as json from values 
('a', '[ {"type":"ENTITLEMENT"},{"type":"ENTITLEMENT"},{"type":"ROLE"},{"type":"ACCESS_PROFILE"}]'),
('b', '[ {"type":"ENTITLEMENT"},{"type":"ROLE"},{"type":"ACCESS_PROFILE"}]'),
('c', '[ {"type":"ENTITLEMENT"},{"type":"ENTITLEMENT"},{"type":"ENTITLEMENT"},{"type":"ROLE"},{"type":"ACCESS_PROFILE"}]')
)

然后把它压平,就像它只是一大块代码一样,你就得到了" mixed ";价值观:

SELECT  
j.value:type::string as type, count(*) 
FROM data as d, lateral flatten(input => d.json) j
GROUP BY type;
<表类>类型COUNT (*)tbody><<tr>权利63ACCESS_PROFILE3

最新更新