Presto unnest/map complex json

我已经在堆栈上探索了几个小时的其他预运行/映射/铸造解决方案的示例，但似乎找不到适用于我的数据的解决方案。

这是我的数据样本：

with test_data (id, messy_json) AS (
VALUES ('TEST_A', JSON '{"issue":[],"problem":[{"category":"math","id":2,"name":"subtraction"},{"category":"math","id":3,"name":"division"},{"category":"english","id":25,"name":"verbs"},{"category":"english","id":27,"name":"grammar"},{"category":"language","id":1,"name":"grammar"}],"version":4}'),
('TEST_B', JSON '{"problem":[],"version":4}'),
('TEST_C', JSON '{"version": 4, "problem": [], "issue": [null, null, null, null, null, null, null, null, null, null, null]}')
),

JSON列是半非结构化的，可以容纳多个lvl/并不总是像其他行一样具有每个键：值对。

我在尝试解决方案，比如：

with test_data AS (
select id,
messy_json        
from larger_tbl),

select 
id as id,
json_extract_scalar(test_data, '$.version') as lvl1_version
json_extract_scalar(lvl2, '$.problem') as lvl2_id
from test
LEFT JOIN UNNEST(CAST(json_parse(messy_json) AS array(json))) AS x(lvl1) ON TRUE
LEFT JOIN UNNEST(CAST(json_extract(lvl1, '$.problem') AS array(json))) AS y(lvl2) ON TRUE

这让我出现了铸造错误等。我已经尝试了的一些变体

unnest(cast(json_col as map(varchar, map(varchar,varchar)) options too.

我的目标是用保留的ID和保留在长数据集中的所有键/多lvl键来分解整个数据集。我感谢任何意见/指导，谢谢！

老实说，基于提供的样本数据和声明的目标，您当前的查询没有多大意义。

UNNEST与cross join一起使用，而不是与left join一起使用(我使用简洁的语法完全跳过关键字(
不需要在测试数据中使用json_parse(messy_json)，因为它已经是json了(尽管我假设在实际数据中它只是一个varchar字段，但应该保留它(
根json是一个json对象，所以它不能被强制转换为数组(也没有必要(
json_extract_scalar(lvl2, '$.problem') as lvl2_id中的problem应更改为id，因为problem阵列已被取消测试
若要处理问题数组为空的情况，可以使用以下方法：用一个包含1个元素的伪数组来取消对多个数组的测试

-- sample data
with test_data (id, messy_json) AS (
VALUES ('TEST_A', JSON '{"issue":[],"problem":[{"category":"math","id":2,"name":"subtraction"},{"category":"math","id":3,"name":"division"},{"category":"english","id":25,"name":"verbs"},{"category":"english","id":27,"name":"grammar"},{"category":"language","id":1,"name":"grammar"}],"version":4}'),
('TEST_B', JSON '{"problem":[],"version":4}'),
('TEST_C', JSON '{"version": 4, "problem": [], "issue": [null, null, null, null, null, null, null, null, null, null, null]}')
)
-- query
select id
, json_extract_scalar(messy_json, '$.version') as lvl1_version
, json_extract_scalar(lvl2, '$.id')            as lvl2_id
from test_data
, UNNEST(CAST(json_extract(messy_json, '$.problem') 
AS array(json)), array[1]) AS y(lvl2, ignored);

输出：

id	lvl1_version	lvl2_id
TEST_A	4	2
TEST_A	4	3
TEST_A	4	25
TEST_A	4	27
TEST_A	4	1
TEST_B	4	NULL
TEST_C	4	NULL

相关内容

最新更新

热门标签：