查询以从Presto中的深嵌套JSON数组对象提取ID



我正在使用presto,并尝试从嵌套的JSON结构中从'source''='dd'中提取所有'id'。

{
  "results": [
    {
      "docs": [
        {
          "id": "apple1",
          "source": "dd"
        },
        {
          "id": "apple2",
          "source": "aa"
        },
        {
          "id": "apple3",
          "source": "dd"
        }
      ],
      "group": 99806
    }
  ]
}

期望将ID [Apple1,Apple3]提取到Presto的一列中我想知道在Presto查询中实现这一目标的正确方法是什么?

如果您的数据具有常规结构,如您发布的示例,您可以将解析值作为JSON的组合,将其施放为结构化的SQL类型(array/map/ap/row/row(以及使用数组处理功能来过滤,转换和提取所需的元素:

WITH data(value) AS (VALUES '{
  "results": [
    {
      "docs": [
        {
          "id": "apple1",
          "source": "dd"
        },
        {
          "id": "apple2",
          "source": "aa"
        },
        {
          "id": "apple3",
          "source": "dd"
        }
      ],
      "group": 99806
    }
  ]
}'),
parsed(value) AS (
  SELECT cast(json_parse(value) AS row(results array(row(docs array(row(id varchar, source varchar)), "group" bigint))))
  FROM data
)
SELECT 
  transform(                                        -- extract the id from the resulting docs
    filter(                                         -- filter docs with source = 'dd'
        flatten(                                    -- flatten all docs arrays into a single doc array
             transform(value.results, r -> r.docs)  -- extract the docs arrays from the result array
        ),
        doc -> doc.source = 'dd'),
    doc -> doc.id)
FROM parsed

上面的查询产生:

      _col0
------------------
 [apple1, apple3]
(1 row)

相关内容

  • 没有找到相关文章

最新更新