我在Athena中有一个表,其中的数据摄取是通过MongoDB完成的,其中表的一列是JSON数组。
只有表列是一个字符串,我不知道如何运行查询。
+--------+---------------------------------------------------------------------------------------------------------------------------------------+
| client | information |
+--------+---------------------------------------------------------------------------------------------------------------------------------------+
| 1 | [ { "ProductId" : { "$binary" : "7KgRQKabqkuxE+1pSw9b7Q==", "$type" : "03" }, "Risk" : "0", "Due_data" : { "$date" : 1634860800000 } ]|
+--------+---------------------------------------------------------------------------------------------------------------------------------------+
我想知道我是否可以像这个一样运行查询以离开表
+--------+-------------------------------------------------------------------+
| client | ProductId | Risk | Due_data |
+--------+-------------------------------------------------------------------+
| 1 | 4011A8EC-9BA6-4BAA-B113-ED694B0F5BED | 0 | 2021-12-08 00:00:00 |
+--------+-------------------------------------------------------------------+
感谢
使用数组并不容易,因为presto并不完全支持jsonpath。如果保证数组中只有一个元素,则可以执行类似json_extract_scalar(json, '$[0].ProductId["$binary"]')
的操作来提取字段,否则可以将json转换为json数组并使用它:
-- sample data
WITH dataset (client, information) AS (
VALUES (1, '[ { "ProductId" : { "$binary" : "7KgRQKabqkuxE+1pSw9b7Q==", "$type" : "03" }, "Risk" : "0", "Due_data" : { "$date" : 1634860800000 }} ]')
)
--query
SELECT client,
r.product_id,
r.risk,
r.ts
FROM dataset
CROSS JOIN UNNEST(
transform(
cast(json_parse(information) as array(json)), -- parse json and treat as array of json
json->cast( -- transform concrete json to row
ROW(
json_extract_scalar(json, '$.ProductId["$binary"]'),
json_extract_scalar(json, '$.Risk'),
from_unixtime(
cast(
json_extract_scalar(json, '$.Due_data["$date"]') AS BIGINT
) / 1000 -- transform timestamp to date
)
) as ROW(product_id VARCHAR, risk INT, ts TIMESTAMP)
)
)
) as t(r)
输出:
客户端 | product_id | 风险ts | |
---|---|---|---|
1 | 7KgRQKabqkuxE+1pSw9b7Q= | 0 | 2021-10-22 00:00:00.000 |