Querying a nested JSON structure in AWS Athena



I have the following JSON document format with a nested structure:

{
"id": "p-1234-2132321-213213213-12312",
"name": "athena to the rescue",
"groups": [
{
"strategy_group": "anyOf",
"conditions": [
{
"strategy_conditions": "anyOf",
"entries": [
{
"c_key": "service",
"C_operation": "isOneOf",
"C_value": "mambo,bambo,jumbo"
},
{
"c_key": "hostname",
"C_operation": "is",
"C_value": "lols"
}
]
}
]
}
],
"tags": [
"aaa",
"bbb",
"ccc"
]
}

I created a table in Athena to support it, using the following:

CREATE EXTERNAL TABLE IF NOT EXISTS filters ( id string, name string, tags array<string>, groups array<struct<
strategy_group:string,
conditions:array<struct<
strategy_conditions:string,
entries: array<struct<
c_key:string,
c_operation:string,
c_value:string
>>
>>
>> ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 's3://filterios/policies/';

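My current goal is to query based on the columns of the condition entries. I have tried a few queries, but SQL is not my strongest skill ;(

So far I have this query, which gives me the entries: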

select cnds.entries from 
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds)

However, because this is a complex array, I'm struggling to work out the right way to query it.

Any hints appreciated!

Thanks, R

I'm not sure I fully understood your question. Take a look at the example below; maybe it will be useful to you.

select 
id, 
name, 
tags,
grps.strategy_group,
cnds.strategy_conditions,
enes.c_key,
enes.c_operation, 
enes.c_value 
from 
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds),
UNNEST(cnds.entries) AS t(enes)
where 
enes.c_key='service'
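
Because c_value holds a comma-separated list for isOneOf entries, a follow-up filter might split it before matching. This is only a sketch built on the query above: it assumes the values have no spaces around the commas, 'mambo' is just the sample value from the question, and the relation aliases g, c and e are my own choice to keep each UNNEST distinct.

```sql
-- Sketch: ids of filters whose 'service' entry lists 'mambo' among its values.
-- split() turns 'mambo,bambo,jumbo' into an array; contains() tests membership.
-- Assumption: values are separated by bare commas with no surrounding spaces.
SELECT DISTINCT
  id,
  name
FROM
  filters,
  UNNEST(filters.groups) AS g(grps),
  UNNEST(grps.conditions) AS c(cnds),
  UNNEST(cnds.entries) AS e(enes)
WHERE
  enes.c_key = 'service'
  AND contains(split(enes.c_value, ','), 'mambo')
```

DISTINCT is there because a single document can produce several matching rows once its groups, conditions and entries have been flattened.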

Here is an example I used recently that may help:

My JSON:

{
"type": "FeatureCollection",
"features": [{
"first": "raj",
"geometry": {
"type": "Point",
"coordinates": [-117.06861096, 32.57889962]
},
"properties": "someprop"
}] 
}

The external table I created:

CREATE EXTERNAL TABLE `jsondata`(
`type` string COMMENT 'from deserializer', 
`features` array<struct<type:string,geometry:struct<type:string,coordinates:array<string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE 
'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
'paths'='features,type') 
STORED AS INPUTFORMAT 
'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://vicinitycheck/rawData/jsondata/'
TBLPROPERTIES (
'classification'='json')

Querying the data:

SELECT type AS TypeEvent,
features[1].geometry.coordinates AS FeatherType
FROM test_vicinitycheck.jsondata
WHERE type = 'FeatureCollection'

test_vicinitycheck is the name of my database in Athena
jsondata is the name of the table in Athena
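
Note that features[1] reads only the first element of the array (Athena array subscripts start at 1). To get one row per feature instead, something along these lines might work. It is a sketch that assumes the table definition above, where coordinates is declared as array<string>, hence the casts; the longitude and latitude column names are my own labels and assume the usual GeoJSON ordering.

```sql
-- Sketch: one output row per element of the features array.
-- Assumption: coordinates is array<string> as declared above, so cast to double.
SELECT
  j.type AS TypeEvent,
  CAST(f.geometry.coordinates[1] AS double) AS longitude,
  CAST(f.geometry.coordinates[2] AS double) AS latitude
FROM
  test_vicinitycheck.jsondata AS j
  CROSS JOIN UNNEST(j.features) AS t(f)
WHERE
  j.type = 'FeatureCollection'
```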

In case it helps, I have documented some examples on my blog: http://weavetoconnect.com/aws-athena-and-nested-json/
