I have the following JSON document format with a nested structure:
{
  "id": "p-1234-2132321-213213213-12312",
  "name": "athena to the rescue",
  "groups": [
    {
      "strategy_group": "anyOf",
      "conditions": [
        {
          "strategy_conditions": "anyOf",
          "entries": [
            {
              "c_key": "service",
              "C_operation": "isOneOf",
              "C_value": "mambo,bambo,jumbo"
            },
            {
              "c_key": "hostname",
              "C_operation": "is",
              "C_value": "lols"
            }
          ]
        }
      ]
    }
  ],
  "tags": [
    "aaa",
    "bbb",
    "ccc"
  ]
}
I created a table in Athena to support it, using the following:
CREATE EXTERNAL TABLE IF NOT EXISTS filters (
  id string,
  name string,
  tags array<string>,
  groups array<struct<
    strategy_group:string,
    conditions:array<struct<
      strategy_conditions:string,
      entries:array<struct<
        c_key:string,
        c_operation:string,
        c_value:string
      >>
    >>
  >>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://filterios/policies/';
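My goal right now is to query based on the columns of the condition entries. I have tried a few queries, but SQL is not my strongest skill ;(
So far I have this query, which gives me the entries: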
select cnds.entries from
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds)
However, because this is a complex array, I am struggling to work out the right way to query it.
Any hints are appreciated!
Thanks, R
I am not sure I understood your question correctly. Take a look at the example below; it may be useful to you.
select
id,
name,
tags,
grps.strategy_group,
cnds.strategy_conditions,
enes.c_key,
enes.c_operation,
enes.c_value
from
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds),
UNNEST(cnds.entries) AS t(enes)
where
enes.c_key='service'
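To make it easier to see what the three chained UNNEST calls produce, here is a small local sketch in Python that flattens the sample document from the question the same way and then applies the same filter. It is only an illustration, not something Athena runs: the `flatten` helper is hypothetical, and lower-casing the entry keys (so that "C_operation" lines up with the c_operation column) is a simplification made for this sketch.

```python
import json

# Sample policy document from the question.
doc = json.loads("""
{
  "id": "p-1234-2132321-213213213-12312",
  "name": "athena to the rescue",
  "groups": [{
    "strategy_group": "anyOf",
    "conditions": [{
      "strategy_conditions": "anyOf",
      "entries": [
        {"c_key": "service", "C_operation": "isOneOf", "C_value": "mambo,bambo,jumbo"},
        {"c_key": "hostname", "C_operation": "is", "C_value": "lols"}
      ]
    }]
  }],
  "tags": ["aaa", "bbb", "ccc"]
}
""")


def flatten(policy):
    """Hypothetical helper: yield one flat row per entry, like the chained UNNESTs."""
    for grp in policy["groups"]:            # UNNEST(filters.groups)
        for cnd in grp["conditions"]:       # UNNEST(grps.conditions)
            for ent in cnd["entries"]:      # UNNEST(cnds.entries)
                # Assumption of this sketch: keys are matched case-insensitively.
                ent = {key.lower(): value for key, value in ent.items()}
                yield {
                    "id": policy["id"],
                    "name": policy["name"],
                    "tags": policy["tags"],
                    "strategy_group": grp["strategy_group"],
                    "strategy_conditions": cnd["strategy_conditions"],
                    "c_key": ent["c_key"],
                    "c_operation": ent["c_operation"],
                    "c_value": ent["c_value"],
                }


# Equivalent of: where enes.c_key = 'service'
rows = [row for row in flatten(doc) if row["c_key"] == "service"]
for row in rows:
    print(row)
```

Each level of nesting becomes one loop, which is why the query needs one UNNEST per array level before the entry fields can be used in the where clause.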
Here is an example I used recently that may help:
My JSON:
{
"type": "FeatureCollection",
"features": [{
"first": "raj",
"geometry": {
"type": "Point",
"coordinates": [-117.06861096, 32.57889962]
},
"properties": "someprop"
}]
}
The external table I created:
CREATE EXTERNAL TABLE `jsondata`(
`type` string COMMENT 'from deserializer',
`features` array<struct<type:string,geometry:struct<type:string,coordinates:array<string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='features,type')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://vicinitycheck/rawData/jsondata/'
TBLPROPERTIES (
'classification'='json')
Querying the data:
SELECT type AS TypeEvent,
features[1].geometry.coordinates AS FeatherType
FROM test_vicinitycheck.jsondata
WHERE type = 'FeatureCollection'
test_vicinitycheck is the name of my database in Athena
jsondata is the name of the table in Athena
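One detail in the query that is easy to miss: array subscripts in Athena start at 1, so features[1] is the first feature. The sketch below mirrors the query in plain Python against the sample JSON above; `sql_subscript` is a hypothetical helper written only for this illustration.

```python
import json

# Sample GeoJSON document from this answer.
doc = json.loads("""
{
  "type": "FeatureCollection",
  "features": [{
    "first": "raj",
    "geometry": {
      "type": "Point",
      "coordinates": [-117.06861096, 32.57889962]
    },
    "properties": "someprop"
  }]
}
""")


def sql_subscript(array, index):
    """Hypothetical helper: 1-based access, so index 1 is the first element."""
    if index < 1 or index > len(array):
        raise IndexError(f"subscript {index} is out of range for {len(array)} elements")
    return array[index - 1]


row = None
if doc["type"] == "FeatureCollection":      # WHERE type = 'FeatureCollection'
    first_feature = sql_subscript(doc["features"], 1)   # features[1]
    row = {
        "TypeEvent": doc["type"],
        "FeatherType": first_feature["geometry"]["coordinates"],
    }

print(row)
```

Using a subscript only returns one element per row; to get every feature as its own row, the UNNEST approach from the other answer is the better fit.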
In case it helps, I have documented a few examples on my blog: http://weavetoconnect.com/aws-athena-and-nested-json/