AWS Glue-压扁深度嵌套的JSON



我想知道是否有一种方法可以使用Glue ETL作业来压平深度嵌套的JSON?这里面有嵌套的数组。我试图在JSON上运行Glue爬网程序,它返回了一个只有1个字段PerPlayer的目录,该字段具有结构数据类型。在粘合ETL工作中,我应该使用目录,还是只是将JSON读取到dynamicframe中,并执行一些转换以使其变平?如果只有一个记录(没有数组(,我可以使用relationalize来压平,但我的输入在数组结构中有几个记录,并且一些记录中有一些嵌套的数组。

我是Glue ETL的新手,所以任何建议或建议都将不胜感激。

{
"PerPlayer": {
"requestNo": "REQ912",
"Batch_Number": "1",
"Total_No_Of_Batches": "1",
"player": [
{
"username": "user1",
"characteristics": {
"race": "Human",
"class": "Warlock",
"subclass": "Dawnblade",
"power": 300,
"playercountry": "USA"
},
"arsenal": [
{
"kinetic": {
"name": "Sweet Business",
"type": "Auto Rifle",
"power": 300,
"element": "Kinetic"
},
"energy": {
"name": "MIDA Mini-Tool",
"type": "Submachine Gun",
"power": 300,
"element": "Solar"
},
"power": {
"name": "Play of the Game",
"type": "Grenade Launcher",
"power": 300,
"element": "Arc"
}
},
{
"kinetic": {
"name": "Sweet Business1",
"type": "Auto Rifle1",
"power": 300,
"element": "Kinetic1"
},
"energy": {
"name": "MIDA Mini-Tool",
"type": "Submachine Gun",
"power": 300,
"element": "Solar1"
},
"power": {
"name": "Play of the Game1",
"type": "Grenade Launcher1",
"power": 300,
"element": "Arc1"
}
}
],
"armor": {
"head": "Eye of Another World",
"arms": "Philomath Gloves",
"chest": "Philomath Robes",
"leg": "Philomath Boots",
"classitem": "Philomath Bond"
},
"location": {
"map": "Titan",
"waypoint": "The Rig"
}
},
{
"username": "user2",
"characteristics": {
"race": "Alien",
"class": "Starwars",
"subclass": "Dawnblade",
"power": 300,
"playercountry": "USA"
},
"arsenal": {
"kinetic": {
"name": "salt Business",
"type": "Auto Rifle",
"power": 300,
"element": "Kinetic"
},
"energy": {
"name": "MIDA Mini-Tool",
"type": "Submachine Gun",
"power": 300,
"element": "Solar"
},
"power": {
"name": "Play of the Game",
"type": "Grenade Launcher",
"power": 400,
"element": "Arc"
}
},
"armor": {
"head": "Eye of Another World",
"arms": "Philomath Gloves",
"chest": "Philomath Robes",
"leg": "Philomath Boots",
"classitem": "Philomath Bond"
},
"location": {
"map": "Titan",
"waypoint": "The Rig"
}
}
]
}
}

不幸的是,Glue Crawlers不可能做到这一点,该服务只会创建看起来像数据的表,而不会更改数据——而且也没有将嵌套层次结构映射到serde级别的平面结构的Athena功能。

通过将数据转换为一个新的扁平数据集,您可能可以使用Glue ETL来实现这一点,但总的来说,在我看来,尝试Glue的人最终会遇到比他们解决的问题更多的问题。

你可以做的是使用爬行器为你创建的表,并在Athena中创建一个使层次结构变平的视图。有一个叫做UNNEST的操作符,它将数组元素提升为行。它可能看起来像这样:

SELECT
PerPlayer.requestNo,
PerPlayer.Batch_Number,
PerPlayer.Total_No_Of_Batches,
player.username,
player.characteristics.race,
player.characteristics.class,
-- and so on
FROM original_table, UNNEST (PerPlayer.player) AS t(player)

结果是,在每个原始行的player数组中,每个元素将有一行,并且您可以访问原始行和播放器元素中的列。AS t(player)语法只是意味着包含数组元素的虚拟表应该被称为t,并具有一个名为player的列。

Stack Overflow上还有很多关于UNNEST的其他问题,你也可以从中寻找灵感。

如果要对平面结构运行查询,可以从上面的查询创建一个视图,并对该视图运行查询。抛开性能不谈,你的数据就好像被压扁了一样。

性能将取决于许多细节,除非必要,否则不要进行优化。您可以使用上面的查询来使用CTAS创建新的平面数据集。

最新更新