Easiest way to split a JSON file using Python



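I am building an interactive visualization of the World Happiness Report from 2015 to 2020. The data is split across 6 CSV files. Using pandas, I cleaned the data and concatenated it into one large JSON file with the following format: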

[
  {
    "Country": "Switzerland",
    "Year": 2015,
    "Happiness Rank": 1,
    "Happiness Score": 7.587000000000001
  },
  {
    "Country": "Iceland",
    "Year": 2015,
    "Happiness Rank": 2,
    "Happiness Score": 7.561
  },
  {
    "Country": "Switzerland",
    "Year": 2016,
    "Happiness Rank": 2,
    "Happiness Score": 7.5089999999999995
  },
  {
    "Country": "Iceland",
    "Year": 2016,
    "Happiness Rank": 3,
    "Happiness Score": 7.501
  },
  {
    "Country": "Switzerland",
    "Year": 2017,
    "Happiness Rank": 3,
    "Happiness Score": 7.49399995803833
  },
  {
    "Country": "Iceland",
    "Year": 2017,
    "Happiness Rank": 1,
    "Happiness Score": 7.801
  }
]

Now I would like to programmatically restructure the JSON file so that it has the following format:

{
    "2015": {
        "Switzerland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.587000000000001
        },
        "Iceland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.561
        }
    },
    "2016": {
        "Switzerland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.5089999999999995
        },
        "Iceland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.501
        }
    },
    "2017": {
        "Switzerland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.49399995803833
        },
        "Iceland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.801
        }
    }
}

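It has to be done programmatically, since there are more than 900 distinct (country, year) pairs. I want the JSON in this format because it makes the file more readable and makes it easier to select the right data. If I want Iceland's rank in 2015, I can do data["2015"]["Iceland"]["Happiness Rank"].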

Does anyone know the easiest / most convenient way to do this in Python?

If data is your original list of dicts:

from itertools import groupby
from operator import itemgetter

def by_year(data):
    # Only these fields are kept in each country's entry.
    retain_keys = ("Happiness Rank", "Happiness Score")
    # groupby only merges consecutive items that share the same key.
    for year, group in groupby(data, key=itemgetter("Year")):
        yield str(year), {d["Country"]: {k: d[k] for k in retain_keys} for d in group}

print(dict(by_year(data)))

Output:

{'2015': {'Switzerland': {'Happiness Rank': 1, 'Happiness Score': 7.587000000000001}, 'Iceland': {'Happiness Rank': 2, 'Happiness Score': 7.561}}, '2016': {'Switzerland': {'Happiness Rank': 2, 'Happiness Score': 7.5089999999999995}, 'Iceland': {'Happiness Rank': 3, 'Happiness Score': 7.501}}, '2017': {'Switzerland': {'Happiness Rank': 3, 'Happiness Score': 7.49399995803833}, 'Iceland': {'Happiness Rank': 1, 'Happiness Score': 7.801}}}

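This assumes that the dicts in data are already grouped together by year. itertools.groupby only groups consecutive items, so if that is not the case, sort data by "Year" first.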
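
If the records are not ordered by year, groupby produces a separate group for each run of consecutive equal years, and later runs overwrite earlier ones once everything is collected into a dict. Here is a minimal sketch that does not depend on input order and uses only plain dicts. The function name nest_by_year and the shortened sample records are made up for illustration:

```python
def nest_by_year(records, retain_keys=("Happiness Rank", "Happiness Score")):
    """Group records as {year: {country: {field: value}}}, whatever the input order."""
    nested = {}
    for record in records:
        # JSON object keys are always strings, so store the year as a string.
        year = str(record["Year"])
        nested.setdefault(year, {})[record["Country"]] = {
            key: record[key] for key in retain_keys
        }
    return nested


# Hypothetical sample, deliberately not sorted by year.
sample = [
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]

result = nest_by_year(sample)
print(result["2015"]["Iceland"]["Happiness Rank"])  # prints 2
```

Sorting the list with sorted(data, key=itemgetter("Year")) before calling groupby would work as well; the dict-based version simply avoids the extra pass.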

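I assume you still have the original pandas DataFrame that you created the JSON from. With pandas, you can do df = df.groupby(['Year', 'Country']). You can then follow the process in "pandas groupby to nested json" to convert it to JSON.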
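
One way the pandas route could look, as a sketch rather than the exact code from the linked answer. It assumes pandas is installed and that the DataFrame has the four columns shown in the question; it groups on Year only and lets to_dict build the per-country level:

```python
import pandas as pd

# A few rows taken from the question; the real DataFrame would hold all years.
df = pd.DataFrame(
    [
        {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587000000000001},
        {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
        {"Country": "Switzerland", "Year": 2016, "Happiness Rank": 2, "Happiness Score": 7.5089999999999995},
        {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    ]
)

nested = {
    # Year keys are converted to str to match the desired JSON layout.
    # orient="index" yields {country: {column: value}} for each year's rows.
    str(year): group.drop(columns="Year").set_index("Country").to_dict(orient="index")
    for year, group in df.groupby("Year")
}

print(nested["2015"]["Iceland"]["Happiness Rank"])
```

to_dict(orient="index") requires the index to be unique, so this only works if each country appears at most once per year.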

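You may find groupby from the itertools module useful. I was able to do this with:

import itertools
# Group consecutive entries by year, then index each year's entries by country.
groups = itertools.groupby(data, lambda x: x["Year"])
newdict = {str(year): {entry["Country"]: entry for entry in group} for year, group in groups}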

where data is the data in the form of the example you gave.

This keeps the original fields in each dict, but they can easily be removed like this:

for countries in newdict.values():
    for c in countries.values():
        del c["Year"]
        del c["Country"]
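
Note that the loop above deletes keys from the same dict objects that are stored in data, so the original list is modified too. If that is not wanted, a sketch that builds new inner dicts instead and then serializes the result with the standard json module could look like this. The sample records come from the question; drop_keys and text are illustrative names:

```python
import itertools
import json

data = [
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587000000000001},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
    {"Country": "Switzerland", "Year": 2016, "Happiness Rank": 2, "Happiness Score": 7.5089999999999995},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
]

# Fields that move into the nesting and should not be repeated inside each entry.
drop_keys = {"Year", "Country"}

groups = itertools.groupby(data, key=lambda x: x["Year"])
newdict = {
    str(year): {
        # Build a fresh dict per entry so the records in data stay untouched.
        entry["Country"]: {k: v for k, v in entry.items() if k not in drop_keys}
        for entry in group
    }
    for year, group in groups
}

print("Year" in data[0])  # prints True: the original records still have all fields

# Serialize to a JSON string; json.dump(newdict, file_obj) would write to a file instead.
text = json.dumps(newdict, indent=4)
```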
