json_normalize with multiple record paths

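I'm using the example given in the json_normalize documentation (pandas.json_normalize, pandas 1.0.3 documentation). Unfortunately I can't paste my actual JSON, but this example works. Pasted from the documentation: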

from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {'governor': 'Rick Scott'},
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {'governor': 'John Kasich'},
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]
# record path 'counties' gives one row per county; the meta fields are repeated onto each row
result = json_normalize(data, 'counties', ['state', 'shortname',
                                           ['info', 'governor']])
result

         name  population    state  shortname  info.governor
0        Dade       12345  Florida         FL     Rick Scott
1     Broward       40000  Florida         FL     Rick Scott
2  Palm Beach       60000  Florida         FL     Rick Scott
3      Summit        1234     Ohio         OH    John Kasich
4    Cuyahoga        1337     Ohio         OH    John Kasich

What if the JSON looks like the one below, where info is an array instead of a dict?

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': [{'governor': 'Rick Scott'},
                  {'governor': 'Rick Scott 2'}],
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': [{'governor': 'John Kasich'},
                  {'governor': 'John Kasich 2'}],
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]

How can I use json_normalize to get the following output?

         name  population    state  shortname  info.governor
0        Dade       12345  Florida         FL     Rick Scott
1        Dade       12345  Florida         FL   Rick Scott 2
2     Broward       40000  Florida         FL     Rick Scott
3     Broward       40000  Florida         FL   Rick Scott 2
4  Palm Beach       60000  Florida         FL     Rick Scott
5  Palm Beach       60000  Florida         FL   Rick Scott 2
6      Summit        1234     Ohio         OH    John Kasich
7      Summit        1234     Ohio         OH  John Kasich 2
8    Cuyahoga        1337     Ohio         OH    John Kasich
9    Cuyahoga        1337     Ohio         OH  John Kasich 2

Or, if there is another way to do this, please let me know.

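json_normalize is designed for convenience rather than flexibility. It can't handle every shape of JSON (JSON is too flexible for anyone to write a general-purpose parser that covers every layout).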
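When the layout doesn't fit json_normalize, flattening the records by hand is often the simplest option. The sketch below is not part of the original answer; it assumes the key names from the question's data ('state', 'shortname', 'info', 'counties', 'governor') and builds one row per (county, governor) pair within each state record using nested comprehensions, then hands the list of dicts to pd.DataFrame.

```python
import pandas as pd

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': [{'governor': 'Rick Scott'},
                  {'governor': 'Rick Scott 2'}],
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': [{'governor': 'John Kasich'},
                  {'governor': 'John Kasich 2'}],
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]

# One output row per (county, governor) pair inside the same top-level record.
# Dict insertion order fixes the column order: name, population, state, shortname, info.governor.
rows = [
    {**county,
     'state': rec['state'],
     'shortname': rec['shortname'],
     'info.governor': gov['governor']}
    for rec in data
    for county in rec['counties']
    for gov in rec['info']
]
result = pd.DataFrame(rows)
print(result)
```

Because pairing happens inside each record, this does not depend on a state appearing only once.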

How about calling json_normalize twice and then merging the results? This assumes each state appears only once in the JSON:

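from pandas import json_normalize

counties = json_normalize(data, 'counties', ['state', 'shortname'])
# record_prefix renames the 'governor' column to 'info.governor'
governors = json_normalize(data, 'info', ['state'], record_prefix='info.')
result = counties.merge(governors, on='state')  # pairs each county with each governor of its state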
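If a state might appear in more than one record, merging on 'state' would pair counties with governors from other records. A sketch that drops that assumption (not part of the original answer): tag each top-level record with its position under a hypothetical helper key `_rec`, pass it as meta to both json_normalize calls, merge on it, then drop it. `_rec` is an illustrative name only and must not clash with a real key in your data.

```python
import pandas as pd

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': [{'governor': 'Rick Scott'},
                  {'governor': 'Rick Scott 2'}],
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': [{'governor': 'John Kasich'},
                  {'governor': 'John Kasich 2'}],
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]

# Hypothetical helper key '_rec': the position of each top-level record in the list.
tagged = [{**rec, '_rec': i} for i, rec in enumerate(data)]

counties = pd.json_normalize(tagged, 'counties', ['_rec', 'state', 'shortname'])
# record_prefix is applied to the record columns only, so '_rec' keeps its name.
governors = pd.json_normalize(tagged, 'info', ['_rec'], record_prefix='info.')

# Joining on the record position pairs counties only with governors from the same record.
result = counties.merge(governors, on='_rec').drop(columns='_rec')
print(result)
```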
