将 JSON 导入数据帧并进行规范化



我有以下 json 文档,我想将其导入到数据帧中:

{
"agents": [
{
"core_build": "17",
"core_version": "7.1.1",
"distro": "win-x86-64",
"groups": [
{
"id": 101819,
"name": "O Laptops"
}
],
"id": 2198802,
"ip": "x.x.x.x",
"last_connect": 1539962159,
"last_scanned": 1539373347,
"linked_on": 1534964847,
"name": "x1x1x1x1",
"platform": "WINDOWS",
"plugin_feed_id": "201810182051",
"status": "on",
"uuid": "ca8b941a-80cd-4c1c-8044-760e69781eb7"
},
{
"core_build": "17",
"core_version": "7.1.1",
"distro": "win-x86-64",
"groups": [
{
"id": 101839,
"name": "G Personal"
},
{
"id": 102037,
"name": "W6"
},
{
"id": 102049,
"name": "MS8"
}
],
"id": 2097601,
"ip": "x.x.x.x",
"last_connect": 1539962304,
"last_scanned": 1539437865,
"linked_on": 1529677890,
"name": "x2xx2x2x2",
"platform": "WINDOWS",
"plugin_feed_id": "201810181351",
"status": "on",
"uuid": "7e3ef1ff-4f08-445a-b500-e7ce3ca9a2f2"
},
{
"core_build": "14",
"core_version": "7.1.0",
"distro": "win-x86-64",
"id": 2234103,
"ip": "x6x6x6x6x",
"last_connect": 1537384290,
"linked_on": 1537384247,
"name": "x7x7x7x",
"platform": "WINDOWS",
"status": "off",
"uuid": "0696ee38-402a-4866-b753-2816482dfce6"
}],
"pagination": {
"limit": 5000,
"offset": 0,
"sort": [
{
"name": "name",
"order": "asc"
}
],
"total": 14416
}
}

出于同样的目的,我编写了以下代码:

import json
from pandas.io.json import json_normalize
with open('out.json') as f:
data = json.load(f)
df = json_normalize(data, 'agents', [['groups', 'name']], errors='ignore')
print(df)

这将按原样解压缩"代理"中的所有字段(以及作为多值字段的"组"字段(,以及一个名为"groups.name"的新字段,该字段为 null(所有值均为 NaN(。

我只想将"代理"字段中的字段解压缩到数据帧中,将"组"字段中的字段解压缩为单独的列("core_build"、"core_version"、"发行版"、"groups.name"、"id"、"ip"、"last_connect"、"last_scanned"、"linked_on"、"名称"、"平台"、"plugin_feed_id"、"状态"、"uuid"(。

我怎样才能做到这一点?

编辑: 执行以下操作

df = json_normalize(pd.concat([pd.DataFrame(i) for i in data['agents']]).to_dict('r'))

返回错误 值错误:如果使用所有标量值,则必须传递索引

您可以将pd.concat()与列表推导一起使用:

df = pd.concat([pd.DataFrame(i) for i in my_json['agents']])

或者,如果您想解压缩类型dictgroup列以分隔列,请尝试此操作:

df = json_normalize(pd.concat([pd.DataFrame(i) for i in my_json['agents']]).to_dict('r'))

收益 率:

core_build core_version      distro  groups.id groups.name       id  
0         17        7.1.1  win-x86-64     101819   O Laptops  2198802   
1         17        7.1.1  win-x86-64     101893   V Laptops  2169839   
2         17        7.1.1  win-x86-64     101839    Personal  2097601   
3         17        7.1.1  win-x86-64     102037          Wi  2097601   
4         17        7.1.1  win-x86-64     102049         MS8  2097601   
ip  last_connect  last_scanned   linked_on      name platform  
0  x.x.x.x    1539962159    1539373347  1534964847  x1x1x1x1  WINDOWS   
1  x.x.x.x    1539962767    1539374603  1533666075  x2x2x2x2  WINDOWS   
2  x.x.x.x    1539962304    1539437865  1529677890  x3x3x3x3  WINDOWS   
3  x.x.x.x    1539962304    1539437865  1529677890  x3x3x3x3  WINDOWS   
4  x.x.x.x    1539962304    1539437865  1529677890  x3x3x3x3  WINDOWS   
plugin_feed_id status                                  uuid  
0   201810182051     on  ca8b941a-80cd-4c1c-8044-760e69781eb7  
1   201810171657     on  9400817b-235b-423b-b163-c4c86f973232  
2   201810181351     on  7e3ef1ff-4f08-445a-b500-e7ce3ca9a2f2  
3   201810181351     on  7e3ef1ff-4f08-445a-b500-e7ce3ca9a2f2  
4   201810181351     on  7e3ef1ff-4f08-445a-b500-e7ce3ca9a2f2  

最新更新