Building a pandas DataFrame from nested JSON



I have JSON data in the following format:

[
  {
    "score": 0.9228411211686975,
    "keypoints": [
      {
        "score": 0.9997879266738892,
        "part": "nose",
        "position": {
          "x": 503.1851299157304,
          "y": 348.069222553839
        }
      },
      {
        "score": 0.9999929070472717,
        "part": "leftEye",
        "position": {
          "x": 564.1951954588015,
          "y": 304.60602835323033
        }
      }
    ]
  },
  {
    "score": 0.922500729560852,
    "keypoints": [
      {
        "score": 0.9998152256011963,
        "part": "nose",
        "position": {
          "x": 503.0711610486892,
          "y": 350.03152797284645
        }
      },
      {
        "score": 0.9999920129776001,
        "part": "leftEye",
        "position": {
          "x": 564.4444932116105,
          "y": 305.9835118211611
        }
      }
    ]
  }
]

I want to build a pandas DataFrame from it with the following layout:

      score  keypoints_score  keypoints_position.x  keypoints_position.y
0  0.922841         0.999788            503.185129            348.069222
1  0.922841         0.999993            564.195195            304.606028
2  0.922500         0.999815            503.071161            350.031527
3  0.922500         0.999992            564.444493            305.983511

I have written the following code:

import json
# deprecated since pandas 1.0; newer code uses pandas.json_normalize instead
from pandas.io.json import json_normalize

# load the list of detections from disk
with open('path_to_a_json_file.json') as file:
    data = json.load(file)

# one row per keypoint, with the top-level score repeated as metadata
df = json_normalize(data, 'keypoints', ['score'], record_prefix='keypoints_')
df1 = df.reindex(columns=['score', 'keypoints_score', 'keypoints_position.x', 'keypoints_position.y'])
print(df1)

This code gives me:

      score  keypoints_score  keypoints_position.x  keypoints_position.y
0  0.922841         0.999788                   NaN                   NaN
1  0.922841         0.999993                   NaN                   NaN
2  0.922500         0.999815                   NaN                   NaN
3  0.922500         0.999992                   NaN                   NaN

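Can someone point out my mistake? I think something is wrong with the way I reindex the DataFrame, but I can't figure out where the problem is. Thanks.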
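For reference, DataFrame.reindex does not raise on labels it cannot find; it adds an all-NaN column for each one. NaN in only the two position columns therefore suggests that the frame returned by json_normalize has no columns with exactly those names, so printing df.columns before reindexing is a quick check. The sketch below uses a hypothetical frame in which the nested position dict was left as a single unflattened column; that is one possible cause, assumed for illustration and not confirmed for the setup in the question:

```python
import pandas as pd

# Hypothetical frame: the nested "position" dict was NOT split into dotted columns
df = pd.DataFrame({
    "keypoints_score": [0.999788, 0.999993],
    "keypoints_position": [{"x": 503.19, "y": 348.07}, {"x": 564.20, "y": 304.61}],
    "score": [0.922841, 0.922841],
})

# Inspect the real column labels before reindexing
print(list(df.columns))  # ['keypoints_score', 'keypoints_position', 'score']

wanted = ["score", "keypoints_score", "keypoints_position.x", "keypoints_position.y"]

# reindex keeps existing labels and creates all-NaN columns for unknown ones
df1 = df.reindex(columns=wanted)

# List the requested labels that do not exist in the original frame
missing = [c for c in wanted if c not in df.columns]
print(missing)  # ['keypoints_position.x', 'keypoints_position.y']
```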

On my pandas 1.0.1, this only works when score is an array or null. You can tweak the JSON slightly to make it fit:

import pandas as pd

# convert score from a scalar to an array of 1
for item in data:
    item['score'] = [item['score']]
# now we can put it into the `meta` argument
df = pd.json_normalize(data, record_path='keypoints', meta='score', record_prefix='keypoints_')
print(df)
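Because json_normalize's handling of meta values has varied between pandas versions, one version-independent alternative (a different technique from the json_normalize call in this answer) is to flatten the records yourself with a list comprehension, repeating each top-level score for every keypoint. A minimal sketch using the sample data from the question:

```python
import pandas as pd

# Sample data copied from the question
data = [
    {"score": 0.9228411211686975, "keypoints": [
        {"score": 0.9997879266738892, "part": "nose",
         "position": {"x": 503.1851299157304, "y": 348.069222553839}},
        {"score": 0.9999929070472717, "part": "leftEye",
         "position": {"x": 564.1951954588015, "y": 304.60602835323033}},
    ]},
    {"score": 0.922500729560852, "keypoints": [
        {"score": 0.9998152256011963, "part": "nose",
         "position": {"x": 503.0711610486892, "y": 350.03152797284645}},
        {"score": 0.9999920129776001, "part": "leftEye",
         "position": {"x": 564.4444932116105, "y": 305.9835118211611}},
    ]},
]

columns = ["score", "keypoints_score", "keypoints_position.x", "keypoints_position.y"]

# One row per keypoint; the outer score is repeated for each of its keypoints
rows = [
    {
        "score": person["score"],
        "keypoints_score": kp["score"],
        "keypoints_position.x": kp["position"]["x"],
        "keypoints_position.y": kp["position"]["y"],
    }
    for person in data
    for kp in person["keypoints"]
]

# Passing columns= fixes the column order explicitly
df = pd.DataFrame(rows, columns=columns)
print(df)
```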

Result:

   keypoints_score  keypoints_part  keypoints_position.x  keypoints_position.y     score
0         0.999788            nose            503.185130            348.069223  0.922841
1         0.999993         leftEye            564.195195            304.606028  0.922841
2         0.999815            nose            503.071161            350.031528  0.922501
3         0.999992         leftEye            564.444493            305.983512  0.922501

Rename, reorder and drop columns as needed.
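A sketch of that last step, assuming a pandas version that accepts a scalar meta value directly (current releases do; the answer above reports that 1.0.1 did not), so the list wrapping is skipped here. Selecting a list of columns both reorders them and drops keypoints_part:

```python
import pandas as pd

# Sample data copied from the question
data = [
    {"score": 0.9228411211686975, "keypoints": [
        {"score": 0.9997879266738892, "part": "nose",
         "position": {"x": 503.1851299157304, "y": 348.069222553839}},
        {"score": 0.9999929070472717, "part": "leftEye",
         "position": {"x": 564.1951954588015, "y": 304.60602835323033}},
    ]},
    {"score": 0.922500729560852, "keypoints": [
        {"score": 0.9998152256011963, "part": "nose",
         "position": {"x": 503.0711610486892, "y": 350.03152797284645}},
        {"score": 0.9999920129776001, "part": "leftEye",
         "position": {"x": 564.4444932116105, "y": 305.9835118211611}},
    ]},
]

# Nested "position" is flattened to "position.x"/"position.y", then prefixed;
# the meta column "score" keeps its name, so it does not clash with "keypoints_score"
df = pd.json_normalize(data, record_path="keypoints", meta=["score"],
                       record_prefix="keypoints_")

# Reorder to the layout from the question and drop "keypoints_part"
wanted = ["score", "keypoints_score", "keypoints_position.x", "keypoints_position.y"]
df1 = df[wanted]
print(df1)
```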
