I have JSON data in the following format:
[
  {
    "score": 0.9228411211686975,
    "keypoints": [
      {
        "score": 0.9997879266738892,
        "part": "nose",
        "position": {
          "x": 503.1851299157304,
          "y": 348.069222553839
        }
      },
      {
        "score": 0.9999929070472717,
        "part": "leftEye",
        "position": {
          "x": 564.1951954588015,
          "y": 304.60602835323033
        }
      }
    ]
  },
  {
    "score": 0.922500729560852,
    "keypoints": [
      {
        "score": 0.9998152256011963,
        "part": "nose",
        "position": {
          "x": 503.0711610486892,
          "y": 350.03152797284645
        }
      },
      {
        "score": 0.9999920129776001,
        "part": "leftEye",
        "position": {
          "x": 564.4444932116105,
          "y": 305.9835118211611
        }
      }
    ]
  }
]
I want to build a pandas DataFrame from it in the following format:
      score  keypoints_score  keypoints_position.x  keypoints_position.y
0  0.922841         0.999788            503.185129            348.069222
1  0.922841         0.999993            564.195195            304.606028
2  0.922500         0.999815            503.071161            350.031527
3  0.922500         0.999992            564.444493            305.983511
I have written the following code:
import numpy as np
import json
import os
from pandas.io.json import json_normalize
file = open('path_to_a_json_file.json')
data = json.load(file)
df = json_normalize(data, 'keypoints', ['score'], record_prefix='keypoints_')
df1 = df.reindex(columns=['score', 'keypoints_score', 'keypoints_position.x', 'keypoints_position.y'])
print(df1)
This code gives me:
      score  keypoints_score  keypoints_position.x  keypoints_position.y
0  0.922841         0.999788                   NaN                   NaN
1  0.922841         0.999993                   NaN                   NaN
2  0.922500         0.999815                   NaN                   NaN
3  0.922500         0.999992                   NaN                   NaN
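Can someone help me spot the error? I think something is wrong with the reindex I am doing on the DataFrame, but I can't figure out where the problem is. Thanks.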
On my pandas 1.0.1, this only works when score is an array or null. You can tweak the JSON slightly to make it fit:
# convert score from a scalar to an array of 1
for item in data:
item['score'] = [item['score']]
# now we can put it into the `meta` argument
json_normalize(data, record_path='keypoints', meta='score', record_prefix='keypoints_')
Result:
   keypoints_score  keypoints_part  keypoints_position.x  keypoints_position.y     score
0         0.999788            nose            503.185130            348.069223  0.922841
1         0.999993         leftEye            564.195195            304.606028  0.922841
2         0.999815            nose            503.071161            350.031528  0.922501
3         0.999992         leftEye            564.444493            305.983512  0.922501
Rename, reorder, and drop columns as needed.
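For that last step, here is a minimal sketch of how the whole thing might look end to end. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize, and it inlines the sample data from the question instead of reading a file. The unwrapping of the score column is my own defensive assumption: depending on the pandas version, the meta column may come back as plain floats or as one-element lists, so the sketch handles both. The name result is just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
data = [
    {
        "score": 0.9228411211686975,
        "keypoints": [
            {"score": 0.9997879266738892, "part": "nose",
             "position": {"x": 503.1851299157304, "y": 348.069222553839}},
            {"score": 0.9999929070472717, "part": "leftEye",
             "position": {"x": 564.1951954588015, "y": 304.60602835323033}},
        ],
    },
    {
        "score": 0.922500729560852,
        "keypoints": [
            {"score": 0.9998152256011963, "part": "nose",
             "position": {"x": 503.0711610486892, "y": 350.03152797284645}},
            {"score": 0.9999920129776001, "part": "leftEye",
             "position": {"x": 564.4444932116105, "y": 305.9835118211611}},
        ],
    },
]

# Wrap each top-level score in a one-element list, as suggested above.
for item in data:
    item["score"] = [item["score"]]

df = pd.json_normalize(
    data, record_path="keypoints", meta=["score"], record_prefix="keypoints_"
)

# Assumption: the meta column may hold floats or one-element lists,
# depending on the pandas version, so unwrap defensively and cast to float.
df["score"] = df["score"].map(
    lambda v: v[0] if isinstance(v, list) else v
).astype(float)

# Selecting columns by name drops keypoints_part and fixes the order in one go.
result = df[
    ["score", "keypoints_score", "keypoints_position.x", "keypoints_position.y"]
]
print(result)
```

Selecting by an explicit column list avoids a separate drop and reindex; if you also want different names, result.rename(columns={...}) can be chained afterwards.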