我有以下JSON并使用json_normalize来平嵌套数组。
它可以工作,但是结果输出4条记录(门票)而不是最初的2条。
输入JSON文件(2票,2个嵌套的custom_fields):
[
{
"ticket_id": 15177,
"status": "solved",
"custom_fields": [
{
"id": 360015283851,
"value": "severity_2"
},
{
"id": 360016758872,
"value": "issue"
}
]
},
{
"ticket_id": 15178,
"status": "solved",
"custom_fields": [
{
"id": 360015283851,
"value": "severity_3"
},
{
"id": 360016758872,
"value": "request"
}
]
}
]
使用下面的代码,我可以平铺嵌套数组,但输出给我4条记录,而不是最初的2条:
import json
import pandas as pd
# Read CSV from file
with open('./testcase.json') as f:
d = json.load(f)
df = pd.json_normalize(data=d, record_path='custom_fields', meta=['ticket_id','status'])
df.to_json('testcase-out.json',orient='records')
输出:
[
{
"id": 360015283851,
"status": "solved",
"ticket_id": 15177,
"value": "severity_2"
},
{
"id": 360016758872,
"status": "solved",
"ticket_id": 15177,
"value": "issue"
},
{
"id": 360015283851,
"status": "solved",
"ticket_id": 15178,
"value": "severity_3"
},
{
"id": 360016758872,
"status": "solved",
"ticket_id": 15178,
"value": "request"
}
]
id value ticket_id status
0 360015283851 severity_2 15177 solved
1 360016758872 issue 15177 solved
2 360015283851 severity_3 15178 solved
3 360016758872 request 15178 solved
我更喜欢这样的:保留这2条记录,并从嵌套的对象中创建列。
[
{
"custom_field_id_360015283851": "severity_2",
"custom_field_id_360016758872": "issue",
"status": "solved",
"ticket_id": 15177
},
{
"custom_field_id_360015283851": "severity_3",
"custom_field_id_360016758872": "request",
"status": "solved",
"ticket_id": 15178
}
]
是否有可能保留号码或记录或以某种方式合并?
编辑:下面的工作,但它是静态的,可能不优雅。但作品…
# create new columns based on the custom field values
df['cf_360015283851'] = np.where(df['id'] == 360015283851, df['value'], '')
df['cf_360016758872'] = np.where(df['id'] == 360016758872, df['value'], '')
# delete columns from the custom values
del df['id']
del df['value']
# merge the lines where ticket_id is the same
df = df.replace('',np.nan, regex=True)
df1 = df.groupby('ticket_id',as_index=False,sort=False).last()
输出:
ticket_id status cf_360015283851 cf_360016758872
0 15177 solved severity_2 issue
1 15178 solved severity_3 request
您可以直接使用json
模块处理json文件并直接使用Python处理。但是,正如您已经开始使用Pandas一样,您可以只pivot当前数据框架:
df.pivot(index=['ticket_id', 'status'], columns='id', values='value')
it give immediately:
id 360015283851 360016758872
ticket_id status
15177 solved severity_2 issue
15178 solved severity_3 request