Pandas扁平化嵌套JSON

我有以下JSON并使用json_normalize来平嵌套数组。

它可以工作，但是结果输出4条记录(门票)而不是最初的2条。

输入JSON文件(2票，2个嵌套的custom_fields):

[
{
"ticket_id": 15177,
"status": "solved",
"custom_fields": [
{
"id": 360015283851,
"value": "severity_2"
},
{
"id": 360016758872,
"value": "issue" 
}
]
},
{
"ticket_id": 15178,
"status": "solved",
"custom_fields": [
{
"id": 360015283851,
"value": "severity_3"
},
{
"id": 360016758872,
"value": "request"
}
]
}
]

使用下面的代码，我可以平铺嵌套数组，但输出给我4条记录，而不是最初的2条:

import json
import pandas as pd
# Read CSV from file
with open('./testcase.json') as f:
d = json.load(f)
df = pd.json_normalize(data=d, record_path='custom_fields', meta=['ticket_id','status'])
df.to_json('testcase-out.json',orient='records')

输出:

[
{
"id": 360015283851,
"status": "solved",
"ticket_id": 15177,
"value": "severity_2"
},
{
"id": 360016758872,
"status": "solved",
"ticket_id": 15177,
"value": "issue"
},
{
"id": 360015283851,
"status": "solved",
"ticket_id": 15178,
"value": "severity_3"
},
{
"id": 360016758872,
"status": "solved",
"ticket_id": 15178,
"value": "request"
}
]
id       value ticket_id  status
0  360015283851  severity_2     15177  solved
1  360016758872       issue     15177  solved
2  360015283851  severity_3     15178  solved
3  360016758872     request     15178  solved

我更喜欢这样的:保留这2条记录，并从嵌套的对象中创建列。

[
{
"custom_field_id_360015283851": "severity_2",
"custom_field_id_360016758872": "issue",
"status": "solved",
"ticket_id": 15177
},
{
"custom_field_id_360015283851": "severity_3",
"custom_field_id_360016758872": "request",
"status": "solved",
"ticket_id": 15178
}
]

是否有可能保留号码或记录或以某种方式合并?

编辑:下面的工作，但它是静态的，可能不优雅。但作品…

# create new columns based on the custom field values
df['cf_360015283851'] = np.where(df['id'] == 360015283851, df['value'], '')
df['cf_360016758872'] = np.where(df['id'] == 360016758872, df['value'], '')
# delete columns from the custom values
del df['id']
del df['value']
# merge the lines where ticket_id is the same
df = df.replace('',np.nan, regex=True)
df1 = df.groupby('ticket_id',as_index=False,sort=False).last()

输出:

ticket_id  status cf_360015283851 cf_360016758872
0      15177  solved      severity_2           issue
1      15178  solved      severity_3         request

您可以直接使用json模块处理json文件并直接使用Python处理。但是，正如您已经开始使用Pandas一样，您可以只pivot当前数据框架:

df.pivot(index=['ticket_id', 'status'], columns='id', values='value')

it give immediately:

id               360015283851 360016758872
ticket_id status                          
15177     solved   severity_2        issue
15178     solved   severity_3      request

相关内容

最新更新

热门标签：