Pandas: output CSV data to nested JSON with sorting



Given the sample CSV data below, how do I get to_json on a pandas DataFrame to produce output shaped like the following?

as_of
  category
    type
      subtype
        log: [
          # sorted by timestamp
          {timestamp: 1618879229, action: add, stale_timestamp: true},
          {timestamp: 1608879229, action: remove, stale_timestamp: None},
        ]

20210415
  apples
    A
      big
        log: [
          {timestamp: 1618879229, action: add, stale_timestamp: None},
        ]
      small
        log: [
          {timestamp: 1618879229, action: add, stale_timestamp: None},
          {timestamp: 1608879229, action: remove, stale_timestamp: None},
          {timestamp: 1518879229, action: add, stale_timestamp: None},
        ]
    B
      big
        log: [
          {timestamp: 1618879229, action: add, stale_timestamp: None},
        ]
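
To be concrete, in strict JSON I picture the 20210415 part of the output looking roughly like this (with null standing in for None):

{
  "20210415": {
    "apples": {
      "A": {
        "big": {
          "log": [
            {"timestamp": 1618879229, "action": "add", "stale_timestamp": null}
          ]
        },
        "small": {
          "log": [
            {"timestamp": 1618879229, "action": "add", "stale_timestamp": null},
            {"timestamp": 1608879229, "action": "remove", "stale_timestamp": null},
            {"timestamp": 1518879229, "action": "add", "stale_timestamp": null}
          ]
        }
      },
      "B": {
        "big": {
          "log": [
            {"timestamp": 1618879229, "action": "add", "stale_timestamp": null}
          ]
        }
      }
    }
  }
}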

If you could also help me get from the nested JSON back to a DataFrame, that would be a bonus!

as_of    | category | type  | sub_type | action | timestamp           | stale_timestamp
---------|----------|-------|----------|--------|---------------------|----------------
20210415 | apples   | A     | big      | add    | 1618879229.6703315 |
20210415 | apples   | A     | small    | add    | 1618879229.6703315 |
20210415 | apples   | B     | big      | add    | 1618879229.6703315 |
20210415 | apples   | B     | small    | add    | 1618879229.6703315 |
20210415 | apples   | C     | big      | add    | 1618879229.6703315 |
20210415 | apples   | C     | small    | add    | 1618879229.6703315 |
202103   | oranges  | sweet |          | add    | 1616892142.6703315 |
202103   | oranges  | sweet |          | remove | 1616632942.6703315 |
202103   | oranges  | sweet |          | add    | 1616200942.6703315 |
202103   | grapes   | sweet |          | add    | 1616200942.6703315 |
20212    | oranges  | sweet |          | add    | 1616200942.6703315 |
20212    | grapes   | sweet |          | add    | 1616200942.6703315 |
20210115 | apples   | A     | big      | add    | 1611103342.6703315 |
20210115 | apples   | A     | small    | add    | 1611103342.6703315 |
20210115 | apples   | B     | big      | add    | 1611103342.6703315 |
20210115 | apples   | B     | small    | add    | 1611103342.6703315 |
20210115 | apples   | C     | big      | add    | 1611103342.6703315 |
20210115 | apples   | C     | small    | add    |                     |
202101   | oranges  | sweet |          | add    | 1608424942.6703315 |
202101   | grapes   | sweet |          | add    | 1608424942.6703315 |
202012   | oranges  | sweet |          | add    | 1608424942.6703315 |
202012   | grapes   | sweet |          | add    | 1608424942.6703315 |
202011   | oranges  | sweet |          | add    | 1608424942.6703315 |
202011   | grapes   | sweet |          | add    | 1608424942.6703315 |
20201015 | apples   | A     | big      | add    | 1608424942.6703315 | True
20201015 | apples   | A     | small    | add    | 1608424942.6703315 | True
20201015 | apples   | B     | big      | add    | 1608424942.6703315 | True
20201015 | apples   | B     | small    | add    | 1608424942.6703315 | True
20201015 | apples   | C     | big      | add    | 1608424942.6703315 | True
20201015 | apples   | C     | small    | add    | 1608424942.6703315 | True
202010   | oranges  | sweet |          | add    | 1608424942.6703315 | True
202010   | grapes   | sweet |          | add    | 1608424942.6703315 | True

First, I converted the table to CSV:

as_of,category,type,sub_type,action,timestamp,stale_timestamp
20210415,apples,A,big,add,1618879230,
20210415,apples,A,small,add,1618879230,
20210415,apples,B,big,add,1618879230,
20210415,apples,B,small,add,1618879230,
20210415,apples,C,big,add,1618879230,
20210415,apples,C,small,add,1618879230,
202103,oranges,sweet,,add,1616892143,
202103,oranges,sweet,,remove,1616632943,
202103,oranges,sweet,,add,1616200943,
202103,grapes,sweet,,add,1616200943,
202102,oranges,sweet,,add,1616200943,
202102,grapes,sweet,,add,1616200943,
20210115,apples,A,big,add,1611103343,
20210115,apples,A,small,add,1611103343,
20210115,apples,B,big,add,1611103343,
20210115,apples,B,small,add,1611103343,
20210115,apples,C,big,add,1611103343,
20210115,apples,C,small,add,1611103343,
202101,oranges,sweet,,add,1608424943,
202101,grapes,sweet,,add,1608424943,
202012,oranges,sweet,,add,1608424943,
202012,grapes,sweet,,add,1608424943,
202011,oranges,sweet,,add,1608424943,
202011,grapes,sweet,,add,1608424943,
20201015,apples,A,big,add,1608424943,TRUE
20201015,apples,A,small,add,1608424943,TRUE
20201015,apples,B,big,add,1608424943,TRUE
20201015,apples,B,small,add,1608424943,TRUE
20201015,apples,C,big,add,1608424943,TRUE
20201015,apples,C,small,add,1608424943,TRUE
202010,oranges,sweet,,add,1608424943,TRUE
202010,grapes,sweet,,add,1608424943,TRUE

The missing entries will cause problems for the JSON later; they either need to be fixed in the input file or handled in the Python converter. In addition, some dates appear to be missing characters (e.g. 20212, which I read as 202102).
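
As a quick sanity check on the dates, a sketch like the following flags the odd-length values before converting. It assumes valid as_of values are either 6 digits (YYYYMM) or 8 digits (YYYYMMDD), which matches the data shown above:

import pandas

df = pandas.read_csv('sheet1.csv', dtype=str)
# Flag as_of values that are neither 6 digits (YYYYMM) nor 8 digits (YYYYMMDD).
suspect = df[~df['as_of'].str.len().isin([6, 8])]
print(suspect)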

Because none of the predefined orient options of pandas' to_json produce this shape, I build a custom dictionary and then convert that dictionary to JSON.

import json

import pandas

# Read everything as strings; keep_default_na=False keeps empty cells as ''
# (instead of NaN) so they stay groupable and JSON-serializable.
df = pandas.read_csv('sheet1.csv', dtype=str, keep_default_na=False)

mydic = {}
for unique_as_of in df['as_of'].unique():
    mydic[unique_as_of] = {}
    sub_df = df[df['as_of'] == unique_as_of]

    for unique_category in sub_df['category'].unique():
        mydic[unique_as_of][unique_category] = {}
        sub_sub_df = sub_df[sub_df['category'] == unique_category]

        for unique_type in sub_sub_df['type'].unique():
            mydic[unique_as_of][unique_category][unique_type] = {}
            sub_sub_sub_df = sub_sub_df[sub_sub_df['type'] == unique_type]

            for unique_sub_type in sub_sub_sub_df['sub_type'].unique():
                mydic[unique_as_of][unique_category][unique_type][unique_sub_type] = {'log': []}

                # Keep only the rows for this sub_type and sort them so each
                # log comes out in timestamp order, as requested.
                rows = sub_sub_sub_df[sub_sub_sub_df['sub_type'] == unique_sub_type]
                rows = rows.sort_values('timestamp')

                for _, row in rows.iterrows():
                    this_dict = {'timestamp': row['timestamp'],
                                 'action': row['action'],
                                 'stale_timestamp': row['stale_timestamp']}
                    mydic[unique_as_of][unique_category][unique_type][unique_sub_type]['log'].append(this_dict)

with open('output.json', 'w') as file_handle:
    json.dump(mydic, file_handle, indent=2)
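
For the bonus question, one way back from the nested JSON to a flat DataFrame is simply to walk the dictionary and rebuild one row per log entry. This is a minimal sketch against the output.json written above; the variable name df_roundtrip is only for illustration:

import json

import pandas

with open('output.json') as file_handle:
    nested = json.load(file_handle)

# One output row per log entry, with the four grouping keys re-attached.
rows = []
for as_of, categories in nested.items():
    for category, types in categories.items():
        for type_, sub_types in types.items():
            for sub_type, payload in sub_types.items():
                for entry in payload['log']:
                    rows.append({'as_of': as_of,
                                 'category': category,
                                 'type': type_,
                                 'sub_type': sub_type,
                                 **entry})

df_roundtrip = pandas.DataFrame(rows)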

Note: the sample output provided by the asker does not quite match what the Python implementation actually produces.

Latest update