给定下面的示例CSV数据,在pandas DataFrame中,我如何按如下输出to_json
as_of
category
type
subtype
log: [
#sorted by timestamp
{timestamp: 1618879229, action: add, stale_timestamp: true},
{timestamp: 1608879229, action: remove, stale_timestamp: None},
]
20210415
apples
A
big
log: [
{timestamp: 1618879229, action: add, stale_timestamp: None},
]
small
log: [
{timestamp: 1618879229, action: add, stale_timestamp: None},
{timestamp: 1608879229, action: remove, stale_timestamp: None},
{timestamp: 1518879229, action: add, stale_timestamp: None},
]
B
big
log: [
{timestamp: 1618879229, action: add, stale_timestamp: None},
]
如果你还可以帮助我从嵌套的json返回DataFrame,那将是一个额外的奖励!
as_of | 类别 | 类型子类型操作 | 时间戳 | 跟踪时间戳||
---|---|---|---|---|---|
20210415 | 苹果 | 大 | 添加1618879229.6703315 | ||
20210415 | 苹果 | A小 | 添加 | 1618879229.6703315 | |
20210415 | 苹果 | B | 大 | 添加1618879229.6703315 | |
20210415 | 苹果 | B | 小 | 添加1618879229.6703315 | |
20210415 | 苹果 | C大 | 添加 | 1618879229.6703315 | |
20210415 | 苹果 | C小 | 添加 | 1618879229.6703315 | |
202103 | 橙子 | 甜 | 添加1616892142.6703315 | ||
202103 | 橙子 | 甜去除 | 1616632942.6703315 | >||
202103 | 橙子 | 甜 | 添加 | 1616200942.6703315 | |
202103 | 葡萄 | 甜 | 添加1616200942.6703315 | ||
20212 | 橙子 | 甜>添加 | 1616200942.6703315 | ||
20212 | 葡萄 | 甜添加 | 1616200942.6703315 | >||
20210115 | 苹果 | A大 | 添加 | 1611103342.6703315 | |
20210115 | 苹果 | A小 | 添加 | 1611103342.6703315 | |
20210115 | 苹果 | B | 大 | 添加1611103342.6703315 | |
20210115 | 苹果 | B小 | 添加 | 1611103342.6703315 | |
20210115 | 苹果 | C大 | 添加 | 1611103342.6703315 | |
20210115 | 苹果 | C小 | 添加 | 1611103342.6703315>||
202101 | 橙子 | 甜>添加 | 1608424942.6703315 | ||
202101 | 葡萄 | 甜添加 | 1608424942.6703315 | ||
202012 | 橙子 | 甜添加 | 1608424942.6703315 | ||
202012 | 葡萄 | 甜添加 | 1608424942.6703315 | >||
202011 | 橙子 | 甜>添加 | 1608424942.6703315 | ||
202011 | 葡萄 | 甜>添加 | 1608424942.6703315 | ||
20201015 | 苹果 | A大 | >添加1608424942.6703315 | 真||
20201015 | 苹果 | A小 | >添加 | 1608424942.6703315 | 真|
20201015 | 苹果 | B | 大 | 添加1608424942.6703315 | 真|
20201015 | 苹果 | B小 | >添加 | 1608424942.6703315 | 真|
20201015 | 苹果 | C大 | 添加 | 1608424942.6703315 | 真|
20201015 | 苹果 | C | 小添加1608424942.6703315 | 真||
202010 | 橙子 | 甜添加 | 1608424942.6703315 | >True | |
202010 | 葡萄 | 甜添加 | 1608424942.6703315 | >True |
首先我将表格转换为CSV:
as_of,category,type,sub_type,action,timestamp,stale_timestamp
20210415,apples,A,big,add,1618879230,
20210415,apples,A,small,add,1618879230,
20210415,apples,B,big,add,1618879230,
20210415,apples,B,small,add,1618879230,
20210415,apples,C,big,add,1618879230,
20210415,apples,C,small,add,1618879230,
202103,oranges,sweet,,add,1616892143,
202103,oranges,sweet,,remove,1616632943,
202103,oranges,sweet,,add,1616200943,
202103,grapes,sweet,,add,1616200943,
202102,oranges,sweet,,add,1616200943,
202102,grapes,sweet,,add,1616200943,
20210115,apples,A,big,add,1611103343,
20210115,apples,A,small,add,1611103343,
20210115,apples,B,big,add,1611103343,
20210115,apples,B,small,add,1611103343,
20210115,apples,C,big,add,1611103343,
20210115,apples,C,small,add,1611103343,
202101,oranges,sweet,,add,1608424943,
202101,grapes,sweet,,add,1608424943,
202012,oranges,sweet,,add,1608424943,
202012,grapes,sweet,,add,1608424943,
202011,oranges,sweet,,add,1608424943,
202011,grapes,sweet,,add,1608424943,
20201015,apples,A,big,add,1608424943,TRUE
20201015,apples,A,small,add,1608424943,TRUE
20201015,apples,B,big,add,1608424943,TRUE
20201015,apples,B,small,add,1608424943,TRUE
20201015,apples,C,big,add,1608424943,TRUE
20201015,apples,C,small,add,1608424943,TRUE
202010,oranges,sweet,,add,1608424943,TRUE
202010,grapes,sweet,,add,1608424943,TRUE
丢失的条目会导致JSON稍后出现问题。这要么需要在输入文件中修复,要么需要在Python转换器中修复。此外,有些日期似乎缺少字符。
因为Pandas中没有一个预定义的orient
选项可以满足这一要求,所以我编写了一个自定义词典,然后将该词典转换为JSON。
import pandas
import json
df = pandas.read_csv('sheet1.csv',header=None, dtype=str)
mydic = {}
for unique_col0 in df[0].unique():
mydic[unique_col0] = {}
sub_df = df[df[0]==unique_col0]
for unique_col1 in sub_df[1].unique():
mydic[unique_col0][unique_col1] = {}
sub_sub_df = sub_df[sub_df[1]==unique_col1]
for unique_col2 in sub_sub_df[2].unique():
mydic[unique_col0][unique_col1][unique_col2] = {}
sub_sub_sub_df = sub_sub_df[sub_sub_df[2]==unique_col2]
for unique_col3 in sub_sub_sub_df[3].unique():
mydic[unique_col0][unique_col1][unique_col2][unique_col3] = {'log':[]}
for index in range(sub_sub_sub_df.shape[0]):
this_dict = {'timestamp': list(sub_sub_sub_df[5])[index],
'action': list(sub_sub_sub_df[4])[index],
'stale_timestamps': list(sub_sub_sub_df[6])[index]}
mydic[unique_col0][unique_col1][unique_col2][unique_col3]['log'].append(this_dict)
with open('output.json','w') as file_handle:
json.dump(mydic,file_handle,indent=2)
asker提供的示例输出与Python实现实际产生的结果不一致。