I'm trying to generate a JSON file or a dictionary from a DataFrame (with grouped columns).
My DataFrame is:
df1 = pd.DataFrame({
'USER': ['ALL','ALL','BOB','STEVE','PAUL','KEITH','STEVE','STEVE','BOB'],
'CITY': ['ALL','ALL','PARIS','LONDON','MILAN','MADRID','LONDON','LONDON','PARIS'],
'TEAMS':['USA','EUROPE','Middle EST','CHINA','JAPAN','MORROCO','Fr','ENGLAN','AUSTRIA'],
'TASK':['ALL','MANY','ONE','TWO','THREE','FOUR','FIVE','SIX','SEVEN']})
The expected output should look like this:
expected_dict = [
    {
        'USER': 'ALL',
        'CITY': 'ALL',
        'work': {
            'USA': 'ALL',
            'EUROPE': 'MANY'
        }
    },
    {
        'USER': 'BOB',
        'CITY': 'PARIS',
        'work': {
            'Middle EST': 'ONE',
            'AUSTRIA': 'SEVEN'
        }
    },
    {
        'USER': 'KEITH',
        'CITY': 'MADRID',
        'work': {
            'MORROCO': 'FOUR'
        }
    },
    {
        'USER': 'PAUL',
        'CITY': 'MILAN',
        'work': {
            'JAPAN': 'THREE'
        }
    },
    {
        'USER': 'STEVE',
        'CITY': 'LONDON',
        'work': {
            'CHINA': 'TWO',
            'Fr': 'FIVE',
            'ENGLAN': 'SIX'
        }
    }
]
To do this, I tried grouping the rows by (USER, CITY) and building a list for each of the TEAMS and TASK columns:
df_results = df1.groupby(['USER', 'CITY'])[['TEAMS', 'TASK']].agg(list)
|USER | CITY | TEAMS | TASK |
|:----|:-------:|:---------------------:|-----------------:|
|ALL | ALL | [USA, EUROPE]| [ALL, MANY]|
|BOB | PARIS | [Middle EST, AUSTRIA]| [ONE, SEVEN]|
|KEITH| MADRID | [MORROCO]| [FOUR]|
|PAUL | MILAN | [JAPAN]| [THREE]|
|STEVE| LONDON | [CHINA, Fr, ENGLAN]| [TWO, FIVE, SIX]|
But I don't know how to produce the expected dictionary format from this.
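One possible way to get from the list-aggregated frame above to the expected records is to pair each group's TEAMS list with its TASK list. This is only a minimal sketch; the names `lists` and `records_from_lists` are made up for illustration, and it assumes TEAMS values are unique within each group.

```python
import pandas as pd

df1 = pd.DataFrame({
    'USER': ['ALL', 'ALL', 'BOB', 'STEVE', 'PAUL', 'KEITH', 'STEVE', 'STEVE', 'BOB'],
    'CITY': ['ALL', 'ALL', 'PARIS', 'LONDON', 'MILAN', 'MADRID', 'LONDON', 'LONDON', 'PARIS'],
    'TEAMS': ['USA', 'EUROPE', 'Middle EST', 'CHINA', 'JAPAN', 'MORROCO', 'Fr', 'ENGLAN', 'AUSTRIA'],
    'TASK': ['ALL', 'MANY', 'ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX', 'SEVEN']})

# Aggregate TEAMS and TASK into parallel lists per (USER, CITY) group,
# then turn the group keys back into ordinary columns.
lists = df1.groupby(['USER', 'CITY'])[['TEAMS', 'TASK']].agg(list).reset_index()

# Zip the two lists of each row into the nested "work" dict
# (hypothetical variable names, chosen for this sketch only).
records_from_lists = [
    {'USER': row.USER, 'CITY': row.CITY, 'work': dict(zip(row.TEAMS, row.TASK))}
    for row in lists.itertuples(index=False)
]
```

If a TEAMS value repeated inside one group, `dict(zip(...))` would silently keep only the last TASK for it, so this only fits data like the sample above.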
Create "work" during the groupby, since it is a one-to-one mapping of TEAMS to TASK:
df_results = pd.DataFrame(
    df1.groupby(['USER', 'CITY'])[['TEAMS', 'TASK']].apply(lambda x: dict(zip(x['TEAMS'], x['TASK']))),
    columns=['work'])
df_results.reset_index().to_dict('records')
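Since the question also asks for JSON, here is a rough self-contained sketch that builds the same list of records by iterating over the groups and serializes it with the standard `json` module. The names `records` and `json_text` are assumptions made for this example, not part of the original code.

```python
import json
import pandas as pd

df1 = pd.DataFrame({
    'USER': ['ALL', 'ALL', 'BOB', 'STEVE', 'PAUL', 'KEITH', 'STEVE', 'STEVE', 'BOB'],
    'CITY': ['ALL', 'ALL', 'PARIS', 'LONDON', 'MILAN', 'MADRID', 'LONDON', 'LONDON', 'PARIS'],
    'TEAMS': ['USA', 'EUROPE', 'Middle EST', 'CHINA', 'JAPAN', 'MORROCO', 'Fr', 'ENGLAN', 'AUSTRIA'],
    'TASK': ['ALL', 'MANY', 'ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX', 'SEVEN']})

# Iterating a groupby on two keys yields ((USER, CITY), sub-frame) pairs,
# sorted by key by default.
records = [
    {'USER': user, 'CITY': city, 'work': dict(zip(group['TEAMS'], group['TASK']))}
    for (user, city), group in df1.groupby(['USER', 'CITY'])
]

# Serialize the list of plain dicts to a JSON string.
json_text = json.dumps(records, indent=2)
print(json_text)
```

To write a file instead of a string, `json.dump(records, f)` with an open file handle `f` should work the same way.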