聚合在三列上,在不同条件下聚合总计,然后连接



我的测试数据帧:

print("Create List",  end='n') 
Test_Data = [('display_name', ['A', 'B', 'B','C','C','C','C','C',]),
('security_type1', ['GOVT', 'CORP','CORP','CORP','CORP','CORP','CORP','CORP']),
('currency_str', ['USD', 'NZD','USD','EUR','EUR','GBP','GBP','USD']),
('state', ['Done','Passed','Done','Done','Traded Away','Done','Done','Done']),
('rfq_qty_CAD_Equiv', [100000, 100000, 100000,100000,100000,100000,100000,100000]),
]
dfTest_Data = pd.DataFrame.from_items(Test_Data)
display(dfTest_Data)
display_name    security_type1  currency_str    state   rfq_qty_CAD_Equiv
A                     GOVT          USD          Done         100000
B                     CORP          NZD          Passed       100000
B                     CORP          USD          Done         100000
C                     CORP          EUR          Done         100000
C                     CORP          EUR          Traded Away  100000
C                     CORP          GBP          Done         100000
C                     CORP          GBP          Done         100000
C                     CORP          USD          Done         100000

我想要的输出如下。这里的驱动程序按display_namesecurity_type1currency_str分组。Total_RFQTotal_RFQ_Volume是关于display_name

display_name    security_type1  currency_str    Done_RFQ    Not_Done_RFQ    Total_RFQ
A               GOVT           USD            1             0          1  
B               CORP           USD            1             1          2
C               CORP           EUR            1             1          5
C               CORP           GBP            2             0          5
C               CORP           USD            1             0          5
Hit_Rate    Done_RFQ_Volume     Not_Done_RFQ_Volume     Total_RFQ_Volume
1.00             100000                 0                    100000              
0.50             100000               100000                 200000 
0.20             100000               100000                 500000 
0.40             200000                 0                    500000 
0.20             100000                 0                    500000 

Volume_per_Done_RFQ           Volume_per_Not_Done_RFQ   Volume_per_Total_RFQ
100000                              0                  100000
100000                           100000                100000
100000                           100000                100000
100000                              0                  100000
100000                              0                  100000

Hit_Rate=Done_RFQ/Total_RFQ

Volume_per_Done_RFQ=Done_RFQ_Volume/Done_RFQ

Volume_per_Not_Done_RFQ=Not_Done_RFQ_Volume/Not_Done_RFQ

Volume_per_Total_RFQ=Total_RFQ_Volume/Total_RFQ

大部分工作已经完成,我只是在合并第三个数据帧和显示需要零条目的Not_Done行项目时遇到问题

print("All Trades",  end='n') 
d = [
('Total_RFQ_Volume', 'sum'), 
('Total_RFQ', 'size'), 
]
df1 = dfTest_Data.groupby(['display_name'])['rfq_qty_CAD_Equiv'].agg(d)
display (df1)
print("Done Trades",  end='n') 
d = [
('Done_RFQ_Volume', 'sum'), 
('Done_RFQ', 'size'), 
]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d).reset_index()
display (df2)
print("Not Done Trades",  end='n') 
d = [ 
('Not_Done_RFQ_Volume', 'sum'), 
('Not_Done_RFQ', 'size'), 
] 
mask = ~dfTest_Data['state'].str.contains('Done') 
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'] .agg(d) .reset_index()
display (df3)
print("Join Done trades on All Trades",  end='n') 
df_Done_Client_Hit_Rate_Volume = df2.join(df1, on='display_name').join(df3, on='display_name')
# Create additional calculated columns
df_Done_Client_Hit_Rate_Volume['Hit_Rate'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Not_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Total_RFQ'] = df_Done_Client_Hit_Rate_Volume['Total_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
# Reorder columns
df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume[['display_name', 
'security_type1',
'currency_str', 
'Done_RFQ',
'Not_Done_RFQ',
'Total_RFQ',
'Hit_Rate',
'Done_RFQ_Volume',
'Not_Done_RFQ_Volume',
'Volume_per_Done_RFQ',
'Volume_per_Not_Done_RFQ',
'Total_RFQ_Volume'
'Volume_per_Total_RFQ'
]]
display (df_Done_Client_Hit_Rate_Volume)

我认为首先需要删除.reset_index以进行df2df3

d = [
('Done_RFQ_Volume', 'sum'), 
('Done_RFQ', 'size'), 
]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d)
#print (df2)
print("Not Done Trades",  end='n') 
d = [ 
('Not_Done_RFQ_Volume', 'sum'), 
('Not_Done_RFQ', 'size'), 
] 
mask = ~dfTest_Data['state'].str.contains('Done') 
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d) 

然后通过concatjoinDataFrame连接在一起:

df = pd.concat([df2, df3],axis=1).reset_index()
df_Done_Client_Hit_Rate_Volume = df.join(df1, on='display_name')

df_Done_Client_Hit_Rate_Volume['Hit_Rate'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Not_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Total_RFQ'] = df_Done_Client_Hit_Rate_Volume['Total_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 

最后将缺失值替换为0

df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume.fillna(0)
print (df_Done_Client_Hit_Rate_Volume)
display_name security_type1 currency_str  Done_RFQ_Volume  Done_RFQ  
0            A           GOVT          USD         100000.0       1.0   
1            B           CORP          NZD              0.0       0.0   
2            B           CORP          USD         100000.0       1.0   
3            C           CORP          EUR         100000.0       1.0   
4            C           CORP          GBP         200000.0       2.0   
5            C           CORP          USD         100000.0       1.0   
Not_Done_RFQ_Volume  Not_Done_RFQ  Total_RFQ_Volume  Total_RFQ  Hit_Rate  
0                  0.0           0.0            100000          1       1.0   
1             100000.0           1.0            200000          2       0.0   
2                  0.0           0.0            200000          2       0.5   
3             100000.0           1.0            500000          5       0.2   
4                  0.0           0.0            500000          5       0.4   
5                  0.0           0.0            500000          5       0.2   
Volume_per_Done_RFQ  Volume_per_Not_Done_RFQ  Volume_per_Total_RFQ  
0             100000.0                      0.0              100000.0  
1                  0.0                 100000.0              100000.0  
2             100000.0                      0.0              100000.0  
3             100000.0                 100000.0              100000.0  
4             100000.0                      0.0              100000.0  
5             100000.0                      0.0              100000.0  

相关内容

最新更新