如何在应用groupby方法后切片到pandas数据帧中



如何对下面的数据帧进行切片,以便选择与sign=-1和csum_count==5的行关联的数据?

import pandas as pd
ret = np.array([np.nan,0.022, -0.007,  0.005,  0.08 ,  0.047, -0.012, -0.012,0.026, -0.022, -0.064, -0.006, -0.02 , -0.013,  0.012, -0.005,-0.01 ,  0.007, -0.005,  0.002])
sign = np.array([-1,  1, -1,  1,  1,  1, -1, -1,  1, -1, -1, -1, -1, -1,  1, -1, -1,1, -1,  1])
cumsum = np.array([ 1,  2,  3,  4,  4,  4,  5,  5,  6,  7,  7,  7,  7,  7,  8,  9,  9,10, 11, 12])
index = pd.bdate_range("2007-01-03","2007-01-31",holidays=['2007-01-15'], freq='C', name='Index')
raw = pd.DataFrame({'return':ret,'sign': sign,'cumsum':cumsum}, index = index)
raw2=raw.groupby(['sign','cumsum']).agg(csum_count=('cumsum','count'),mean_return=('return','sum')).sort_values(['sign','csum_count'],ascending=False)
raw2
csum_count  mean_return
sign cumsum
1   4                3        0.132
2                1        0.022
6                1        0.026
8                1        0.012
10               1        0.007
12               1        0.002
-1   7                5       -0.125
5                2       -0.024
9                2       -0.015
1                1        0.000
3                1       -0.007
11               1       -0.005

我尝试了raw2.loc[-1,'sum_count'==5],但这产生了一个错误。

我想您在这里需要get_level_values

raw2.loc[raw2['csum_count'].eq(5) &
(raw2.index.get_level_values('sign') == -1)]

输出:

csum_count  mean_return
sign cumsum                         
-1   7                5       -0.125

query:

# would have work if `sign` is not a python keyword
# raw2.query('sign=-1 & csum_count=5')

如果您不介意重置索引,您可以执行

raw2.reset_index()[lambda r: (r['sign'] == -1) & (r['csum_count'] == 5)]

获取


sign   cumsum  csum_count  mean_return
6   -1      7       5           -0.125

最新更新