如何对下面的数据帧进行切片,以便选择与sign=-1和csum_count==5的行关联的数据?
import pandas as pd
ret = np.array([np.nan,0.022, -0.007, 0.005, 0.08 , 0.047, -0.012, -0.012,0.026, -0.022, -0.064, -0.006, -0.02 , -0.013, 0.012, -0.005,-0.01 , 0.007, -0.005, 0.002])
sign = np.array([-1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1, -1, -1, -1, 1, -1, -1,1, -1, 1])
cumsum = np.array([ 1, 2, 3, 4, 4, 4, 5, 5, 6, 7, 7, 7, 7, 7, 8, 9, 9,10, 11, 12])
index = pd.bdate_range("2007-01-03","2007-01-31",holidays=['2007-01-15'], freq='C', name='Index')
raw = pd.DataFrame({'return':ret,'sign': sign,'cumsum':cumsum}, index = index)
raw2=raw.groupby(['sign','cumsum']).agg(csum_count=('cumsum','count'),mean_return=('return','sum')).sort_values(['sign','csum_count'],ascending=False)
raw2
csum_count mean_return
sign cumsum
1 4 3 0.132
2 1 0.022
6 1 0.026
8 1 0.012
10 1 0.007
12 1 0.002
-1 7 5 -0.125
5 2 -0.024
9 2 -0.015
1 1 0.000
3 1 -0.007
11 1 -0.005
我尝试了raw2.loc[-1,'sum_count'==5],但这产生了一个错误。
我想您在这里需要get_level_values
:
raw2.loc[raw2['csum_count'].eq(5) &
(raw2.index.get_level_values('sign') == -1)]
输出:
csum_count mean_return
sign cumsum
-1 7 5 -0.125
或query
:
# would have work if `sign` is not a python keyword
# raw2.query('sign=-1 & csum_count=5')
如果您不介意重置索引,您可以执行
raw2.reset_index()[lambda r: (r['sign'] == -1) & (r['csum_count'] == 5)]
获取
sign cumsum csum_count mean_return
6 -1 7 5 -0.125