Given this DataFrame:
import pandas as pd
df = pd.DataFrame({
    'trigger': [0, 0, 0,  1, 1, 1,  2, 2, 2,  3, 3, 3],
    'score':   [1, 0, 0,  0, 1, 0,  0, 0, 1,  1, 1, 1],
    'label':   [1, 0, 0,  0, 1, 0,  0, 0, 1,  1, 1, 1],
})
# in reality ranked using some other column
df['rank'] = df.groupby(['trigger']).cumcount()
display(df)  # display() exists in Jupyter/IPython; use print(df) in a plain script
What I want to compute is essentially:
d_eval = df[df['rank'] <=2]
d_eval.groupby(['trigger']).agg({'score':'max', 'label':'max'})
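To make the problem concrete, here is a small self-contained check that rebuilds the same toy frame: the filtered aggregation returns one row per `trigger`, so the row-level detail of the other columns is no longer present in the result.

```python
import pandas as pd

# Same toy data as in the question; 'rank' is a per-trigger running counter
df = pd.DataFrame({
    'trigger': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'score':   [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
    'label':   [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
})
df['rank'] = df.groupby('trigger').cumcount()

# Filter first, then aggregate: the result is indexed by trigger
d_eval = df[df['rank'] <= 2]
agg = d_eval.groupby('trigger').agg({'score': 'max', 'label': 'max'})
print(agg.shape)  # 12 input rows collapse to one row per trigger
```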
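However, I don't want to lose the other rows (the ones excluded by the rank filter); I only want the filtered rows to be taken into account in the aggregation.
Is there a more direct way to do this in pandas than joining the aggregated result back onto the original frame?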
One option is to use merge:
d_eval = (df[df['rank'] <=2].groupby(['trigger'])
.agg({'score':'max', 'label':'max'})
)
df.merge(d_eval, on='trigger', suffixes=['','_max'])
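In the toy data every trigger has exactly three rows, so `rank <= 2` keeps everything and the filter has no visible effect. The sketch below uses `rank <= 1` instead (an assumption made only for illustration) so that one row per trigger is excluded from the aggregation but still kept in the output. It also passes `how='left'`, so a trigger with no rows passing the filter would keep its rows with NaN in the `*_max` columns rather than being dropped by the default inner join.

```python
import pandas as pd

df = pd.DataFrame({
    'trigger': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'score':   [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
    'label':   [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
})
df['rank'] = df.groupby('trigger').cumcount()

# Aggregate only rows with rank 0 or 1 (illustrative threshold)
d_eval = df[df['rank'] <= 1].groupby('trigger')[['score', 'label']].max()

# 'on' may name an index level of d_eval; how='left' keeps every original row
out = df.merge(d_eval, on='trigger', how='left', suffixes=('', '_max'))
print(out)
```

Row 8 (trigger 2, rank 2) keeps its own `score` of 1, while its `score_max` is 0 because that row was excluded from the aggregation.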
Output:
    trigger  score  label  rank  score_max  label_max
0         0      1      1     0          1          1
1         0      0      0     1          1          1
2         0      0      0     2          1          1
3         1      0      0     0          1          1
4         1      1      1     1          1          1
5         1      0      0     2          1          1
6         2      0      0     0          1          1
7         2      0      0     1          1          1
8         2      1      1     2          1          1
9         3      1      1     0          1          1
10        3      1      1     1          1          1
11        3      1      1     2          1          1
Or, as a one-liner:
df.merge(df.assign(rank=df.groupby('trigger').cumcount())
.query('rank <=2')
.groupby('trigger')[['score','label']].max(),
on='trigger', suffixes=['','_max']
)
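A merge-free alternative, which is not part of the answer above but a sketch using `groupby(...).transform`, is to blank out the values of rows that fail the filter and broadcast each group's maximum back to every row. `Series.where` turns excluded values into NaN, which `max` skips. As in the merge sketch, `rank <= 1` is an illustrative threshold chosen so the filter actually excludes rows in this toy data.

```python
import pandas as pd

df = pd.DataFrame({
    'trigger': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'score':   [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
    'label':   [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
})
df['rank'] = df.groupby('trigger').cumcount()

mask = df['rank'] <= 1  # rows allowed to contribute to the aggregate

for col in ['score', 'label']:
    # Excluded rows become NaN, so they do not affect the group max;
    # transform returns one value per original row, aligned to df's index
    df[col + '_max'] = (
        df[col].where(mask)
        .groupby(df['trigger'])
        .transform('max')
    )
print(df)
```

The new columns come out as float because of the intermediate NaN values, and a trigger with no rows passing the filter would get NaN, matching the `how='left'` merge behaviour.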