如何使用多索引逐行计算百分比



我有大约 3000 万行的数据帧:

ID         DATE                         STATUS      
123        2017-01-04 18:08:56+00:00    True        
2017-01-04 18:09:56+00:00    True        
2017-01-06 19:12:30+00:00    False       
2017-01-07 19:12:30+00:00    False       
2017-01-08 19:12:30+00:00    False       
401        2017-01-01 18:08:56+00:00    False       
2017-01-02 18:19:56+00:00    True        
2017-01-05 09:15:30+00:00    True        
2017-01-06 12:12:30+00:00    False       

我想逐行计算每个 ID 的 True 值的百分比。该值应根据当前行的状态增加或减少。

我不擅长描述,但结果应该类似于这样:

ID         DATE                         STATUS      VALUE(%)
123        2017-01-04 18:08:56+00:00    True        100
2017-01-04 18:09:56+00:00    True        100
2017-01-06 19:12:30+00:00    False       66.66
2017-01-06 19:12:30+00:00    False       50
2017-01-06 19:12:30+00:00    False       40
401        2017-01-04 18:08:56+00:00    False       0
2017-01-04 18:09:56+00:00    True        50
2017-01-06 19:12:30+00:00    True        66.66
2017-01-06 19:12:30+00:00    False       50       

GroupBy.cumsum除以GroupBy.cumcount,乘以100,必要时四舍五入:

a = df.groupby(level=0)['STATUS'].cumsum()
b = df.groupby(level=0).cumcount() + 1
df['Val'] = a.div(b).mul(100).round(2)
print (df)
STATUS     Val
ID  DATE                                     
123 2017-01-04 18:08:56+00:00    True  100.00
2017-01-04 18:09:56+00:00    True  100.00
2017-01-06 19:12:30+00:00   False   66.67
2017-01-07 19:12:30+00:00   False   50.00
2017-01-08 19:12:30+00:00   False   40.00
401 2017-01-01 18:08:56+00:00   False    0.00
2017-01-02 18:19:56+00:00    True   50.00
2017-01-05 09:15:30+00:00    True   66.67
2017-01-06 12:12:30+00:00   False   50.00

你也可以这样做

df['p'] = df.groupby('ID')['STATUS'].transform(lambda x: x.cumsum()/(x.index-x.index.min()+1)).mul(100)
df1.groupby('ID').apply(lambda dd:dd.reset_index(drop=True)
.assign(col2=lambda dd:dd.STATUS.cumsum()/(dd.index+1)*100))
.droplevel([0]).pipe(print)
ID                       DATE  STATUS        col2
0  123  2017-01-04 18:08:56+00:00    True  100.000000
1  123  2017-01-04 18:09:56+00:00    True  100.000000
2  123  2017-01-06 19:12:30+00:00   False   66.666667
3  123  2017-01-07 19:12:30+00:00   False   50.000000
4  123  2017-01-08 19:12:30+00:00   False   40.000000
0  401  2017-01-01 18:08:56+00:00   False    0.000000
1  401  2017-01-02 18:19:56+00:00    True   50.000000
2  401  2017-01-05 09:15:30+00:00    True   66.666667
3  401  2017-01-06 12:12:30+00:00   False   50.000000

最新更新