Add a label to each row (based on a condition) of a pandas groupby result



I am trying to apply a simple function to groups in pandas. I have this DataFrame, grouped by user_id, which looks like this:

user_id           cancelled_at  
10                NaN   
10                2021-02-26  
10                NaN   
10                NaN   
10                2021-06-01   
10                NaN 

I want to add a label to each row based on a condition on the 'cancelled_at' column, like this:

user_id  cancelled_at  result
10       NaN           cancel
10       2021-02-26    cancel
10       NaN           renew
10       NaN           cancel
10       2021-06-01    cancel
10       NaN           renew

A row gets renew if the previous row of the same user has a non-null 'cancelled_at' value; otherwise it gets cancel.
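A minimal sketch of that rule, assuming the sample data above with cancelled_at stored as date strings and NaN: flag each row whose predecessor within the same user_id has a date, then pick the label with numpy.where. The first row of each user has no predecessor, so it is treated as "not cancelled" and gets cancel.

```python
import numpy as np
import pandas as pd

# Sample data from the question (only the two relevant columns)
df = pd.DataFrame({
    "user_id": [10] * 6,
    "cancelled_at": [np.nan, "2021-02-26", np.nan, np.nan, "2021-06-01", np.nan],
})

# True where the previous row of the same user has a cancellation date;
# the first row of each user has no previous row, so it is filled with False
prev_cancelled = (
    df["cancelled_at"].notna()
    .groupby(df["user_id"])
    .shift(fill_value=False)
    .astype(bool)
)

# renew after a cancellation, cancel otherwise
df["result"] = np.where(prev_cancelled, "renew", "cancel")
print(df)
```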

If you need 1 when the previous cancelled_at value within the same group is not missing, use DataFrameGroupBy.shift on a helper column created with Series.notna:

# 1 if the previous row of the same user_id has a cancelled_at date, else 0
df['Result'] = (df.assign(new = df.cancelled_at.notna())
                  .groupby('user_id')['new']
                  .shift(fill_value=False)
                  .astype(int))
print (df)
user_id  start_date    end_date cancelled_at  Result
0       10  2020-12-27  2021-01-26          NaN       0
1       10  2021-01-27  2021-02-26   2021-02-26       0
2       10  2021-02-28  2021-03-30          NaN       1
3       10  2021-03-31  2021-04-30          NaN       0
4       10  2021-05-02  2021-06-01   2021-06-01       0
5       10  2021-06-02  2021-07-02          NaN       1
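The groupby matters once the frame holds more than one user. This sketch uses hypothetical data for two users and compares the grouped shift from the answer with a plain Series.shift: without grouping, the last row of user 10 leaks into the first row of user 20.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two users, so the group boundary is visible
df = pd.DataFrame({
    "user_id": [10, 10, 10, 20, 20],
    "cancelled_at": ["2021-02-26", np.nan, "2021-06-01", np.nan, np.nan],
})

# Answer's approach: helper column, shifted within each user_id
df["Result"] = (df.assign(new=df["cancelled_at"].notna())
                  .groupby("user_id")["new"]
                  .shift(fill_value=False)
                  .astype(int))

# Plain shift without groupby: user 10's last row spills into user 20's first row
df["Ungrouped"] = df["cancelled_at"].notna().shift(fill_value=False).astype(int)
print(df)
```

Only row 3 (the first row of user 20) differs: the grouped version gives 0, the ungrouped one gives 1.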

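This is likely slower, because the function is called once per group, but you can also use a custom function with transform: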

def f(x):
    # x holds the cancelled_at values of one user_id group
    return x.notna().shift(fill_value=False).astype(int)

df['Result'] = df.groupby('user_id')['cancelled_at'].transform(f)
print (df)
user_id cancelled_at  Result
0       10          NaN       0
1       10   2021-02-26       0
2       10          NaN       1
3       10          NaN       0
4       10   2021-06-01       0
5       10          NaN       1
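Both versions return 0/1, while the question asked for cancel/renew. One way, sketched here on the question's sample data, is to map the flag onto the labels afterwards; the column name label is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [10] * 6,
    "cancelled_at": [np.nan, "2021-02-26", np.nan, np.nan, "2021-06-01", np.nan],
})

def f(x):
    # x holds the cancelled_at values of one user_id group
    return x.notna().shift(fill_value=False).astype(int)

df["Result"] = df.groupby("user_id")["cancelled_at"].transform(f)

# Translate the 0/1 flag into the labels asked for in the question
df["label"] = df["Result"].map({0: "cancel", 1: "renew"})
print(df)
```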
