I'm trying to apply a simple function to groups in pandas. I have this DataFrame (grouped by user_id), which looks like this:
user_id cancelled_at
10 NaN
10 2021-02-26
10 NaN
10 NaN
10 2021-06-01
10 NaN
I want to add a label to each row based on a condition on the 'cancelled_at' column, like this:
user_id cancelled_at result
10 NaN cancel
10 2021-02-26 cancel
10 NaN renew
10 NaN cancel
10 2021-06-01 cancel
10 NaN renew
A row with a non-null 'cancelled_at' value, and the row immediately before it, should get the result cancel; every other row should get renew.
If you need to set 1 where the previous value of cancelled_at within the same group is not missing, build a helper column with Series.notna and apply DataFrameGroupBy.shift to it:
df['Result'] = (df.assign(new = df.cancelled_at.notna())
.groupby('user_id')['new']
.shift(fill_value=False)
.astype(int))
print (df)
user_id start_date end_date cancelled_at Result
0 10 2020-12-27 2021-01-26 NaN 0
1 10 2021-01-27 2021-02-26 2021-02-26 0
2 10 2021-02-28 2021-03-30 NaN 1
3 10 2021-03-31 2021-04-30 NaN 0
4 10 2021-05-02 2021-06-01 2021-06-01 0
5 10 2021-06-02 2021-07-02 NaN 1
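If you want the cancel/renew labels from the question rather than 0/1, the shifted flag can be mapped with numpy.where. This is only a minimal sketch: the frame is rebuilt by hand from the question's sample (without start_date and end_date), and the lowercase column name result is taken from the desired output.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (one user, six rows).
df = pd.DataFrame({
    "user_id": [10] * 6,
    "cancelled_at": [None, "2021-02-26", None, None, "2021-06-01", None],
})

# True where the previous row of the same user has a non-missing cancelled_at.
prev_cancelled = (
    df["cancelled_at"].notna()
    .groupby(df["user_id"])
    .shift(fill_value=False)
    .astype(bool)
)

# Rows that directly follow a cancellation are renewals; all others are "cancel".
df["result"] = np.where(prev_cancelled, "renew", "cancel")
print(df)
```

For the sample above this reproduces the result column shown in the question.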
A custom function is also possible, although it should be slower:
def f(x):
    return x.notna().shift(fill_value=False).astype(int)
df['Result'] = df.groupby('user_id')['cancelled_at'].transform(f)
print (df)
user_id cancelled_at Result
0 10 NaN 0
1 10 2021-02-26 0
2 10 NaN 1
3 10 NaN 0
4 10 2021-06-01 0
5 10 NaN 1
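With a single user the grouping is easy to overlook, so here is a small check that both approaches agree and that the shift does not carry over from one user to the next. The two-user data below is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical data: user 10's last row is a cancellation,
# and user 20 starts on the very next row.
df = pd.DataFrame({
    "user_id": [10, 10, 20, 20, 20],
    "cancelled_at": [None, "2021-02-26", None, "2021-03-15", None],
})

# Approach 1: grouped shift on a boolean helper flag.
flag = df["cancelled_at"].notna()
by_shift = flag.groupby(df["user_id"]).shift(fill_value=False).astype(int)

# Approach 2: transform with a custom function applied per group.
def f(x):
    return x.notna().shift(fill_value=False).astype(int)

by_transform = df.groupby("user_id")["cancelled_at"].transform(f)

print(by_shift.tolist())
print(by_shift.tolist() == by_transform.tolist())
```

The first row of user 20 stays 0 even though the row directly above it (user 10) has a cancelled_at value, because fill_value=False is applied at the start of every group.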