How do I add rows based on each user's last event in pandas?



Suppose I have a DataFrame of user events:

+---------+------------------+---------------------+
| user_id | event_name       | timestamp           |
+---------+------------------+---------------------+
| 1       | HomeAppear       | 2020-12-13 06:38:14 |
+---------+------------------+---------------------+
| 1       | TariffsAppear    | 2020-12-13 06:40:13 |
+---------+------------------+---------------------+
| 1       | CheckoutPayClick | 2020-12-13 06:50:12 |
+---------+------------------+---------------------+
| 2       | HomeAppear       | 2020-12-13 11:38:33 |
+---------+------------------+---------------------+
| 2       | TariffsAppear    | 2020-12-13 11:39:18 |
+---------+------------------+---------------------+

For each user, after their last event (by timestamp), I want to add a new row with an 'End' event that has the same timestamp as that last event:

+---------+------------------+---------------------+
| 1       | End              | 2020-12-13 06:50:12 |
+---------+------------------+---------------------+

I don't know how to do this. In SQL I would use LAG() or LEAD() for it. But what about pandas?

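Take the last row for each user_id with DataFrame.drop_duplicates, change event_name to End, then concat it with the original and sort by index (mergesort is used because it is a stable sort, so each End row stays right after the row it was copied from):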

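import pandas as pd

# Sort first if the rows are not already ordered by user and time
df = df.sort_values(['user_id', 'timestamp'], ignore_index=True)
# Copy each user's last row and relabel it as the 'End' event
df2 = df.drop_duplicates('user_id', keep='last').assign(event_name='End')
# A stable sort on the index puts each copy right after its source row
df = pd.concat([df, df2]).sort_index(kind='mergesort').reset_index(drop=True)
print(df)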
   user_id        event_name            timestamp
0        1        HomeAppear  2020-12-13 06:38:14
1        1     TariffsAppear  2020-12-13 06:40:13
2        1  CheckoutPayClick  2020-12-13 06:50:12
3        1               End  2020-12-13 06:50:12
4        2        HomeAppear  2020-12-13 11:38:33
5        2     TariffsAppear  2020-12-13 11:39:18
6        2               End  2020-12-13 11:39:18
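For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same drop_duplicates + concat idea, using the sample data from the question. The helper name append_end_events and the variable names are made up for illustration, and it assumes the timestamp column can be parsed with pd.to_datetime.

```python
import pandas as pd

# Sample data copied from the question.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_name": ["HomeAppear", "TariffsAppear", "CheckoutPayClick",
                   "HomeAppear", "TariffsAppear"],
    "timestamp": pd.to_datetime([
        "2020-12-13 06:38:14", "2020-12-13 06:40:13", "2020-12-13 06:50:12",
        "2020-12-13 11:38:33", "2020-12-13 11:39:18",
    ]),
})


def append_end_events(df):
    """Hypothetical helper: add an 'End' row after each user's last event."""
    # Sort so the last row per user is the latest event; index becomes 0..n-1.
    df = df.sort_values(["user_id", "timestamp"], ignore_index=True)
    # Last row per user, relabelled; each copy keeps its source row's index label.
    ends = df.drop_duplicates("user_id", keep="last").assign(event_name="End")
    # A stable sort on the duplicated labels places each copy right after its source.
    out = pd.concat([df, ends]).sort_index(kind="mergesort")
    return out.reset_index(drop=True)


result = append_end_events(events)
print(result)
```

The stable sort matters here: after concat, each End row shares an index label with the row it was copied from, and a stable sort keeps the original row first.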

You can also do it like this:

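import numpy as np
import pandas as pd

df = df.sort_values(['user_id', 'timestamp'])
df1 = pd.DataFrame({'user_id': np.unique(df['user_id']), 'event_name': 'End', 'timestamp': np.nan})
# Stable sort keeps each placeholder 'End' row after that user's events, so ffill copies the right timestamp
df = pd.concat([df, df1], axis=0).sort_values(by='user_id', kind='mergesort', ignore_index=True)
df['timestamp'] = df['timestamp'].ffill()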
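On the LAG()/LEAD() analogy from the question: the closest pandas counterpart is probably groupby(...).shift(...). The sketch below uses shift(-1) as a rough LEAD() to flag each user's last event, then reuses the concat step from the first answer. It assumes the original data has no missing timestamps, and the variable names are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_name": ["HomeAppear", "TariffsAppear", "CheckoutPayClick",
                   "HomeAppear", "TariffsAppear"],
    "timestamp": pd.to_datetime([
        "2020-12-13 06:38:14", "2020-12-13 06:40:13", "2020-12-13 06:50:12",
        "2020-12-13 11:38:33", "2020-12-13 11:39:18",
    ]),
})

events = events.sort_values(["user_id", "timestamp"], ignore_index=True)

# Roughly LEAD(timestamp) OVER (PARTITION BY user_id ORDER BY timestamp).
next_ts = events.groupby("user_id")["timestamp"].shift(-1)

# A row with no following event in its group is that user's last event.
is_last = next_ts.isna()

# Copy those rows as 'End' events and slot each one in after its source row.
ends = events[is_last].assign(event_name="End")
result = (
    pd.concat([events, ends])
    .sort_index(kind="mergesort")  # stable, so the original row stays first
    .reset_index(drop=True)
)
print(result)
```

Using shift(1) instead would give the LAG() equivalent, i.e. the previous event within each user.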
