I am looking for an example of counting, from a pandas DataFrame, the number of rows per user that fall within a date range (the previous 7 days).

UserID  Date (Y/M/D)
100     2021/02/15
100     2021/02/10
100     2021/02/8
101     2021/02/10
102     2021/02/15
103     2021/02/10
Use a custom lambda function:
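import pandas as pd

# convert the date column to datetimes
df['Date (Y/M/D)'] = pd.to_datetime(df['Date (Y/M/D)'])
# 7-day timedelta
t = pd.Timedelta(7, unit='d')
# for each date in a group, count the group's dates between (date - 7 days) and date, inclusive
f = lambda x: x.apply(lambda y: x.between(y - t, y).sum())
# group_keys=False keeps the original index, so the result aligns with df on assignment
df['new'] = df.groupby('UserID', group_keys=False)['Date (Y/M/D)'].apply(f)
print(df)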
   UserID Date (Y/M/D)  new
0     100   2021-02-15    3
1     100   2021-02-10    2
2     100   2021-02-08    1
3     101   2021-02-10    1
4     102   2021-02-15    1
5     103   2021-02-10    1
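For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same windowed count. The sample frame is rebuilt from the table in the question (with the last date of user 100 zero-padded), and the names `window` and `count_in_window` are my own, not from the answer. It swaps `apply` for `groupby(...).transform`, which returns a result aligned to the original index; treat it as one possible variant, not the only way to do it.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed column names).
df = pd.DataFrame({
    "UserID": [100, 100, 100, 101, 102, 103],
    "Date (Y/M/D)": ["2021/02/15", "2021/02/10", "2021/02/08",
                     "2021/02/10", "2021/02/15", "2021/02/10"],
})
df["Date (Y/M/D)"] = pd.to_datetime(df["Date (Y/M/D)"])

window = pd.Timedelta(days=7)  # hypothetical name for the 7-day look-back


def count_in_window(dates: pd.Series) -> pd.Series:
    """For each date, count the group's dates in [date - 7 days, date]."""
    return dates.apply(lambda d: dates.between(d - window, d).sum())


# transform keeps the original row index, so plain assignment lines up.
df["new"] = df.groupby("UserID")["Date (Y/M/D)"].transform(count_in_window)
print(df)
```

Note that `between` is inclusive on both ends by default, which is why 2021-02-08 is counted for the 2021-02-15 row.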
First convert the date column from strings to datetimes (if you have not done so already):
df['Date (Y/M/D)'] = pd.to_datetime(df['Date (Y/M/D)'])
Then keep only the rows from the last 7 days, counted back from today:
df[df['Date (Y/M/D)'] >= pd.Timestamp.today().normalize() - pd.offsets.Day(7)]
To generate the Count column, run:
import numpy as np
df['Count'] = df.groupby('UserID', group_keys=False).apply(
    lambda x: pd.Series(len(x) - np.arange(len(x)), x.index))
The result is:
   UserID Date (Y/M/D)  Count
0     100   2021-02-15      3
1     100   2021-02-10      2
2     100   2021-02-08      1
3     101   2021-02-10      1
4     102   2021-02-15      1
5     103   2021-02-10      1
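One thing worth noting about this second approach: the Count is positional. Each row gets the number of rows remaining in its group, so it matches a 7-day count only when every kept row is inside the window and each user's rows are sorted newest first. Below is a rough, reproducible sketch under those assumptions. It uses a fixed `ref_date` (my own hypothetical stand-in for `pd.Timestamp.today().normalize()`, so the 2021 sample rows are not filtered out) and `groupby(...).cumcount(ascending=False) + 1` as a swapped-in equivalent of `len(x) - np.arange(len(x))`.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed column names).
df = pd.DataFrame({
    "UserID": [100, 100, 100, 101, 102, 103],
    "Date (Y/M/D)": ["2021/02/15", "2021/02/10", "2021/02/08",
                     "2021/02/10", "2021/02/15", "2021/02/10"],
})
df["Date (Y/M/D)"] = pd.to_datetime(df["Date (Y/M/D)"])

# Assumption: a fixed reference date instead of today's date,
# chosen so that the sample rows fall inside the 7-day window.
ref_date = pd.Timestamp("2021-02-15")
recent = df[df["Date (Y/M/D)"] >= ref_date - pd.offsets.Day(7)].copy()

# cumcount(ascending=False) numbers rows n-1 .. 0 within each user,
# so adding 1 gives n .. 1, the same values as len(x) - np.arange(len(x)).
recent["Count"] = recent.groupby("UserID").cumcount(ascending=False) + 1
print(recent)
```

If the rows are not already sorted newest first within each user, sorting by date in descending order before computing the count would be needed for the numbers to mean "rows on or before this date".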