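I'm trying to use pandas to calculate a running total per customer over the previous 365 days, but my code isn't working. My expected output looks like this: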
date | customer | daily_total_per_customer | rolling_total
---|---|---|---
2016-07-29 | 1 | 100 | 100
2016-08-01 | 1 | 50 | 150
2017-01-12 | 1 | 80 | 230
2017-10-23 | 1 | 180 | 260
2018-03-03 | 1 | 0 | 180
2018-03-06 | 1 | 40 | 220
2019-03-16 | 1 | 500 | 500
2017-04-07 | 2 | 50 | 50
2017-04-09 | 2 | 230 | 280
2018-02-11 | 2 | 80 | 360
2018-05-12 | 2 | 0 | 80
2019-05-10 | 2 | 0 | 0
Annotated code
import pandas as pd

# Parse the date strings to datetime
df['date'] = pd.to_datetime(df['date'])
# Sort by customer, then by date in ascending order, if not already sorted
df = df.sort_values(['customer', 'date'])
# Group the dataframe by customer then for each group
# calculate rolling sum on 'daily_total_per_customer'
s = df.groupby('customer').rolling('365d', on='date')['daily_total_per_customer'].sum()
# Merge the result with original df
df.merge(s.reset_index(name='rolling_total'))
          date  customer  daily_total_per_customer  rolling_total
0   2016-07-29         1                       100            100
1   2016-08-01         1                        50            150
2   2017-01-12         1                        80            230
3   2017-10-23         1                       180            260
4   2018-03-03         1                         0            180
5   2018-03-06         1                        40            220
6   2019-03-16         1                       500            500
7   2017-04-07         2                        50             50
8   2017-04-09         2                       230            280
9   2018-02-11         2                        80            360
10  2018-05-12         2                         0             80
11  2019-05-10         2                         0              0
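
To try the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the table in the question and applies the same groupby and time-based rolling approach. The variable name `result` and the way the frame is constructed are assumptions added for illustration, and the sketch assumes a reasonably recent pandas version in which the grouped rolling result is indexed by `customer` and `date`.

```python
import pandas as pd

# Sample data copied from the expected-output table in the question
df = pd.DataFrame({
    "date": [
        "2016-07-29", "2016-08-01", "2017-01-12", "2017-10-23",
        "2018-03-03", "2018-03-06", "2019-03-16",
        "2017-04-07", "2017-04-09", "2018-02-11", "2018-05-12", "2019-05-10",
    ],
    "customer": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "daily_total_per_customer": [100, 50, 80, 180, 0, 40, 500, 50, 230, 80, 0, 0],
})

# Parse the date strings and sort so dates are ascending within each customer
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["customer", "date"])

# Per-customer rolling sum over a 365-day window ending at each row's date
s = (
    df.groupby("customer")
      .rolling("365d", on="date")["daily_total_per_customer"]
      .sum()
)

# Attach the rolling sums back to the original rows via the shared
# customer/date columns (`result` is a name chosen for this sketch)
result = df.merge(s.reset_index(name="rolling_total"))
print(result)
```

The merge joins on the shared `customer` and `date` columns, so it assumes each (customer, date) pair occurs only once, as in the sample data. The rolling sum is returned as floats, so `rolling_total` may print as `100.0` rather than `100`.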