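My dataset looks like this. It contains purchase records within zip codes, with the date of each purchase. I want to create a new dataframe that holds, for each zip code and each month, the cumulative number of purchases. The problem is that some months have no activity at all.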
ID  Zipcode  Date        Purchase
1   9999     2018-12-24  1
2   9999     2018-12-26  1
3   9999     2019-3-14   1
4   9999     2019-4-8    1
5   2400     2018-12-12  1
6   2400     2018-12-14  1
7   2400     2019-1-15   1
8   2400     2019-2-28   1
Ideally, the resulting dataframe should look like this:
Zipcode  Period         Cumulative purchases
9999     December 2018  2
9999     January 2019   2
9999     February 2019  2
9999     March 2019     3
9999     April 2019     4
9999     May 2019       4
2400     December 2018  2
2400     January 2019   3
2400     February 2019  4
2400     March 2019     4
2400     April 2019     4
etc.
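To illustrate why the inactive months are the hard part, here is a minimal sketch of the naive approach, assuming the data above is loaded into a DataFrame (the 2019-2-30 entry is read as 2019-02-28 here, since February 30 does not exist and pd.to_datetime rejects it). Grouping by zipcode and calendar month only yields rows for months that actually contain purchases:

```python
import pandas as pd

# Sample data from the question; 2019-02-30 is replaced by 2019-02-28
# because February 30 is not a valid date.
df = pd.DataFrame({
    'ID': range(1, 9),
    'Zipcode': [9999, 9999, 9999, 9999, 2400, 2400, 2400, 2400],
    'Date': pd.to_datetime(['2018-12-24', '2018-12-26', '2019-03-14', '2019-04-08',
                            '2018-12-12', '2018-12-14', '2019-01-15', '2019-02-28']),
    'Purchase': 1,
})

# Naive approach: group by (Zipcode, calendar month).
# Only months that contain at least one purchase get a row, so
# January and February 2019 are missing for zipcode 9999.
monthly = (df.groupby(['Zipcode', df['Date'].dt.to_period('M')])['Purchase']
             .sum())
print(monthly)
```

A cumulative sum over this result would skip January and February 2019 for 9999 instead of repeating the running total of 2.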
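You can group by Zipcode, resample each group to monthly frequency ('MS', month start) and aggregate with sum, then take the cumulative sum per first index level, here Zipcode. Because resample creates every month between a zipcode's first and last purchase, months without activity get 0 purchases and simply carry the running total forward: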
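import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
# Per Zipcode: monthly purchase totals (empty months become 0), then running total
df = (df.set_index('Date')
        .groupby('Zipcode', sort=False)
        .resample('MS')['Purchase'].sum()   # 'MS' = calendar month start
        .groupby(level=0)                   # level 0 of the MultiIndex = Zipcode
        .cumsum()
        .reset_index(name='Cumulative purchases'))
# Format the month like "December 2018"
df['Date'] = df['Date'].dt.strftime('%B %Y')
print(df)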
   Zipcode           Date  Cumulative purchases
0     9999  December 2018                     2
1     9999   January 2019                     2
2     9999  February 2019                     2
3     9999     March 2019                     3
4     9999     April 2019                     4
5     2400  December 2018                     2
6     2400   January 2019                     3
7     2400  February 2019                     4
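Note that resample only fills months between each zipcode's own first and last purchase, so 9999 stops at April 2019 and 2400 at February 2019, while the desired output continues to a common end month. One way to get that, sketched here under the assumption that the end month is fixed at May 2019 (a hypothetical choice, adjust as needed), is to reindex the monthly counts to every zipcode/month combination before taking the cumulative sum. As before, 2019-2-30 is read as 2019-02-28:

```python
import pandas as pd

# Sample data from the question; 2019-02-30 is replaced by 2019-02-28
# because February 30 is not a valid date.
df = pd.DataFrame({
    'Zipcode': [9999, 9999, 9999, 9999, 2400, 2400, 2400, 2400],
    'Date': pd.to_datetime(['2018-12-24', '2018-12-26', '2019-03-14', '2019-04-08',
                            '2018-12-12', '2018-12-14', '2019-01-15', '2019-02-28']),
    'Purchase': 1,
})

# Assumption: every zipcode is reported up to this (hypothetical) end month.
end = pd.Timestamp('2019-05-01')

# Monthly purchase counts per zipcode; resample fills empty months with 0,
# but only inside each zipcode's own first..last purchase span.
monthly = (df.set_index('Date')
             .groupby('Zipcode', sort=False)
             .resample('MS')['Purchase'].sum())

# Full grid: every zipcode x every month from the first month to `end`.
start = df['Date'].min().to_period('M').to_timestamp()  # first day of earliest month
months = pd.date_range(start, end, freq='MS')
full_index = pd.MultiIndex.from_product(
    [df['Zipcode'].unique(), months], names=['Zipcode', 'Date'])
monthly = monthly.reindex(full_index, fill_value=0)

# Running total per zipcode, then format the month as in the question.
out = (monthly.groupby(level=0).cumsum()
              .reset_index(name='Cumulative purchases'))
out['Date'] = out['Date'].dt.strftime('%B %Y')
print(out)
```

The start month is derived via to_period/to_timestamp because date_range with freq='MS' starting at 2018-12-12 would begin at January 2019 and drop December.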