Aggregate data in 30-day intervals relative to the launch date



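I have a DataFrame with the following columns: product, product_launch_date, date, revenue. How can I compute each product's total revenue in 30-day intervals counted from its launch date?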

For each product, product_launch_date is repeated on every one of its rows.

df['days_since_launch'] = (df['date'] -  df['product_launch_date']).dt.days
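To make the setup concrete, here is a minimal sketch on a small, hypothetical sample frame (the product names, dates and revenue values are made up for illustration). It shows that subtracting the two datetime columns and taking `.dt.days` gives whole days since launch per row.

```python
import pandas as pd

# Hypothetical sample data: the launch date repeats on every row of a product
df = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B"],
    "product_launch_date": pd.to_datetime(
        ["2023-01-01", "2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"]
    ),
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-31", "2023-02-15", "2023-02-10", "2023-04-01"]
    ),
    "revenue": [100, 50, 30, 200, 80],
})

# datetime64 - datetime64 gives a timedelta64 Series; .dt.days extracts whole days
df["days_since_launch"] = (df["date"] - df["product_launch_date"]).dt.days
print(df["days_since_launch"].tolist())  # [0, 30, 45, 9, 59]
```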

I want to group 'days_since_launch' into 30-day intervals for each product. A naive way is to spell out every interval by hand:

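con1 = df['days_since_launch'] <= 30
# Parenthesize each comparison and combine with & (both must hold); | binds tighter than > and <=
con2 = (df['days_since_launch'] > 30) & (df['days_since_launch'] <= 60)
...
df['date_bucket'] = 0
df.loc[con1, 'date_bucket'] = 1
df.loc[con2, 'date_bucket'] = 2
...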

Finally, I can aggregate by date_bucket with groupby:

df.groupby(['product', 'date_bucket'])['revenue'].agg('sum').reset_index()
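A minimal sketch of what that groupby returns, assuming a hypothetical frame whose rows already carry a date_bucket (1 for the first 30 days, 2 for the next 30): one row per (product, date_bucket) pair with the summed revenue.

```python
import pandas as pd

# Hypothetical rows that already have a date_bucket assigned
df = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B"],
    "date_bucket": [1, 1, 2, 1, 2],
    "revenue": [100, 50, 30, 200, 80],
})

# Sum revenue per product and bucket; reset_index turns the group keys back into columns
totals = df.groupby(["product", "date_bucket"])["revenue"].agg("sum").reset_index()
print(totals)
```

Because groupby sorts its keys by default, the rows come out ordered by product and then bucket.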

The question is: how can I generate date_bucket in a smarter way? Thanks.

Looks like you can use pd.cut:

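df['days_since_launch'] = (df['date'] - df['product_launch_date']).dt.days
# Edges every 30 days; going past max() ensures the largest value falls inside the last bin
bins = np.arange(df['days_since_launch'].min(), df['days_since_launch'].max() + 31, 30)
# Bins are right-closed, so include_lowest=True keeps the minimum; labels=False gives 0-based bin numbers
df['date_bucket'] = pd.cut(df['days_since_launch'], bins=bins, labels=False, include_lowest=True)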
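A swapped-in alternative that avoids building bin edges at all is integer division of the day count by 30. This sketch uses hypothetical sample data and is not the pd.cut approach above: its buckets are half-open ([0, 30), [30, 60), ...), so day 30 lands in bucket 2 rather than in bucket 1 as with the question's `<= 30` condition.

```python
import pandas as pd

# Hypothetical sample data: the launch date repeats on every row of a product
df = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B"],
    "product_launch_date": pd.to_datetime(
        ["2023-01-01", "2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"]
    ),
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-31", "2023-02-15", "2023-02-10", "2023-04-01"]
    ),
    "revenue": [100, 50, 30, 200, 80],
})

df["days_since_launch"] = (df["date"] - df["product_launch_date"]).dt.days
# Integer division: days 0-29 -> bucket 1, days 30-59 -> bucket 2, and so on
df["date_bucket"] = df["days_since_launch"] // 30 + 1

# Total revenue per product per 30-day bucket
totals = df.groupby(["product", "date_bucket"])["revenue"].sum().reset_index()
print(totals)
```

This works for any number of intervals without knowing the maximum in advance; if the exact `<= 30` boundary matters, the pd.cut version with right-closed bins matches the question more closely.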
