I have a dataset that looks like this:
Customer_ID | Region | Sales_Date | Sales_Amount
183984 | 1 | 01-04-23 | 1232.44
183991 | 2 | 01-04-23 | 23.41
128943 | 1 | 01-04-23 | 7562.22
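Goal: understand sales growth by region, comparing the current 4 weeks with the previous 4 weeks.
Attempt:
I created dynamic date variables, split the dataset by time window, and then aggregated each part by region. After that I calculated the growth rate by hand. I'd like to know how to simplify this.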
from datetime import datetime, timedelta
import pandas as pd

current_date = datetime(2023, 4, 8)
# Each window is 28 days, inclusive at both ends; the previous window ends the day before the current one starts.
current_4_wks_end = current_date
current_4_wks_start = current_4_wks_end - timedelta(days=27)
past_4_wks_end = current_4_wks_start - timedelta(days=1)
past_4_wks_start = past_4_wks_end - timedelta(days=27)
sales_current_weeks = df_sales_ytd[(df_sales_ytd['Sales_Date'] >= current_4_wks_start) & (df_sales_ytd['Sales_Date'] <= current_4_wks_end)]
sales_current_weeks = sales_current_weeks.groupby(['Region']).agg({'Sales_Amount':'sum'}).reset_index().rename(columns = {'Sales_Amount': 'current_4_nrx'})
sales_previous_weeks = df_sales_ytd[(df_sales_ytd['Sales_Date'] >= past_4_wks_start) & (df_sales_ytd['Sales_Date'] <= past_4_wks_end)]
sales_previous_weeks = sales_previous_weeks.groupby(['Region']).agg({'Sales_Amount':'sum'}).reset_index().rename(columns = {'Sales_Amount': 'previous_4_nrx'})
df_4_growth = pd.merge(sales_current_weeks, sales_previous_weeks, on = 'Region', how='inner')
# Growth = (current - previous) / previous
df_4_growth['growth_4'] = (df_4_growth['current_4_nrx'] - df_4_growth['previous_4_nrx']) / df_4_growth['previous_4_nrx']
You can simplify this by writing a function that filters the data to a given date range and then aggregates it.
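def sales_by_region(df, start_date, end_date):
    # Keep only rows whose Sales_Date falls inside the range (both ends inclusive)
    sales = df[(df['Sales_Date'] >= start_date) & (df['Sales_Date'] <= end_date)]
    # Sum Sales_Amount per region
    sales = sales.groupby(['Region']).agg({'Sales_Amount': 'sum'}).reset_index()
    return sales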
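The sales_by_region function above takes a DataFrame, a start date, and an end date, and returns total sales by region within that date range. You can call it once for the current 4 weeks and once for the previous 4 weeks, then compute the growth and add it as a column to the resulting DataFrame.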
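For what it's worth, here is a rough sketch of how those two calls and the growth step might fit together. The growth_by_region helper, the current_sales / previous_sales / growth column names, and the sample rows are all made up for illustration. It also assumes Sales_Date is already a datetime column and that each window is 28 days, inclusive at both ends.

```python
from datetime import datetime, timedelta

import pandas as pd


def sales_by_region(df, start_date, end_date):
    """Total Sales_Amount per Region for start_date <= Sales_Date <= end_date."""
    in_range = (df['Sales_Date'] >= start_date) & (df['Sales_Date'] <= end_date)
    return df[in_range].groupby('Region', as_index=False)['Sales_Amount'].sum()


def growth_by_region(df, as_of, weeks=4):
    """Hypothetical helper: compare the `weeks` ending on `as_of` with the `weeks` before."""
    span = timedelta(weeks=weeks) - timedelta(days=1)  # inclusive window of weeks * 7 days
    current_end = as_of
    current_start = current_end - span
    previous_end = current_start - timedelta(days=1)
    previous_start = previous_end - span

    current = sales_by_region(df, current_start, current_end).rename(
        columns={'Sales_Amount': 'current_sales'})
    previous = sales_by_region(df, previous_start, previous_end).rename(
        columns={'Sales_Amount': 'previous_sales'})

    # Outer merge keeps regions that had sales in only one of the two windows.
    out = current.merge(previous, on='Region', how='outer').fillna(0.0)
    # Growth is undefined when previous_sales is 0, so that case is left as NaN.
    base = out['previous_sales'].where(out['previous_sales'] != 0)
    out['growth'] = (out['current_sales'] - out['previous_sales']) / base
    return out


# Made-up rows for illustration only.
df_sales = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4],
    'Region': [1, 1, 2, 2],
    'Sales_Date': pd.to_datetime(['2023-04-01', '2023-03-01', '2023-04-05', '2023-02-20']),
    'Sales_Amount': [150.0, 100.0, 80.0, 100.0],
})
result = growth_by_region(df_sales, datetime(2023, 4, 8))
print(result)
```

With these made-up rows, region 1 goes from 100 to 150 (growth 0.5) and region 2 from 100 to 80 (growth -0.2). The outer merge is a deliberate choice here: an inner merge would silently drop any region that had no sales in one of the two windows.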