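This is a follow-up to the question I posted here yesterday. The answer worked great, but I quickly ran into trouble because I also want to apply the same logic to quarter-over-quarter calculations.
The dataframe is below. Please excuse the length; I really don't know how to collapse it (if anyone can tell me how to hide a long df, I'd appreciate it).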
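Since there are some changes, let me explain the problem again. I want to find the year-over-year difference for each product in each market, per time period (harder than it sounds!). For example, product A in the USA market with time period QTR and date 2020-06-01 has a value of 100. This simply means that in Q2 2020 we made 100 dollars in revenue. I want the growth rate of Q2 2020 compared with Q2 2019, which is simply (100 - 300) / 300 = -66.7%. The same applies to time period MAT (moving annual total) when computing year-over-year growth.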
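The arithmetic above can be sketched in a few lines of Python. The helper name `growth_rate` is hypothetical and used only for illustration; returning NaN for a zero base is an assumption about how an undefined rate should be reported.

```python
def growth_rate(current, previous):
    """Relative change from `previous` to `current` (hypothetical helper)."""
    if previous == 0:
        # Assumption: report an undefined rate as NaN when the base period is zero
        return float("nan")
    return (current - previous) / previous


# Q2 2020 (100) against Q2 2019 (300), figures taken from the example above
yoy = growth_rate(100, 300)
print(f"{yoy:.1%}")  # prints -66.7%
```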
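Now I also want quarter-over-quarter growth. Note that this calculation does not apply to time period MAT, so my code below tries to handle that (I'm not sure it does so correctly). The output of my modified function is valid, but at the cost of readability: on each row, the "annual reference" date no longer matches the "quarterly reference" date. I'll eventually need this output for further analysis. Is there anything I can improve?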
    MARKET  PRODUCT  TIMEPERIOD        DATE  VALUES
 0     USA        A         QTR  2018-06-01     300
 1     USA        A         QTR  2019-06-01     300
 2     USA        A         QTR  2020-03-01     100
 3     USA        A         QTR  2020-06-01     100
 4     USA        A         MAT  2018-06-01    2000
 5     USA        A         MAT  2019-06-01    2000
 6     USA        A         MAT  2020-06-01    1000
 7     USA        B         QTR  2018-06-01     100
 8     USA        B         QTR  2019-06-01     100
 9     USA        B         QTR  2020-03-01     300
10     USA        B         QTR  2020-06-01     200
11     USA        B         MAT  2018-06-01    3000
12     USA        B         MAT  2019-06-01    3000
13     USA        B         MAT  2020-06-01    5000
14      UK        C         QTR  2018-06-01     500
15      UK        C         QTR  2019-06-01     500
16      UK        C         QTR  2020-03-01     200
17      UK        C         QTR  2020-06-01     200
18      UK        C         MAT  2018-06-01     300
19      UK        C         MAT  2019-06-01     300
20      UK        C         MAT  2020-06-01    5000
21      UK        D         QTR  2018-06-01      50
22      UK        D         QTR  2019-06-01      50
23      UK        D         QTR  2020-03-01     200
24      UK        D         QTR  2020-06-01     200
25      UK        D         MAT  2018-06-01     500
26      UK        D         MAT  2019-06-01     500
27      UK        D         MAT  2020-06-01    5000
My code is below:
import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta  # needed for the month offsets below

def get_annual_growth(grp):
    # All pairs of dates in the group, each pair sorted as (earlier, later)
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    # Keep pairs exactly 12 months apart (annual) and exactly 3 months apart (quarterly)
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists
                           if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists
                          if comb_dates[0] == comb_dates[1] - relativedelta(months=3)]
    # Labels: the later date of each pair
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]
    # Sub-dataframe that pandas `groupby` concatenates afterwards
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth',
                                  'Quarterly_Reference', 'Quarterly_Growth'])
    for i, years in enumerate(new_year_comb_lists):  # for each annual pair ...
        actual_value = grp[grp['Date'] == years[1]].Values.mean()
        last_value = grp[grp['Date'] == years[0]].Values.mean()
        growth = (actual_value - last_value) / last_value  # annual growth
        subdf.loc[i, ['Annual_Reference', 'Annual_Growth']] = [year_comb_strings[i], growth]
    # Row labels restart at 0 here, so the quarterly reference written to a row
    # need not match the annual reference already stored on that row
    for i, quarters in enumerate(quarter_comb_lists):
        actual_value = grp[grp['Date'] == quarters[1]].Values.mean()
        last_value = grp[grp['Date'] == quarters[0]].Values.mean()
        growth = (actual_value - last_value) / last_value  # quarterly growth
        subdf.loc[i, ['Quarterly_Reference', 'Quarterly_Growth']] = [quarter_comb_strings[i], growth]
    return subdf

# Assign the result; the groupby output is otherwise discarded
df_2 = df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)
df_2 = df_2.reset_index()
df_2['Annual_Reference'] = pd.to_datetime(df_2['Annual_Reference'])
df_2['Quarterly_Reference'] = pd.to_datetime(df_2['Quarterly_Reference'])
For anyone who wants to reproduce this, the dataframe can be built as follows:
df_list = [['USA', 'A', 'QTR', '2020-06-01', 100], ['USA', 'A', 'MAT', '2020-06-01', 1000],
['USA', 'B', 'QTR', '2020-06-01', 200], ['USA', 'B', 'MAT', '2020-06-01', 5000],
['USA', 'A', 'QTR', '2020-03-01', 500], ['USA', 'B', 'QTR', '2020-03-01', 300],
['USA', 'A', 'QTR', '2019-06-01', 300], ['USA', 'A', 'MAT', '2019-06-01', 2000],
['USA', 'B', 'QTR', '2019-06-01', 100], ['USA', 'B', 'MAT', '2019-06-01', 3000],
['USA', 'A', 'QTR', '2018-06-01', 300], ['USA', 'A', 'MAT', '2018-06-01', 2000],
['USA', 'B', 'QTR', '2018-06-01', 100], ['USA', 'B', 'MAT', '2018-06-01', 3000],
['UK', 'C', 'QTR', '2020-06-01', 200], ['UK', 'C', 'MAT', '2020-06-01', 5000],
['UK', 'C', 'QTR', '2020-03-01', 100], ['UK', 'D', 'QTR', '2020-03-01', 50],
['UK', 'D', 'QTR', '2020-06-01', 200], ['UK', 'D', 'MAT', '2020-06-01', 5000],
['UK', 'C', 'QTR', '2019-06-01', 500], ['UK', 'C', 'MAT', '2019-06-01', 300],
['UK', 'D', 'QTR', '2019-06-01', 50], ['UK', 'D', 'MAT', '2019-06-01', 500],
['UK', 'C', 'QTR', '2018-06-01', 500], ['UK', 'C', 'MAT', '2018-06-01', 300],
['UK', 'D', 'QTR', '2018-06-01', 50], ['UK', 'D', 'MAT', '2018-06-01', 500]]
column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'Date', 'Values']
df_2 = pd.DataFrame(df_list, columns = column_names)
df_2['Date']= pd.to_datetime(df_2['Date'])
df_2 = df_2.sort_values(by=['PRODUCT', 'TIMEPERIOD', 'Date']).reset_index(drop=True)
You can try this:
df_2['month'] = df_2['Date'].dt.month
df_2['change'] = df_2.groupby(['MARKET','PRODUCT','TIMEPERIOD','month']).Values.pct_change()
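It should work as long as no year is missing from your data.
Edit: the above is year over year. For quarter over quarter, do not group by month. Again, it should work as long as no quarter is missing from the data.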
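Since the caveat about missing periods matters for the sample data (the QTR rows skip several quarters), here is a minimal sketch of one way to guard `pct_change` against gaps. It keeps a change only when the previous row in the group is exactly the expected number of months earlier. The helper `guarded_pct_change`, the `sample` frame, and the `yoy_change`/`qoq_change` column names are assumptions made for illustration, not part of the answer above; the values for USA/A come from the reproducible snippet in the question.

```python
import pandas as pd

# Small sample in the same shape as df_2 (one market/product, QTR rows only)
sample = pd.DataFrame({
    "MARKET": ["USA"] * 4,
    "PRODUCT": ["A"] * 4,
    "TIMEPERIOD": ["QTR"] * 4,
    "Date": pd.to_datetime(["2018-06-01", "2019-06-01", "2020-03-01", "2020-06-01"]),
    "Values": [300, 300, 500, 100],
})
keys = ["MARKET", "PRODUCT", "TIMEPERIOD"]
# pct_change compares consecutive rows, so each group must be in date order
sample = sample.sort_values(keys + ["Date"]).reset_index(drop=True)


def guarded_pct_change(df, group_cols, months):
    """pct_change per group, kept only where the previous row is exactly `months` earlier."""
    grouped = df.groupby(group_cols)
    change = grouped["Values"].pct_change()
    prev_date = grouped["Date"].shift(1)
    expected_prev = df["Date"] - pd.DateOffset(months=months)
    # Rows whose predecessor is missing or at the wrong distance become NaN
    return change.where(prev_date == expected_prev)


sample["month"] = sample["Date"].dt.month
sample["yoy_change"] = guarded_pct_change(sample, keys + ["month"], months=12)
sample["qoq_change"] = guarded_pct_change(sample, keys, months=3)
print(sample[["Date", "Values", "yoy_change", "qoq_change"]])
```

With this sample, only 2020-06-01 has a row exactly three months earlier, so it is the only row that gets a quarterly change; the other rows are left as NaN rather than being compared with whatever row happens to precede them.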
I found a way myself to get the output aligned on the same Date level, shown below. It's not the most elegant, but it works for now.
from dateutil.relativedelta import relativedelta  # needed for the month offsets below

def get_annual_growth(grp):
    # All pairs of dates in the group, each pair sorted as (earlier, later)
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    # Keep only pairs exactly 12 months apart (drops, for example, 2018 vs 2020)
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists
                           if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]
    new_year_comb_lists = sorted(new_year_comb_lists, key=lambda x: x[1])
    # Keep only pairs exactly 3 months apart, skipping pairs that end in 2018
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists
                          if comb_dates[0] == comb_dates[1] - relativedelta(months=3)
                          and comb_dates[1].year != 2018]
    quarter_comb_lists = sorted(quarter_comb_lists, key=lambda x: x[1])
    # Combination labels: the later date of each pair
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]
    # Sub-dataframe that pandas `groupby` concatenates afterwards
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth',
                                  'Quarterly_Reference', 'Quarterly_Growth'])
    # Note: zip stops at the shorter list, so a group without quarterly pairs yields no rows
    for i, (years, quarters) in enumerate(zip(new_year_comb_lists, quarter_comb_lists)):
        try:
            curr_year_val = grp[grp['Date'] == years[1]].Values.mean()
            prev_year_val = grp[grp['Date'] == years[0]].Values.mean()
            curr_qtr_val = grp[grp['Date'] == quarters[1]].Values.mean()
            prev_qtr_val = grp[grp['Date'] == quarters[0]].Values.mean()
            year_gr = (curr_year_val - prev_year_val) / prev_year_val  # annual growth
            qtr_gr = (curr_qtr_val - prev_qtr_val) / prev_qtr_val  # quarterly growth
            subdf.loc[i, ['Annual_Reference', 'Annual_Growth',
                          'Quarterly_Reference', 'Quarterly_Growth']] = [
                year_comb_strings[i], year_gr, quarter_comb_strings[i], qtr_gr]
        except ZeroDivisionError:
            # Note: NumPy floats give inf/nan on division by zero rather than raising,
            # so this branch is not reached for values returned by .mean()
            year_gr = 0
            qtr_gr = 0
    return subdf
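As an alternative to pairing the two lists by position, here is a minimal sketch that keeps one row per original Date and attaches both growth figures to it, so the annual and quarterly references always share the same date. It swaps in a different technique (a self-merge on shifted dates) and is not the approach used above. The helper `growth_vs_offset`, the `out` frame, and the trimmed sample are assumptions for illustration; it also assumes each (MARKET, PRODUCT, TIMEPERIOD, Date) combination appears only once.

```python
import pandas as pd

# Trimmed sample (USA / product A only), values from the reproducible snippet in the question
rows = [
    ["USA", "A", "QTR", "2018-06-01", 300], ["USA", "A", "QTR", "2019-06-01", 300],
    ["USA", "A", "QTR", "2020-03-01", 500], ["USA", "A", "QTR", "2020-06-01", 100],
    ["USA", "A", "MAT", "2018-06-01", 2000], ["USA", "A", "MAT", "2019-06-01", 2000],
    ["USA", "A", "MAT", "2020-06-01", 1000],
]
df = pd.DataFrame(rows, columns=["MARKET", "PRODUCT", "TIMEPERIOD", "Date", "Values"])
df["Date"] = pd.to_datetime(df["Date"])
keys = ["MARKET", "PRODUCT", "TIMEPERIOD"]


def growth_vs_offset(frame, months):
    """Growth of each row against the row `months` earlier in the same group (NaN if absent)."""
    prev = frame[keys + ["Date", "Values"]].rename(columns={"Values": "prev_value"})
    # Shift each row forward to the date for which it serves as the base period
    prev["Date"] = prev["Date"] + pd.DateOffset(months=months)
    # A left merge keeps every original row, in its original order
    merged = frame.merge(prev, on=keys + ["Date"], how="left", validate="one_to_one")
    return ((merged["Values"] - merged["prev_value"]) / merged["prev_value"]).to_numpy()


out = df.copy()
out["Annual_Growth"] = growth_vs_offset(df, 12)
out["Quarterly_Growth"] = growth_vs_offset(df, 3)
# Quarter-over-quarter growth is not meaningful for MAT, so blank it explicitly
out.loc[out["TIMEPERIOD"] == "MAT", "Quarterly_Growth"] = float("nan")
print(out)
```

Rows with no matching base period simply carry NaN, which keeps MAT groups and incomplete quarters in the output instead of dropping them.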