Calculating annual and quarterly growth in one function

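This is a follow-up to a question I posted here yesterday. The answer worked like magic, but I quickly ran into a problem, because I also want to apply the same logic to quarter-over-quarter calculations.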

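The dataframe is below. Never mind the length; I really don't know how to collapse it (if anyone can tell me how to hide a long df, I'd be grateful).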

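Since a few things have changed, let me explain the problem again. I want to find the year-over-year difference for each product in each market, for each time period (quite a mouthful!). For example, product A in the USA market with time period QTR and date 2020-06-01 has a value of 100. This simply means that in Q2 2020 we made $100 in revenue. I want the growth rate of Q2 2020 compared with Q2 2019, which is simply (100 - 300) / 300 = -66.7%. The same applies to the MAT (moving annual total) time period when calculating year-over-year growth.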
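To make the year-over-year definition concrete, here is a minimal sketch (not the code used in the question) that pairs each row with the row exactly 12 months earlier in the same MARKET/PRODUCT/TIMEPERIOD group by merging the frame with a copy of itself whose dates are shifted forward one year. The sample rows are the USA/A rows from the table below; the function name `yoy_growth` and the column `Prev_Year_Value` are illustrative assumptions.

```python
import pandas as pd

# Rows for USA / product A, copied from the table in the question.
df = pd.DataFrame({
    "MARKET": ["USA"] * 7,
    "PRODUCT": ["A"] * 7,
    "TIMEPERIOD": ["QTR"] * 4 + ["MAT"] * 3,
    "Date": pd.to_datetime([
        "2018-06-01", "2019-06-01", "2020-03-01", "2020-06-01",
        "2018-06-01", "2019-06-01", "2020-06-01",
    ]),
    "Values": [300, 300, 100, 100, 2000, 2000, 1000],
})

KEYS = ["MARKET", "PRODUCT", "TIMEPERIOD"]


def yoy_growth(frame: pd.DataFrame) -> pd.DataFrame:
    """Attach the value from exactly 12 months earlier and the resulting growth rate."""
    prev = frame[KEYS + ["Date", "Values"]].copy()
    # Move every row forward one year so it lines up with the row it is compared against.
    prev["Date"] = prev["Date"] + pd.DateOffset(years=1)
    prev = prev.rename(columns={"Values": "Prev_Year_Value"})
    out = frame.merge(prev, on=KEYS + ["Date"], how="left")
    # Rows with no match 12 months earlier get NaN growth.
    out["Annual_Growth"] = (out["Values"] - out["Prev_Year_Value"]) / out["Prev_Year_Value"]
    return out


result = yoy_growth(df)
print(result[["TIMEPERIOD", "Date", "Values", "Prev_Year_Value", "Annual_Growth"]])
```

For the QTR row dated 2020-06-01 this gives (100 - 300) / 300 ≈ -0.667, and the same code covers MAT rows without any special casing.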

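Now I also want to find quarter-over-quarter growth. Note that this calculation does not apply to the MAT time period, so my code below handles that (not sure whether correctly). The output of my modified function is valid, but at the cost of readability: the "annual reference" date in each row no longer matches the "quarterly reference" date. Since I ultimately need to use this output for further analysis, is there anything I can improve?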
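One way to keep both figures on the same row, sketched here as an assumption rather than as the approach used anywhere in this thread, is to compute both growth rates against each row's own `Date`: look up the value 12 months earlier and the value 3 months earlier, then blank out the quarterly figure for MAT rows. The helper names `growth_vs` and `add_growth_columns` are hypothetical, and the sample rows are again the USA/A rows from the table below.

```python
import pandas as pd

# Rows for USA / product A, copied from the table in the question.
df = pd.DataFrame({
    "MARKET": ["USA"] * 7,
    "PRODUCT": ["A"] * 7,
    "TIMEPERIOD": ["QTR"] * 4 + ["MAT"] * 3,
    "Date": pd.to_datetime([
        "2018-06-01", "2019-06-01", "2020-03-01", "2020-06-01",
        "2018-06-01", "2019-06-01", "2020-06-01",
    ]),
    "Values": [300, 300, 100, 100, 2000, 2000, 1000],
})

KEYS = ["MARKET", "PRODUCT", "TIMEPERIOD"]


def growth_vs(frame: pd.DataFrame, offset: pd.DateOffset) -> pd.Series:
    """Growth of each row relative to the same group's row `offset` earlier (NaN if absent)."""
    prev = frame[KEYS + ["Date", "Values"]].copy()
    prev["Date"] = prev["Date"] + offset
    prev = prev.rename(columns={"Values": "_prev"})
    # A left merge keeps the row order of `frame`, so positions line up with `frame`.
    merged = frame.merge(prev, on=KEYS + ["Date"], how="left")
    growth = (merged["Values"] - merged["_prev"]) / merged["_prev"]
    return growth.set_axis(frame.index)


def add_growth_columns(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.sort_values(KEYS + ["Date"]).reset_index(drop=True)
    out["Annual_Growth"] = growth_vs(out, pd.DateOffset(years=1))
    # Quarter-over-quarter only makes sense for QTR rows; MAT rows get NaN.
    qoq = growth_vs(out, pd.DateOffset(months=3))
    out["Quarterly_Growth"] = qoq.where(out["TIMEPERIOD"] == "QTR")
    return out


result = add_growth_columns(df)
print(result)
```

Because every output row keeps its own `Date`, the annual and quarterly comparisons always refer to the same period; a missing comparison period shows up as NaN instead of shifting other rows out of alignment.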

```
    MARKET  PRODUCT TIMEPERIOD  DATE        VALUES
0   USA     A       QTR         2018-06-01  300
1   USA     A       QTR         2019-06-01  300
2   USA     A       QTR         2020-03-01  100
3   USA     A       QTR         2020-06-01  100
4   USA     A       MAT         2018-06-01  2000
5   USA     A       MAT         2019-06-01  2000
6   USA     A       MAT         2020-06-01  1000
7   USA     B       QTR         2018-06-01  100
8   USA     B       QTR         2019-06-01  100
9   USA     B       QTR         2020-03-01  300
10  USA     B       QTR         2020-06-01  200
11  USA     B       MAT         2018-06-01  3000
12  USA     B       MAT         2019-06-01  3000
13  USA     B       MAT         2020-06-01  5000
14  UK      C       QTR         2018-06-01  500
15  UK      C       QTR         2019-06-01  500
16  UK      C       QTR         2020-03-01  200
17  UK      C       QTR         2020-06-01  200
18  UK      C       MAT         2018-06-01  300
19  UK      C       MAT         2019-06-01  300
20  UK      C       MAT         2020-06-01  5000
21  UK      D       QTR         2018-06-01  50
22  UK      D       QTR         2019-06-01  50
23  UK      D       QTR         2020-03-01  200
24  UK      D       QTR         2020-06-01  200
25  UK      D       MAT         2018-06-01  500
26  UK      D       MAT         2019-06-01  500
27  UK      D       MAT         2020-06-01  5000
```

Here is my code:

```python
import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta

def get_annual_growth(grp):
    # Get all possible pairs of dates in the group
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    # Keep pairs exactly 12 months apart (annual) and exactly 3 months apart (quarterly)
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists
                           if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists
                          if comb_dates[0] == comb_dates[1] - relativedelta(months=3)]
    # The later date of each pair is used as its reference label
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]
    # Create a sub-dataframe to be concatenated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth',
                                  'Quarterly_Reference', 'Quarterly_Growth'])
    for i, years in enumerate(new_year_comb_lists):  # for each annual pair ...
        actual_value = grp[grp['Date'] == years[1]].Values.mean()
        last_value = grp[grp['Date'] == years[0]].Values.mean()
        growth = (actual_value - last_value) / last_value  # annual growth
        subdf.loc[i, ['Annual_Reference', 'Annual_Growth']] = [year_comb_strings[i], growth]
    for i, quarters in enumerate(quarter_comb_lists):  # for each quarterly pair ...
        actual_value = grp[grp['Date'] == quarters[1]].Values.mean()
        last_value = grp[grp['Date'] == quarters[0]].Values.mean()
        growth = (actual_value - last_value) / last_value  # quarter-over-quarter growth
        subdf.loc[i, ['Quarterly_Reference', 'Quarterly_Growth']] = [quarter_comb_strings[i], growth]
    return subdf

# Assign the groupby result (it was previously discarded) and flatten the index
gr_products = df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)
gr_products = gr_products.reset_index()
gr_products['Annual_Reference'] = pd.to_datetime(gr_products['Annual_Reference'])
gr_products['Quarterly_Reference'] = pd.to_datetime(gr_products['Quarterly_Reference'])
```

For anyone who wants to reproduce this, the data can be built as follows:

```python
import pandas as pd

df_list = [['USA', 'A', 'QTR', '2020-06-01', 100], ['USA', 'A', 'MAT', '2020-06-01', 1000],
           ['USA', 'B', 'QTR', '2020-06-01', 200], ['USA', 'B', 'MAT', '2020-06-01', 5000],
           ['USA', 'A', 'QTR', '2020-03-01', 500], ['USA', 'B', 'QTR', '2020-03-01', 300],
           ['USA', 'A', 'QTR', '2019-06-01', 300], ['USA', 'A', 'MAT', '2019-06-01', 2000],
           ['USA', 'B', 'QTR', '2019-06-01', 100], ['USA', 'B', 'MAT', '2019-06-01', 3000],
           ['USA', 'A', 'QTR', '2018-06-01', 300], ['USA', 'A', 'MAT', '2018-06-01', 2000],
           ['USA', 'B', 'QTR', '2018-06-01', 100], ['USA', 'B', 'MAT', '2018-06-01', 3000],
           ['UK', 'C', 'QTR', '2020-06-01', 200], ['UK', 'C', 'MAT', '2020-06-01', 5000],
           ['UK', 'C', 'QTR', '2020-03-01', 100], ['UK', 'D', 'QTR', '2020-03-01', 50],
           ['UK', 'D', 'QTR', '2020-06-01', 200], ['UK', 'D', 'MAT', '2020-06-01', 5000],
           ['UK', 'C', 'QTR', '2019-06-01', 500], ['UK', 'C', 'MAT', '2019-06-01', 300],
           ['UK', 'D', 'QTR', '2019-06-01', 50], ['UK', 'D', 'MAT', '2019-06-01', 500],
           ['UK', 'C', 'QTR', '2018-06-01', 500], ['UK', 'C', 'MAT', '2018-06-01', 300],
           ['UK', 'D', 'QTR', '2018-06-01', 50], ['UK', 'D', 'MAT', '2018-06-01', 500]]
column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'Date', 'Values']
df_2 = pd.DataFrame(df_list, columns=column_names)
df_2['Date'] = pd.to_datetime(df_2['Date'])
df_2 = df_2.sort_values(by=['PRODUCT', 'TIMEPERIOD', 'Date']).reset_index(drop=True)
```

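You can try this:

```python
# Year-over-year: compare each row with the previous row of the same calendar month
df_2['month'] = df_2['Date'].dt.month
df_2['change'] = df_2.groupby(['MARKET', 'PRODUCT', 'TIMEPERIOD', 'month']).Values.pct_change()
```

It should work as long as no year's data is missing.

Edit: the above computes year-over-year change. For quarter-over-quarter change, don't group by month. Again, it should work as long as no quarter's data is missing.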
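The "no missing data" caveat matters because `pct_change` compares each row with the previous row in its group, whatever that row's date is. Below is a minimal sketch of the suggested approach on the USA/A rows from the question's table, with one addition that is not part of the suggestion: a guard that keeps a quarter-over-quarter value only when the previous QTR row is exactly three months earlier, so a gap yields NaN instead of a misleading number.

```python
import pandas as pd

# Rows for USA / product A, copied from the table in the question.
df = pd.DataFrame({
    "MARKET": ["USA"] * 7,
    "PRODUCT": ["A"] * 7,
    "TIMEPERIOD": ["QTR"] * 4 + ["MAT"] * 3,
    "Date": pd.to_datetime([
        "2018-06-01", "2019-06-01", "2020-03-01", "2020-06-01",
        "2018-06-01", "2019-06-01", "2020-06-01",
    ]),
    "Values": [300, 300, 100, 100, 2000, 2000, 1000],
})

keys = ["MARKET", "PRODUCT", "TIMEPERIOD"]
df = df.sort_values(keys + ["Date"]).reset_index(drop=True)

# Year-over-year, as suggested: previous row with the same calendar month.
df["month"] = df["Date"].dt.month
df["yoy"] = df.groupby(keys + ["month"])["Values"].pct_change()

# Quarter-over-quarter: previous QTR row in the group (no month grouping).
is_qtr = df["TIMEPERIOD"] == "QTR"
qtr_groups = df[is_qtr].groupby(keys)
qoq = qtr_groups["Values"].pct_change()
prev_date = qtr_groups["Date"].shift()
# Guard: keep the value only if the previous row is exactly one quarter earlier.
qoq = qoq.where(prev_date + pd.DateOffset(months=3) == df.loc[is_qtr, "Date"])
df["qoq"] = qoq  # aligns on the index; MAT rows stay NaN

print(df)
```

Without the guard, the 2020-03-01 row would be compared with 2019-06-01 and reported as a -66.7% "quarterly" change; with it, that row is NaN and only 2020-06-01 vs 2020-03-01 (0%) is kept.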

I found a way myself to produce the output at the same Date level, shown below. It's not the most elegant, but it works for now.

```python
import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta

def get_annual_growth(grp):
    # All possible pairs of dates in the group (was `subset_group`, which is undefined here)
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    # Remove pairs more than one year apart (for example, 2018-2020)
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists
                           if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]
    new_year_comb_lists = sorted(new_year_comb_lists, key=lambda x: x[1])
    # Keep pairs exactly one quarter apart (the 2018 exclusion is specific to this dataset)
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists
                          if comb_dates[0] == comb_dates[1] - relativedelta(months=3)
                          and comb_dates[1].year != 2018]
    quarter_comb_lists = sorted(quarter_comb_lists, key=lambda x: x[1])
    # The later date of each pair is used as its reference label
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]

    # Create a sub-dataframe to be concatenated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth',
                                  'Quarterly_Reference', 'Quarterly_Growth'])
    # zip() stops at the shorter list: annual pairs without a quarterly partner are dropped
    # (MAT groups, which have no quarterly pairs, produce no rows at all)
    for i, (years, quarters) in enumerate(zip(new_year_comb_lists, quarter_comb_lists)):
        curr_year_val = grp[grp['Date'] == years[1]].Values.mean()
        prev_year_val = grp[grp['Date'] == years[0]].Values.mean()
        curr_qtr_val = grp[grp['Date'] == quarters[1]].Values.mean()
        prev_qtr_val = grp[grp['Date'] == quarters[0]].Values.mean()
        # NumPy floats return inf rather than raising ZeroDivisionError, so guard explicitly
        year_gr = (curr_year_val - prev_year_val) / prev_year_val if prev_year_val != 0 else 0
        qtr_gr = (curr_qtr_val - prev_qtr_val) / prev_qtr_val if prev_qtr_val != 0 else 0
        subdf.loc[i, ['Annual_Reference', 'Annual_Growth',
                      'Quarterly_Reference', 'Quarterly_Growth']] = [
            year_comb_strings[i], year_gr, quarter_comb_strings[i], qtr_gr]
    return subdf
```
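When guarding against a zero previous value, note that `except ZeroDivisionError` does not catch this case: `.mean()` on a pandas Series returns a NumPy `float64`, and dividing NumPy floats by zero returns `inf` (emitting at most a `RuntimeWarning`) instead of raising. A small demonstration, with a hypothetical `safe_growth` helper that returns 0 for a zero previous value, as the handler above evidently intends:

```python
import numpy as np
import pandas as pd

prev = pd.Series([0, 0]).mean()  # NumPy float64 zero, like the means in the loop
curr = pd.Series([100]).mean()

# No exception is raised here: NumPy returns inf. np.errstate silences the
# RuntimeWarning it would otherwise emit.
with np.errstate(divide="ignore"):
    raw = (curr - prev) / prev


def safe_growth(curr_val, prev_val):
    """Hypothetical helper: growth rate, or 0 when the previous value is 0."""
    return (curr_val - prev_val) / prev_val if prev_val != 0 else 0.0


print(raw)                      # inf
print(safe_growth(curr, prev))  # 0.0
```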
