How to define a 5-4-4 week period column from year and week number in Pandas



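My company uses a 5-4-4 calendar for reporting. Each month (aka period) is 4 weeks long, except the first month of each quarter, which is 5 weeks long.

Pandas seems to have good support for custom calendar periods, and I also found this documentation.

I am using the following code to create the period column from the week and year: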

dff['Period'] = pd.to_datetime(dff["Week ID"].astype("str") + "-0", format="%Y%U-%w").dt.month
dff['Period'] ='Period ' + dff['Period'].astype(str).str.zfill(2)

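However, my company uses the 5-4-4 format for these periods, i.e.

  • The first 5 weeks [Wk01 to Wk05] should be treated as Period 1
  • The next 4 weeks [Wk06 to Wk09] should be treated as Period 2
  • The next 4 weeks [Wk10 to Wk13] should be treated as Period 3

And again:

  • The next 5 weeks [Wk14 to Wk18] should be treated as Period 4
  • The next 4 weeks [Wk19 to Wk22] should be treated as Period 5
  • The next 4 weeks [Wk23 to Wk26] should be treated as Period 6

And so on.

Current output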

Year    Period      Week    Week ID Site
2020    Period 01   Wk01    202001  ABC
2020    Period 01   Wk02    202002  ABC
2020    Period 01   Wk03    202003  ABC
2020    Period 01   Wk04    202004  ABC
2020    Period 02   Wk05    202005  ABC
2020    Period 02   Wk06    202006  ABC
2020    Period 02   Wk07    202007  ABC
2020    Period 02   Wk08    202008  ABC
2020    Period 03   Wk09    202009  ABC
2020    Period 03   Wk10    202010  ABC
2020    Period 03   Wk11    202011  ABC
2020    Period 03   Wk12    202012  ABC
2020    Period 03   Wk13    202013  ABC
2020    Period 04   Wk14    202014  ABC
2020    Period 04   Wk15    202015  ABC

Expected output

Year    Period      Week    Week ID Site
2020    Period 01   Wk01    202001  ABC
2020    Period 01   Wk02    202002  ABC
2020    Period 01   Wk03    202003  ABC
2020    Period 01   Wk04    202004  ABC
2020    Period 01   Wk05    202005  ABC
2020    Period 02   Wk06    202006  ABC
2020    Period 02   Wk07    202007  ABC
2020    Period 02   Wk08    202008  ABC
2020    Period 02   Wk09    202009  ABC
2020    Period 03   Wk10    202010  ABC
2020    Period 03   Wk11    202011  ABC
2020    Period 03   Wk12    202012  ABC
2020    Period 03   Wk13    202013  ABC
2020    Period 04   Wk14    202014  ABC
2020    Period 04   Wk15    202015  ABC

You can use some math to compute the mapper:

import numpy as np
import pandas as pd
# define week pattern (5-4-4, one quarter = 13 weeks)
pat = np.array([5,4,4])
# weeks
weeks = np.arange(52)+1
# get quotient/remainder relative to a full pattern cycle (=divmod 13 here)
d,r = np.divmod((weeks-1), pat.sum())
# determine periods: count the period boundaries passed so far, plus one per completed cycle
a = (r==pat.cumsum()[:,None]).cumsum(axis=1).sum(axis=0)+d+1
# define mapper
week_to_period = dict(enumerate(a, start=1))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 2, 10: 3, 11: 3,...}
# map week numbers to custom periods
# (.week is an attribute of the isocalendar() result, not a method)
s = (pd.to_datetime(dff["Week ID"].astype("str") + "-0", format="%Y%U-%w")
.dt.isocalendar().week
)
dff['Period'] = 'Period '+s.map(week_to_period).astype(str).str.zfill(2)

Output:

Year     Period  Week  Week ID Site
0   2020  Period 01  Wk01   202001  ABC
1   2020  Period 01  Wk02   202002  ABC
2   2020  Period 01  Wk03   202003  ABC
3   2020  Period 01  Wk04   202004  ABC
4   2020  Period 01  Wk05   202005  ABC
5   2020  Period 02  Wk06   202006  ABC
6   2020  Period 02  Wk07   202007  ABC
7   2020  Period 02  Wk08   202008  ABC
8   2020  Period 02  Wk09   202009  ABC
9   2020  Period 03  Wk10   202010  ABC
10  2020  Period 03  Wk11   202011  ABC
11  2020  Period 03  Wk12   202012  ABC
12  2020  Period 03  Wk13   202013  ABC
13  2020  Period 04  Wk14   202014  ABC
14  2020  Period 04  Wk15   202015  ABC
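
The 5-4-4 rule can also be written as plain integer arithmetic, without parsing dates. The following is only a minimal sketch under two assumptions: Week ID is always a YYYYWW integer (as in the sample data), and every year has exactly 52 weeks. The helper name period_544 and the sample frame are hypothetical, introduced purely for illustration.

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's "Week ID" column (YYYYWW).
dff = pd.DataFrame(
    {"Week ID": [202001, 202005, 202006, 202009, 202010,
                 202013, 202014, 202018, 202019, 202052]}
)

def period_544(week: int) -> int:
    """Map a 1-based week number to its 5-4-4 period (assumes weeks 1..52)."""
    # Each quarter spans 13 weeks; offset is the 0-based position inside it.
    quarter, offset = divmod(week - 1, 13)
    # offset 0-4 -> first period, 5-8 -> second, 9-12 -> third of the quarter
    within = 1 if offset < 5 else 2 if offset < 9 else 3
    return 3 * quarter + within

# The last two digits of Week ID are the week number.
week = dff["Week ID"].astype(int) % 100
dff["Period"] = "Period " + week.map(period_544).astype(str).str.zfill(2)
```

With this formula a week 53, if one ever appears, would land in a 13th period, so a year with 53 weeks needs an explicit decision about where the extra week belongs.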
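
Alternatively, build the mapping directly with np.repeat:

import numpy as np
import pandas as pd
# Period numbers 1..12
x = list(range(1, 13))
# Week pattern: 5-4-4, repeated for each of the 4 quarters
y = [5, 4, 4] * 4
# Repeat each period number by its length in weeks (52 values in total)
period1 = list(np.repeat(x, y))
# Zero-padded labels so the keys match the 'Week' column (Wk01, Wk02, ...)
wk = [f'Wk{i:02d}' for i in range(1, 53)]
di_wk = dict(zip(wk, period1))
# {'Wk01': 1, 'Wk02': 1, 'Wk03': 1, 'Wk04': 1, 'Wk05': 1, 'Wk06': 2, 'Wk07': 2, 'Wk08': 2,
#  'Wk09': 2, 'Wk10': 3, 'Wk11': 3, 'Wk12': 3, 'Wk13': 3, 'Wk14': 4, 'Wk15': 4, ...}
dff['Period'] = 'Period ' + dff['Week'].map(di_wk).astype(str).str.zfill(2)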
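
To check the np.repeat mapping against the expected output in the question, it can be run on a small frame. This is a sketch with hypothetical sample data (the first 15 weeks only), and it assumes the Week column uses zero-padded labels such as Wk01, as in the question's tables.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the first 15 week labels from the question.
dff = pd.DataFrame({"Week": [f"Wk{i:02d}" for i in range(1, 16)]})

# Repeat period numbers 1..12 according to the 5-4-4 pattern (52 weeks total).
periods = np.repeat(np.arange(1, 13), [5, 4, 4] * 4)
di_wk = {f"Wk{i:02d}": int(p) for i, p in enumerate(periods, start=1)}

dff["Period"] = "Period " + dff["Week"].map(di_wk).astype(str).str.zfill(2)

# Expected labels for weeks 1-15: five of Period 01, four each of 02 and 03,
# then the first two weeks of Period 04.
expected = (["Period 01"] * 5 + ["Period 02"] * 4
            + ["Period 03"] * 4 + ["Period 04"] * 2)
matches = dff["Period"].tolist() == expected
```

Keying the dictionary on the same zero-padded format as the data matters here: a label that is missing from the dictionary maps to NaN and would show up as "Period nan".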
