我们需要返回连续n个工作日的日期列表(周五至周一为1个工作日(。其中值不改变。不要假定日期列包含每个日期。数据帧结构如下所示
date Value
2022-07-19 44.43000000
2022-07-20 44.43000000
2022-07-21 44.43000000
2022-07-22 44.43000000
2022-07-25 44.43000000
... ...
2022-09-02 86.40000000
2022-09-06 85.13000000
2022-09-07 86.86000000
2022-09-08 88.44000000
2022-09-09 89.44000000
如果我们假设n是5。我们需要返回5个连续日期的列表。对于上面的例子,答案是
[2022-07-22,2022-07-20,2022-07-21,2022-07-22,2022-07-25]
我尝试使用下面的代码来获取数据框中的连续日期,但无法获取连续的工作日。
for k, v in px_dirty.groupby((px_dirty['value'].shift() != px_dirty['value']).cumsum()):
if len(v) == 5:
print(f'[group {k}]')
print(v)
我不知道如何获得连续的工作日。
使用date offset
:
from pandas.tseries.offsets import BDay
df['date'] = pd.to_datetime(df['date'])
# identify breaks in successive values
m1 = df['Value'].ne(df['Value'].shift())
# identify breaks in business days
m2 = df['date'].ne(df['date'].shift().add(BDay()))
# group by either break
for k,g in df.groupby((m1|m2).cumsum()):
if len(g) == 5:
print(f'[group {k}]')
print(g)
输出:
[group 1]
date Value
0 2022-07-19 44.43
1 2022-07-20 44.43
2 2022-07-21 44.43
3 2022-07-22 44.43
4 2022-07-25 44.43
中间体:
date Value m1 m2 m1|m2 group len
0 2022-07-19 44.43 True True True 1 5
1 2022-07-20 44.43 False False False 1 5
2 2022-07-21 44.43 False False False 1 5
3 2022-07-22 44.43 False False False 1 5
4 2022-07-25 44.43 False False False 1 5
5 2022-09-02 86.40 True True True 2 1
6 2022-09-06 85.13 True True True 3 1
7 2022-09-07 86.86 True False True 4 1
8 2022-09-08 88.44 True False True 5 1
9 2022-09-09 89.44 True False True 6 1
首先,创建一个包含工作日的字段,该字段是(工作日<6(和(年的周数(的乘积。然后对每组的长度至少为5个唯一元素的列表进行分组和形成
df = pd.DataFrame(pd.date_range('2022-01-01', '2022-03-01'))
df['BD'] = df[0].dt.weekday.lt(5).astype(int) * df[0].dt.isocalendar().week
df1 = df.groupby('BD').apply(lambda x: x[0].to_list() if len(set(x[0])) > 4 else None).dropna()
print(df1)
打印:
BD
1 [2022-01-03 00:00:00, 2022-01-04 00:00:00, 202...
2 [2022-01-10 00:00:00, 2022-01-11 00:00:00, 202...
3 [2022-01-17 00:00:00, 2022-01-18 00:00:00, 202...
4 [2022-01-24 00:00:00, 2022-01-25 00:00:00, 202...
5 [2022-01-31 00:00:00, 2022-02-01 00:00:00, 202...
6 [2022-02-07 00:00:00, 2022-02-08 00:00:00, 202...
7 [2022-02-14 00:00:00, 2022-02-15 00:00:00, 202...
8 [2022-02-21 00:00:00, 2022-02-22 00:00:00, 202...
dtype: object