How to remove weekends and holidays from time series data



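Thanks for reading my question. I'm trying to remove weekends and holidays from foreign-exchange (FX) market time series data. I've tried pd.bdate_range, but I'm not sure how to apply it to PrimaryBook. If you need more information, please let me know.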

Thanks for the help

import pandas as pd
from datetime import datetime, timedelta

# Current time, truncated to the minute
today = datetime.now().replace(second=0, microsecond=0)
st = today - timedelta(days=14)
et = today
# Remove weekends
br = pd.bdate_range(st, et)
# Remove holidays
# Keep only times between 7am and 5pm
PrimaryBook = _get_tsdb_primary_prices("audusd", st, et).ffill()
PrimaryBook = PrimaryBook.dropna(axis=1, how='all')    # drop columns that are entirely empty
PrimaryBook = PrimaryBook.dropna()                     # drop rows that still have gaps
PrimaryBookB = PrimaryBook.filter(regex=r'(BID|BSIZ)')  # bid side of the book
PrimaryBookA = PrimaryBook.filter(regex=r'(ASK|ASIZ)')  # ask side of the book
PrimaryBookA

This is the result, but I want to remove weekends and holidays, and keep only the time window from 7am to 5pm:

                                     BEST_ASK1  BEST_ASIZ1  BEST_ASK2  BEST_ASIZ2  BEST_ASK3  BEST_ASIZ3  BEST_ASK4  BEST_ASIZ4  BEST_ASK5  BEST_ASIZ5
Time
2021-07-22 08:41:36.625573856+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.630647614+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.635475238+00:00    0.73725   1000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.640455282+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.660516225+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   5000000.0
...                                        ...         ...        ...         ...        ...         ...        ...         ...        ...         ...
2021-08-05 08:41:29.025629378+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   9000000.0
2021-08-05 08:41:29.450549198+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0
2021-08-05 08:41:30.346124376+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0
2021-08-05 08:41:31.586653810+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0
2021-08-05 08:41:31.840526198+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0

I'd really appreciate your help.
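The pd.bdate_range idea from the question can be applied to PrimaryBook by keeping only the rows whose calendar date falls inside the business-day range. A minimal sketch, using a small made-up frame in place of PrimaryBook (the real data comes from the asker's own _get_tsdb_primary_prices helper) and judging the weekend by each tick's UTC date. That is an assumption: an FX desk may define the weekend in New York or Sydney time instead.

```python
import pandas as pd

# Hypothetical stand-in for PrimaryBook: tz-aware (UTC) tick times, Friday through Monday.
idx = pd.to_datetime(
    ["2021-07-23 09:00", "2021-07-24 09:00", "2021-07-25 09:00", "2021-07-26 09:00"],
    utc=True,
)
book = pd.DataFrame({"BEST_ASK1": [0.73725, 0.73730, 0.73735, 0.73740]}, index=idx)

# Business days (Mon-Fri) spanning the data. bdate_range normalizes the bounds to
# midnight and takes its UTC time zone from the tz-aware start/end timestamps.
bdays = pd.bdate_range(book.index.min(), book.index.max())

# Keep a row only if its date (time of day stripped by normalize) is a business day.
weekday_book = book[book.index.normalize().isin(bdays)]
print(weekday_book)
```

Because PrimaryBook's index is also a UTC DatetimeIndex (note the +00:00 suffix), the same mask should work on it unchanged.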

Since holidays depend on the country and the year, you'll need a package for them.

I'd suggest the holidays package:

import holidays

for day in holidays.UnitedStates(years=2021).items():
    print(day)

This prints every holiday in the given year as a (datetime.date, name) tuple:

(datetime.date(2021, 1, 1), "New Year's Day")
(datetime.date(2021, 12, 31), "New Year's Day (Observed)")
(datetime.date(2021, 1, 18), 'Martin Luther King Jr. Day')
(datetime.date(2021, 2, 15), "Washington's Birthday")
...
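If you would rather not add a dependency, pandas ships its own rule-based US federal holiday calendar, and pd.bdate_range accepts a holidays list when freq="C" (custom business day). A sketch, assuming US federal holidays are the calendar you want; for an AUD/USD book the relevant market closures may well differ, so treat the calendar choice as an assumption.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Rule-based US federal holidays in the window, returned as a DatetimeIndex.
hol = USFederalHolidayCalendar().holidays(start="2021-07-01", end="2021-07-09")

# freq="C" means custom business day; bdate_range only accepts holidays with it.
days = pd.bdate_range("2021-07-01", "2021-07-09", freq="C", holidays=hol)
print(days)
```

July 4, 2021 fell on a Sunday, so the calendar observes it on Monday July 5, and that day drops out of the business-day range along with the weekend.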

The next step is to convert your dates to the same format:

import pandas as pd
df = pd.DataFrame([{"id":1, "day":"2021-07-22 08:41:36.625573856+00:00"}, {"id":1, "day":"2021-12-31 08:41:36.625573856+00:00"}])
df.day = pd.to_datetime(df.day)

After that, it's easy to check whether each day is in the holiday list:

us_holidays = holidays.UnitedStates(years=2021)
# Build the holiday list once and test all rows in one vectorized call
df["isholiday"] = df["day"].dt.date.isin(list(us_holidays.keys()))
df
   id                                  day  isholiday
0   1  2021-07-22 08:41:36.625573856+00:00      False
1   1  2021-12-31 08:41:36.625573856+00:00       True
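The isholiday column only flags rows; to remove them, filter on the inverted mask. To stay runnable without the third-party holidays package, this sketch takes its dates from pandas' USFederalHolidayCalendar instead (our substitution); dates from holidays.UnitedStates(years=2021) could be used the same way after converting them with pd.to_datetime. The frame is made up for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Small made-up frame; 2021-07-05 is the observed US Independence Day.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "day": pd.to_datetime(
        ["2021-07-02 08:41:36", "2021-07-05 08:41:36", "2021-07-06 08:41:36"], utc=True
    ),
})

# Holiday dates for the year as a tz-naive DatetimeIndex at midnight.
hol = USFederalHolidayCalendar().holidays(start="2021-01-01", end="2021-12-31")

# Drop the time zone (keeping the UTC wall-clock time) and the time of day, then flag matches.
df["isholiday"] = df["day"].dt.tz_localize(None).dt.normalize().isin(hol)

# Keep only the rows that are not holidays.
no_holidays = df[~df["isholiday"]]
print(no_holidays)
```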

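Weekends work the same way: check whether the .dt.dayofweek attribute is in [5, 6] (days are zero-indexed from Monday, so Saturday and Sunday are 5 and 6).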
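A sketch of that weekend check on a made-up column of UTC timestamps (again judging the weekend by the UTC date, which is an assumption):

```python
import pandas as pd

# Made-up UTC timestamps, Friday 2021-07-23 through Monday 2021-07-26.
df = pd.DataFrame({"day": pd.to_datetime(
    ["2021-07-23 09:00", "2021-07-24 09:00", "2021-07-25 09:00", "2021-07-26 09:00"], utc=True
)})

# dayofweek is zero-indexed from Monday, so Saturday and Sunday are 5 and 6.
df["isweekend"] = df["day"].dt.dayofweek.isin([5, 6])
weekdays_only = df[~df["isweekend"]]  # same rows as df[df["day"].dt.dayofweek < 5]
print(weekdays_only)
```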

I reset the index and then used .dt.dayofweek < 5 to remove the weekends.
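Because PrimaryBook already has a DatetimeIndex, the reset_index step is not strictly needed: DatetimeIndex exposes dayofweek directly, and DataFrame.between_time handles the 7am to 5pm window from the question. A combined sketch under stated assumptions: the function name filter_trading_hours is hypothetical, times are matched in the index's own time zone (UTC here), and the US federal calendar is only a stand-in for whatever holiday list fits your market. If 7am to 5pm is meant in local market time, convert the index with tz_convert first.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def filter_trading_hours(book, start="07:00", end="17:00", holiday_dates=None):
    """Hypothetical helper: drop weekend and holiday rows, then keep start..end (inclusive)."""
    idx = book.index
    # Monday=0 ... Friday=4; Saturday and Sunday are 5 and 6.
    not_weekend = idx.dayofweek < 5
    # Compare dates only: drop the time zone (keeping wall-clock time), then the time of day.
    dates = idx.tz_localize(None) if idx.tz is not None else idx
    hol_idx = pd.DatetimeIndex([] if holiday_dates is None else holiday_dates)
    not_holiday = ~dates.normalize().isin(hol_idx)
    # between_time filters on time of day in the index's time zone; both ends inclusive.
    return book[not_weekend & not_holiday].between_time(start, end)


idx = pd.to_datetime(
    ["2021-07-02 08:00", "2021-07-02 18:00",  # Friday: inside / after the window
     "2021-07-03 09:00",                      # Saturday
     "2021-07-05 09:00",                      # Monday, observed Independence Day
     "2021-07-06 06:00", "2021-07-06 12:00"], # Tuesday: before / inside the window
    utc=True,
)
book = pd.DataFrame({"BEST_ASK1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
hol = USFederalHolidayCalendar().holidays(start="2021-07-01", end="2021-07-31")
result = filter_trading_hours(book, holiday_dates=hol)
print(result)
```

Applying the weekday and holiday masks before between_time keeps each step a plain boolean filter on the index, so the order of the three filters does not change the result.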
