How to exclude certain hours with pd.date_range



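Hi, I have a question about using pd.date_range(). I'm building an ARIMA model, and at one step I need to forecast some prices. For example, at 2021-01-04 11:20 I want to generate the next 4 timestamps with freq = '5Min', so I wrote the following code: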

pd.date_range(start = '2021-01-04 11:20', periods = 5, freq = '5Min')

which gives

['2021-01-04 11:20', '2021-01-04 11:25', '2021-01-04 11:30', '2021-01-04 11:35', '2021-01-04 11:40']

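But the market has a midday break. After 11:30 it does not reopen until '2021-01-04 15:00', so the series should be:

['2021-01-04 11:20', '2021-01-04 11:25', '2021-01-04 15:00', '2021-01-04 15:05', '2021-01-04 15:10']

So how can I customize the frequency so that I can exclude certain "hour ranges" within a day?

Thanks, I really appreciate it!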

Use DatetimeIndex.indexer_between_time to get the positions of the timestamps you want to drop, then filter them out with np.isin in boolean indexing:

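import numpy as np
import pandas as pd

r = pd.date_range(start = '2021-01-04 00:00', periods = 100, freq = '30Min')
# positions of timestamps inside the excluded windows (both endpoints inclusive)
ind = (r.indexer_between_time('11:30','13:30').tolist() +
       r.indexer_between_time('15:00','21:00').tolist() +
       r.indexer_between_time('23:00','09:00').tolist())  # this window wraps past midnight
# keep every position that is NOT in ind
out = r[np.isin(np.arange(len(r)), ind, invert=True)]
print (out)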
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
'2021-01-04 10:30:00', '2021-01-04 11:00:00',
'2021-01-04 14:00:00', '2021-01-04 14:30:00',
'2021-01-04 21:30:00', '2021-01-04 22:00:00',
'2021-01-04 22:30:00', '2021-01-05 09:30:00',
'2021-01-05 10:00:00', '2021-01-05 10:30:00',
'2021-01-05 11:00:00', '2021-01-05 14:00:00',
'2021-01-05 14:30:00', '2021-01-05 21:30:00',
'2021-01-05 22:00:00', '2021-01-05 22:30:00'],
dtype='datetime64[ns]', freq=None)

Another idea is to use a boolean mask:

from datetime import time
r = pd.date_range(start = '2021-01-04 00:00', periods = 100, freq = '30Min')
# True where the time of day falls strictly inside one of the allowed windows
m = ((r.time > time(hour=9, minute=0)) & (r.time < time(hour=11, minute=30)) |
     (r.time > time(hour=13, minute=30)) & (r.time < time(hour=15, minute=0)) |
     (r.time > time(hour=21, minute=0)) & (r.time < time(hour=23, minute=0)))

print (m)
# boolean indexing keeps only the allowed timestamps
out = r[m]
print (out)
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
'2021-01-04 10:30:00', '2021-01-04 11:00:00',
'2021-01-04 14:00:00', '2021-01-04 14:30:00',
'2021-01-04 21:30:00', '2021-01-04 22:00:00',
'2021-01-04 22:30:00', '2021-01-05 09:30:00',
'2021-01-05 10:00:00', '2021-01-05 10:30:00',
'2021-01-05 11:00:00', '2021-01-05 14:00:00',
'2021-01-05 14:30:00', '2021-01-05 21:30:00',
'2021-01-05 22:00:00', '2021-01-05 22:30:00'],
dtype='datetime64[ns]', freq=None)

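The next option uses numpy.r_ to concatenate the position arrays for the allowed windows and selects by them. Note that the result is grouped by window rather than sorted chronologically, so call out.sort_values() if order matters: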

# positions inside each allowed window, with both endpoints excluded
ind1 = np.r_[r.indexer_between_time('9:00','11:30', include_start=False, include_end=False),
             r.indexer_between_time('13:30','15:00', include_start=False, include_end=False),
             r.indexer_between_time('21:00','23:00', include_start=False, include_end=False)]
out = r[ind1]
print (out)
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
'2021-01-04 10:30:00', '2021-01-04 11:00:00',
'2021-01-05 09:30:00', '2021-01-05 10:00:00',
'2021-01-05 10:30:00', '2021-01-05 11:00:00',
'2021-01-04 14:00:00', '2021-01-04 14:30:00',
'2021-01-05 14:00:00', '2021-01-05 14:30:00',
'2021-01-04 21:30:00', '2021-01-04 22:00:00',
'2021-01-04 22:30:00', '2021-01-05 21:30:00',
'2021-01-05 22:00:00', '2021-01-05 22:30:00'],
dtype='datetime64[ns]', freq=None)
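
For the original question (the next few 5-minute timestamps across the midday break), the same masking idea can be wrapped in a small helper: over-generate candidates, keep those inside a trading session, and take the first n. This is only a sketch. The function name next_trading_stamps, the max_days parameter, and the session hours below are my own assumptions for illustration (the question only says the break runs from 11:30 to 15:00), and weekends and holidays are not handled.

```python
from datetime import time

import numpy as np
import pandas as pd


def next_trading_stamps(start, periods, freq, sessions, max_days=10):
    """Hypothetical helper (not part of pandas).

    Returns the first `periods` timestamps at `freq`, starting from `start`,
    whose time of day falls inside one of `sessions`. Each session is an
    (open, close) pair of datetime.time; open is inclusive, close exclusive.
    """
    start = pd.Timestamp(start)
    # Over-generate candidates; max_days bounds how far ahead we look.
    candidates = pd.date_range(
        start=start, end=start + pd.Timedelta(days=max_days), freq=freq
    )
    times = candidates.time  # array of datetime.time objects
    mask = np.zeros(len(candidates), dtype=bool)
    for open_t, close_t in sessions:
        mask |= (times >= open_t) & (times < close_t)
    out = candidates[mask][:periods]
    if len(out) < periods:
        raise ValueError("not enough trading timestamps within max_days")
    return out


# Assumed session hours, for illustration only.
sessions = [(time(9, 30), time(11, 30)), (time(15, 0), time(17, 0))]
result = next_trading_stamps("2021-01-04 11:20", 5, "5min", sessions)
print(result)
```

With these assumed sessions, the call reproduces the series asked for in the question: 11:20, 11:25, 15:00, 15:05, 15:10.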

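I don't know what other time constraints you are facing, but you could adapt this with a condition and a list comprehension. I don't believe pd.date_range has any built-in parameter that does what you're asking.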

import pandas as pd

# setup
dt_range = pd.date_range(start = '2021-01-04 11:20', periods = 5, freq = '5Min')
# time at which the midday break starts
market_open = "11:30"
# list comprehension: shift every timestamp at or after the break by its length (3h30m)
dt_range = [ts + pd.DateOffset(hours=3, minutes=30) if ts.strftime('%H:%M') >= market_open else ts for ts in dt_range]
# convert back to a pandas DatetimeIndex
dt_range = pd.to_datetime(dt_range)
print(dt_range)

Output:

DatetimeIndex(['2021-01-04 11:20:00', '2021-01-04 11:25:00',
'2021-01-04 15:00:00', '2021-01-04 15:05:00',
'2021-01-04 15:10:00'],
dtype='datetime64[ns]', freq=None)
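
The same shift can also be done without a Python-level loop. The following is a rough sketch of a vectorized variant using np.where. The names break_start, break_length, and shifted are mine, and like the list comprehension it assumes a single break and a range that stays within one trading day.

```python
from datetime import time

import numpy as np
import pandas as pd

dt_range = pd.date_range(start="2021-01-04 11:20", periods=5, freq="5min")

# Assumed break: starts at 11:30 and lasts 3h30m (so trading resumes at 15:00).
break_start = time(11, 30)
break_length = pd.Timedelta(hours=3, minutes=30)

# True for timestamps that fall at or after the start of the break.
after_break = dt_range.time >= break_start

# Pick the shifted value where after_break is True, the original otherwise.
shifted = pd.DatetimeIndex(
    np.where(after_break, (dt_range + break_length).values, dt_range.values)
)
print(shifted)
```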
