How to combine a time range with boolean indexing

I have a DataFrame with a datetime index:

tbl.iloc[:,:2].head(5)
date_time               var1    var2    
2011-01-01 00:05:00     97.97   1009.28
2011-01-01 00:10:00     97.53   1009.53
2011-01-01 00:15:00     97.38   1009.15
2011-01-01 00:20:00     97.23   1009.03
2011-01-01 00:25:00     97.01   1009.03

Now I want to select Monday to Friday from 6 am to 7 pm, Saturday from 6 am to 5 pm, and Sunday from 8 am to 5 pm.

For a single time range I can do this with

import datetime
# .ix was removed in pandas 1.0; .loc accepts the same time-of-day slice
selection = tbl.loc[datetime.time(6):datetime.time(19)]
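For a single window, pandas also has a named method, `DataFrame.between_time`, that performs the same time-of-day selection. The sketch below uses a hypothetical one-week `tbl` with 5-minute rows (not the asker's data) and checks that the `datetime.time` slice and `between_time` agree when both ends are inclusive.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in for `tbl`: 5-minute readings over one week
idx = pd.date_range("2011-01-01 00:05", periods=7 * 288, freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(idx), dtype=float)}, index=idx)

# Slicing rows with two datetime.time objects selects by time of day (ends inclusive)
by_slice = tbl.loc[datetime.time(6):datetime.time(19)]

# between_time is the named method for the same selection (ends inclusive by default)
by_method = tbl.between_time("06:00", "19:00")

print(by_slice.equals(by_method))  # True
```

Neither form yields a boolean mask, which is why it cannot be combined directly with `&`.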

Adding a weekday condition, i.e. combining the time range with boolean indexing, apparently doesn't work the way I tried:

tbl['weekday'] = tbl.index.weekday
test = tbl[(tbl.ix[datetime.time(6):datetime.time(19)]) & (tbl['weekday'] == 4)]

=> TypeError: Cannot compare type 'Timestamp' with type 'str'

test = tbl[(tbl.index > datetime.time(6)) & (tbl.index < datetime.time(19)) & (tbl['weekday'] == 4)]

=> TypeError: type object 08:00:00

tbl['date'] = tbl.index
test = tbl[(tbl['date']>datetime.time(8)) & (tbl['weekday'] == 4)]

=> ValueError: Could not construct Timestamp from argument

What is wrong with my code?

The first part filters the DataFrame, while the second part returns a boolean Series. Try:

in_hours = tbl.loc[datetime.time(6):datetime.time(19)]
test = in_hours[in_hours['weekday'] == 4]
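To see the difference between the two parts, the sketch below (hypothetical one-week data, not the asker's) checks that the time slice yields a DataFrame and that chaining the two filters gives the same rows as a single boolean AND. The single-mask version assumes the time range is rewritten as a mask by comparing `tbl.index.time` (an array of `datetime.time` objects) rather than the timestamps themselves, which is the comparison that raised the errors above.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in for `tbl`: 5-minute readings over one week
idx = pd.date_range("2011-01-01 00:05", periods=7 * 288, freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(idx), dtype=float)}, index=idx)
tbl["weekday"] = tbl.index.weekday

# Step 1: the time-of-day slice returns a smaller DataFrame, not a mask
in_hours = tbl.loc[datetime.time(6):datetime.time(19)]
# Step 2: filter that DataFrame with a mask built from its own column
chained = in_hours[in_hours["weekday"] == 4]

# Same result as one boolean AND, once the time range is also expressed as a mask
times = tbl.index.time  # numpy array of datetime.time objects
mask = (times >= datetime.time(6)) & (times <= datetime.time(19)) & (tbl["weekday"] == 4)
combined = tbl[mask]

print(type(in_hours).__name__)  # DataFrame
print(chained.equals(combined))  # True
```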

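Basically, apply the first filter, then apply the second filter to its result. This is equivalent to a boolean AND of the two conditions.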

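I suggest using a tool such as IPython or its notebook to inspect the intermediate results of your expressions, to make sure they are still what you expect. If you don't have much experience with pandas syntax yet, it is hard to write these expressions correctly in one go.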

I have now found a solution:

# .hour compares all timestamps at once instead of calling a lambda per row
criterion1 = tbl.index.hour >= 8
criterion2 = tbl.index.hour < 19
criterion3 = (tbl['weekday'] == 4)
tbl[criterion1 & criterion2 & criterion3]
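The hour-based criteria can be built either row by row with `index.map` or as whole arrays via the `DatetimeIndex.hour` and `DatetimeIndex.weekday` attributes; the latter also makes the helper `weekday` column unnecessary. A sketch on hypothetical one-week data, checking that both forms give the same mask for the window [08:00, 19:00) on Fridays:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `tbl`: 5-minute readings over one week
idx = pd.date_range("2011-01-01 00:05", periods=7 * 288, freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(idx), dtype=float)}, index=idx)

# Row-by-row version: one Python lambda call per timestamp
slow = (np.asarray(tbl.index.map(lambda i: i.hour >= 8), dtype=bool)
        & np.asarray(tbl.index.map(lambda i: i.hour < 19), dtype=bool)
        & np.asarray(tbl.index.map(lambda i: i.weekday() == 4), dtype=bool))

# Vectorised version: DatetimeIndex exposes .hour and .weekday as arrays
hours = tbl.index.hour
fast = (hours >= 8) & (hours < 19) & (tbl.index.weekday == 4)

print(np.array_equal(slow, fast))  # True
print(tbl[fast].index.min(), tbl[fast].index.max())
```

Note that `hour < 19` excludes 19:00 itself, so the last selected row is 18:55.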

Is there anything more elegant?

A more elegant solution (by @tobip) using boolean indexing:

import numpy as np

# positions of the rows whose time of day lies in [08:00, 19:00)
idx = tbl.index.indexer_between_time("8:00", "19:00", include_end=False)
# convert the position array to a boolean mask
criterion1 = np.zeros(tbl.shape[0], dtype=bool)
criterion1[idx] = True
# a second boolean mask
criterion2 = (tbl['weekday'] == 4)
# combine the masks with a logical AND
tbl[criterion1 & criterion2]
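The same mask-building idea extends to the full selection from the question, where each weekday has its own window. The sketch below is one possible approach, not the answerer's code: `WINDOWS` and `window_mask` are hypothetical names, the data is a made-up one-week frame, and both ends of each window are assumed inclusive (pass `include_end=False` to `indexer_between_time` for half-open windows).

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in for `tbl`: 5-minute readings over one week
idx = pd.date_range("2011-01-01 00:05", periods=7 * 288, freq="5min")
tbl = pd.DataFrame({"var1": np.arange(len(idx), dtype=float)}, index=idx)

# Hypothetical per-day windows from the question (weekday: Monday=0 ... Sunday=6)
WINDOWS = {
    **{day: (datetime.time(6), datetime.time(19)) for day in range(5)},  # Mon-Fri
    5: (datetime.time(6), datetime.time(17)),  # Saturday
    6: (datetime.time(8), datetime.time(17)),  # Sunday
}


def window_mask(index, windows):
    """Boolean array: True where a timestamp falls inside its weekday's window."""
    mask = np.zeros(len(index), dtype=bool)
    weekdays = index.weekday
    for day, (start, end) in windows.items():
        # positions whose time of day lies in [start, end], on any day
        pos = index.indexer_between_time(start, end)
        in_time = np.zeros(len(index), dtype=bool)
        in_time[pos] = True
        # keep only those positions that also fall on this weekday
        mask |= in_time & (weekdays == day)
    return mask


selection = tbl[window_mask(tbl.index, WINDOWS)]
```

Because every window is OR-ed into one mask, adding or changing a day only means editing the dictionary.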
