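I am trying to group a time-indexed DataFrame into 3-hour intervals. The data is sampled every 1.5 seconds. I expect the code below to return a single group of length 4323.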
import pandas as pd
# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq=...) replaces it
time_grouper = pd.Grouper(freq="3h")
dataframe.groupby(time_grouper).count()
Output:
2013-02-23 06:00:00 1733
2013-02-23 09:00:00 1149
Freq: 3H, Name: roll, dtype: int64
If I change the grouping frequency to 1000 seconds, I get:
2013-02-23 08:03:20 133
2013-02-23 08:20:00 667
2013-02-23 08:36:40 666
2013-02-23 08:53:20 667
2013-02-23 09:10:00 667
2013-02-23 09:26:40 82
Freq: 1000S, Name: roll, dtype: int64
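EDIT: From the comments I understand that resampling starts at 00:00, which explains why the bins look uneven. How can I resample starting at the beginning of the time range covered by the index?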
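Since pandas 1.1, pd.Grouper and DataFrame.resample accept an origin argument. Its default, 'start_day', anchors bins at midnight of the first day, which is the behaviour described in the edit; origin='start' anchors them at the first timestamp instead. The sketch below assumes synthetic data: 2882 samples every 1.5 s in a column named roll, starting at 08:16:41. That start time is an assumption (the real index is not shown), chosen because with it the default grouping reproduces the counts in the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumed start time, random values).
idx = pd.date_range("2013-02-23 08:16:41", periods=2882,
                    freq=pd.Timedelta(seconds=1.5))
df = pd.DataFrame({"roll": np.random.randn(len(idx))}, index=idx)

# Default origin ("start_day"): bins are anchored at midnight, so the
# 3-hour grouping splits at 09:00 and the 1000 s bins start at 08:03:20.
by_midnight = df.groupby(pd.Grouper(freq="3h"))["roll"].count()
by_midnight_1000 = df.groupby(pd.Grouper(freq="1000s"))["roll"].count()

# origin="start": bins are anchored at the first timestamp instead.
by_start = df.groupby(pd.Grouper(freq="3h", origin="start"))["roll"].count()

print(by_midnight)       # 06:00 -> 1733, 09:00 -> 1149
print(by_midnight_1000)  # 133, 667, 666, 667, 667, 82
print(by_start)          # 08:16:41 -> 2882
```

The same keyword works with resample, e.g. df.resample("3h", origin="start").count(); an offset argument is also available for shifting the anchor by a fixed Timedelta.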
Here is one possible solution that builds the categorical groups manually with pd.cut().
import numpy as np
import pandas as pd
import datetime as dt
# simulate some artificial data
# ==================================================
df = pd.DataFrame(np.random.randn(4500), columns=['col'], index=pd.date_range(dt.datetime.now(), periods=4500, freq=pd.Timedelta(1.5, 's')))
col
2015-07-15 11:41:05.987156 -0.1191
2015-07-15 11:41:07.487156 -0.4531
2015-07-15 11:41:08.987156 1.2682
2015-07-15 11:41:10.487156 -1.3194
2015-07-15 11:41:11.987156 0.2690
2015-07-15 11:41:13.487156 0.3139
2015-07-15 11:41:14.987156 1.3467
2015-07-15 11:41:16.487156 -0.0090
2015-07-15 11:41:17.987156 -1.4792
2015-07-15 11:41:19.487156 -0.6973
... ...
2015-07-15 13:33:20.987156 -0.6072
2015-07-15 13:33:22.487156 0.2621
2015-07-15 13:33:23.987156 -1.1274
2015-07-15 13:33:25.487156 0.9305
2015-07-15 13:33:26.987156 0.4124
2015-07-15 13:33:28.487156 -0.8061
2015-07-15 13:33:29.987156 -0.0065
2015-07-15 13:33:31.487156 -1.3291
2015-07-15 13:33:32.987156 1.1309
2015-07-15 13:33:34.487156 -0.6444
[4500 rows x 1 columns]
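# processing using pd.cut
# ==================================================
# bin edges anchored at the first timestamp; extend one interval past the last
# timestamp so the final edge lies beyond it (this data spans under 3 hours,
# so stopping at df.index[-1] would yield a single edge and no bins)
ts_rng = pd.date_range(df.index[0], df.index[-1] + pd.Timedelta('3h'), freq='3h')
# string format for labels
ts_rng_iso = [x.isoformat() for x in ts_rng]
# groupby the categorical variables; observed=False keeps empty bins
df.groupby(pd.cut(df.index, bins=ts_rng, labels=ts_rng_iso[:-1], right=True, include_lowest=True), observed=False).count()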
col
2015-07-15T11:41:05.987156 4500
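The pd.cut idea generalises to any bin width, as long as the bin edges reach past the last timestamp. The helper below, start_anchored_counts, is a hypothetical name (not a pandas API) and a sketch under that assumption: it uses left-closed bins to mirror resample's default for hourly and secondly frequencies, and a fixed start time taken from the sample output above so the result is reproducible.

```python
import numpy as np
import pandas as pd


def start_anchored_counts(df, freq):
    """Count rows in consecutive `freq`-wide bins starting at df.index[0].

    Hypothetical helper, not part of pandas: a generalisation of the pd.cut idea.
    """
    step = pd.Timedelta(freq)
    start, end = df.index[0], df.index[-1]
    # Extend the range by one step so the last edge lies strictly after `end`;
    # a span shorter than `freq` would otherwise yield a single edge and no bin.
    edges = pd.date_range(start, end + step, freq=step)
    # Left-closed bins [edge_i, edge_i+1), like resample's default for "h"/"s".
    bins = pd.cut(df.index, bins=edges, right=False)
    # observed=False keeps empty bins so they appear with a count of 0.
    counts = df.groupby(bins, observed=False).count()
    counts.index = edges[:-1]  # label each bin by its left edge
    return counts


# Same layout as the simulated data above, but with a fixed start time.
idx = pd.date_range("2015-07-15 11:41:05.987156", periods=4500,
                    freq=pd.Timedelta(1.5, "s"))
df = pd.DataFrame(np.random.randn(4500), columns=["col"], index=idx)

print(start_anchored_counts(df, "3h"))     # one bin holding all 4500 rows
print(start_anchored_counts(df, "1000s"))  # seven bins, first at 11:41:05.987156
```

On pandas 1.1 or later, df.resample(freq, origin="start").count() should give the same counts; the manual version mainly helps on older releases or when custom, unevenly spaced edges are needed.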