Pandas - get the rolling max over the last n higher-frequency intervals (relative to the unfinished fixed time interval)



I have this data:

import pandas as pd

index = pd.DatetimeIndex(['2000-01-01 08:00:00', '2000-01-01 08:00:01',
                          '2000-01-01 08:00:03', '2000-01-01 08:00:04',
                          '2000-01-01 08:00:05', '2000-01-01 08:00:06',
                          '2000-01-01 08:00:09', '2000-01-01 08:00:10',
                          '2000-01-01 08:00:11', '2000-01-01 08:00:12',
                          '2000-01-01 08:00:15', '2000-01-01 08:00:16',
                          '2000-01-01 08:00:18', '2000-01-01 08:00:19',
                          '2000-01-01 08:00:20', '2000-01-01 08:00:22',
                          '2000-01-01 08:00:23', '2000-01-01 08:00:24',
                          '2000-01-01 08:00:25', '2000-01-01 08:00:26',
                          '2000-01-01 08:00:27', '2000-01-01 08:00:28'])
df = pd.DataFrame({"A": [4,3,8,15,2,3,14,17,15,20,13,2,5,15,10,10,13,20,1,3,10,11]},
                  index=index)
df

I found that I can get the max within each fixed time interval like this:

# Note: `base` was deprecated in pandas 1.1 and removed in 2.0; use offset='2s' there.
df2 = df.groupby(pd.Grouper(freq='5s', base=2)).rolling('5s', min_periods=1).max()
df2

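For each record, I would like to get the rolling max over the last two 5s intervals, where the most recent interval is the current, still unfinished fixed interval.

I found that the rolling max over two complete fixed 5s intervals can be obtained like this: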

df3 = df.resample('5s', base=2).max().rolling(2).max()

But, for example, at the record 2000-01-01 08:00:22 the result should be 15, not 20. I cannot find a way to combine the two approaches.
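
To make the mismatch concrete, here is a minimal sketch on the same data. It assumes pandas 1.1 or later and therefore uses `offset='2s'` in place of `base=2` (the bin edges are the same for a 5s frequency). The names `secs`, `bin_max` and `ts` are introduced only for this illustration.

```python
import pandas as pd

# Compact restatement of the data above: seconds after 08:00:00 and the values of A.
secs = [0, 1, 3, 4, 5, 6, 9, 10, 11, 12, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28]
index = pd.Timestamp("2000-01-01 08:00:00") + pd.to_timedelta(secs, unit="s")
df = pd.DataFrame({"A": [4, 3, 8, 15, 2, 3, 14, 17, 15, 20, 13, 2, 5, 15,
                         10, 10, 13, 20, 1, 3, 10, 11]}, index=index)

# Max of every complete 5s bin; bin edges fall on ...:57, ...:02, ...:07, ...
bin_max = df.resample("5s", offset="2s").max()
# Rolling max over two complete bins (the df3 approach).
df3 = bin_max.rolling(2).max()

ts = pd.Timestamp("2000-01-01 08:00:22")
print(bin_max.loc[ts, "A"])  # max of the whole bin that starts at 08:00:22
print(df3.loc[ts, "A"])      # max of that bin and the one before it
```

On this data both printed values are 20. The resampled frame only knows the max of each complete bin, so the row labelled 08:00:22 already includes the 20 recorded at 08:00:24. At the moment of the 08:00:22 record, only 5, 15, 10 (previous bin) and 10 (current bin) have been seen, which gives 15.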

The result should be:

                                          A  last-2x-5s-intervals-max
2000-01-01 07:59:57 2000-01-01 08:00:00   4                       NaN (not enough data)
                    2000-01-01 08:00:01   3                       NaN (not enough data)
2000-01-01 08:00:02 2000-01-01 08:00:03   8                       8.0 (max from previous 5s to actual record/to actual unfinished 5s time interval)
                    2000-01-01 08:00:04  15                      15.0 (max from 2000-01-01 07:59:57 to 2000-01-01 08:00:04)
                    2000-01-01 08:00:05   2                      15.0 (max from 2000-01-01 07:59:57 to 2000-01-01 08:00:05)
                    2000-01-01 08:00:06   3                      15.0 (max from 2000-01-01 07:59:57 to 2000-01-01 08:00:06)
2000-01-01 08:00:07 2000-01-01 08:00:09  14                      15.0 (max from 2000-01-01 08:00:02 to 2000-01-01 08:00:09)
                    2000-01-01 08:00:10  17                      17.0 (max from 2000-01-01 08:00:02 to 2000-01-01 08:00:10)
                    2000-01-01 08:00:11  15                      17.0 (max from 2000-01-01 08:00:02 to 2000-01-01 08:00:11)
2000-01-01 08:00:12 2000-01-01 08:00:12  20                      20.0 (max from 2000-01-01 08:00:07 to 2000-01-01 08:00:12)
                    2000-01-01 08:00:15  13                      20.0 (etc...)
                    2000-01-01 08:00:16   2                      20.0
2000-01-01 08:00:17 2000-01-01 08:00:18   5                      20.0
                    2000-01-01 08:00:19  15                      20.0
                    2000-01-01 08:00:20  10                      20.0
2000-01-01 08:00:22 2000-01-01 08:00:22  10                      15.0
                    2000-01-01 08:00:23  13                      15.0
                    2000-01-01 08:00:24  20                      20.0
                    2000-01-01 08:00:25   1                      20.0
                    2000-01-01 08:00:26   3                      20.0
2000-01-01 08:00:27 2000-01-01 08:00:27  10                      20.0
                    2000-01-01 08:00:28  11                      20.0
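
The rule behind this table can be spelled out as a slow reference loop, which is handy for checking any faster solution. This is only a sketch under assumptions: the function name `brute_force_last_n_max` and its parameters are hypothetical, the index is assumed to be sorted, and "not enough data" is interpreted as "the window would start before the first interval that contains data".

```python
import pandas as pd


def brute_force_last_n_max(s, freq="5s", offset="2s", n=2):
    """Reference loop: max of s from the start of the (n-1)-th previous
    fixed interval up to and including each record."""
    step = pd.Timedelta(freq)
    off = pd.Timedelta(offset)
    # Start of the fixed interval that each timestamp falls into.
    bin_start = (s.index - off).floor(freq) + off
    first_bin = bin_start[0]
    out = []
    for t, b in zip(s.index, bin_start):
        window_start = b - (n - 1) * step
        if window_start < first_bin:
            out.append(float("nan"))  # not enough history yet
        else:
            # Label slicing on a sorted DatetimeIndex includes both ends.
            out.append(s.loc[window_start:t].max())
    return pd.Series(out, index=s.index, name=f"last_{n}x")


# Compact restatement of the data above: seconds after 08:00:00 and the values of A.
secs = [0, 1, 3, 4, 5, 6, 9, 10, 11, 12, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28]
index = pd.Timestamp("2000-01-01 08:00:00") + pd.to_timedelta(secs, unit="s")
df = pd.DataFrame({"A": [4, 3, 8, 15, 2, 3, 14, 17, 15, 20, 13, 2, 5, 15,
                         10, 10, 13, 20, 1, 3, 10, 11]}, index=index)

result = brute_force_last_n_max(df["A"])
print(result)
```

The loop is quadratic in the worst case, so it is meant as a specification to test against rather than something to run on large data.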

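One way is to take the cummax per group on the first index level of df2, which gives the running max within the current group. Then assign, as a second column, the per-group max shifted by one group, which gives the max of the previous group; selecting it with loc on the first index level of df2 repeats each value so that it lands on the right rows. Finally, take the max across the two columns: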

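df2gr = df2.groupby(level=0)
# cummax: running max within the current group
# l: max of the previous group, repeated for every row of the current group
# skipna=False keeps NaN where there is no previous group
df2['last_2x'] = (df2gr.cummax()
                       .assign(l=df2gr.max().shift()
                                      .loc[df2.index.get_level_values(0)]
                                      .to_numpy())
                       .max(axis=1, skipna=False))
print(df2)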
                                            A  last_2x
2000-01-01 07:59:57 2000-01-01 08:00:00   4.0      NaN
                    2000-01-01 08:00:01   4.0      NaN
2000-01-01 08:00:02 2000-01-01 08:00:03   8.0      8.0
                    2000-01-01 08:00:04  15.0     15.0
                    2000-01-01 08:00:05  15.0     15.0
                    2000-01-01 08:00:06  15.0     15.0
2000-01-01 08:00:07 2000-01-01 08:00:09  14.0     15.0
                    2000-01-01 08:00:10  17.0     17.0
                    2000-01-01 08:00:11  17.0     17.0
2000-01-01 08:00:12 2000-01-01 08:00:12  20.0     20.0
                    2000-01-01 08:00:15  20.0     20.0
                    2000-01-01 08:00:16  20.0     20.0
2000-01-01 08:00:17 2000-01-01 08:00:18   5.0     20.0
                    2000-01-01 08:00:19  15.0     20.0
                    2000-01-01 08:00:20  15.0     20.0
2000-01-01 08:00:22 2000-01-01 08:00:22  10.0     15.0
                    2000-01-01 08:00:23  13.0     15.0
                    2000-01-01 08:00:24  20.0     20.0
                    2000-01-01 08:00:25  20.0     20.0
                    2000-01-01 08:00:26  20.0     20.0
2000-01-01 08:00:27 2000-01-01 08:00:27  10.0     20.0
                    2000-01-01 08:00:28  11.0     20.0

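After some more research and with @Ben.T's help, this is what I wanted to achieve. The higher time-frame frequency and the number of periods can be chosen freely: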

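HIGHER_TF_FREQ = '5s'   # length of one fixed higher time-frame interval
HIGHER_TF_PERIOD = 4    # intervals to cover, including the unfinished one
# base=2 shifts the bin edges by 2s (on pandas >= 1.1 use offset='2s' instead)
df2gr = df.groupby(pd.Grouper(freq=HIGHER_TF_FREQ, base=2))
df2 = df2gr.rolling(HIGHER_TF_FREQ, min_periods=1).max().dropna()
# Max over the previous HIGHER_TF_PERIOD-1 non-empty intervals, repeated per row
prev_max = (df2gr.max().dropna()
                 .rolling(HIGHER_TF_PERIOD - 1).max().shift()
                 .loc[df2.index.get_level_values(0)]
                 .to_numpy())
df2['last_2x'] = df2.assign(l=prev_max).max(axis=1, skipna=False)
df2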
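
The same idea can be wrapped in a small function that does not depend on `base` or on the MultiIndex produced by `groupby(...).rolling(...)`. This is a sketch under assumptions, not the original author's code: `last_n_intervals_max` and its parameters are hypothetical names, the interval label of each row is computed by flooring the timestamps, and, as with the `dropna()` calls above, only intervals that actually contain data are counted.

```python
import numpy as np
import pandas as pd


def last_n_intervals_max(s, freq="5s", offset="2s", n=2):
    """Max over the current unfinished interval and the n-1 previous non-empty ones."""
    if n < 2:
        raise ValueError("n must be at least 2")
    off = pd.Timedelta(offset)
    # Label every row with the start of the fixed interval it falls into.
    bins = (s.index - off).floor(freq) + off
    grouped = s.groupby(bins)
    # Running max inside the current (unfinished) interval.
    running = grouped.cummax()
    # Max of the n-1 non-empty intervals that precede each interval.
    prev = grouped.max().rolling(n - 1).max().shift()
    # Repeat the per-interval value for every row of that interval.
    prev_per_row = prev.reindex(bins).to_numpy()
    # np.maximum propagates NaN, so rows without enough history stay NaN.
    return pd.Series(np.maximum(running.to_numpy(), prev_per_row),
                     index=s.index, name=f"last_{n}x")


# Compact restatement of the data above: seconds after 08:00:00 and the values of A.
secs = [0, 1, 3, 4, 5, 6, 9, 10, 11, 12, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28]
index = pd.Timestamp("2000-01-01 08:00:00") + pd.to_timedelta(secs, unit="s")
df = pd.DataFrame({"A": [4, 3, 8, 15, 2, 3, 14, 17, 15, 20, 13, 2, 5, 15,
                         10, 10, 13, 20, 1, 3, 10, 11]}, index=index)

last_2x = last_n_intervals_max(df["A"], n=2)
last_4x = last_n_intervals_max(df["A"], n=4)
print(pd.concat([df["A"], last_2x, last_4x], axis=1))
```

With `n=2` this should reproduce the `last_2x` column shown earlier. With `n=4` the first three intervals (nine rows) stay NaN because fewer than three earlier intervals exist for them.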
