How to efficiently report fixed-length sliding windows measured in seconds



> I have a list of times in seconds, for example:

L = [ 0.10218048,  1.20851996,  1.46800021,  1.73429061,  2.71525848,
3.14781922,  3.63637958,  5.11147358,  5.97497864,  6.35469013,
6.80623747,  6.99571917,  7.65215123,  7.86108352,  8.52988247,
8.83068894, 10.07690977, 11.53867284, 12.01214112, 12.13307653]

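For each 2-second window that starts on a whole-second boundary, I want to output a list of all the times that fall inside that window. For the example above, that would be: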

[0.10218048,  1.20851996,  1.46800021,  1.73429061]
[1.20851996,  1.46800021,  1.73429061, 2.71525848]
[2.71525848, 3.14781922,  3.63637958]
[3.14781922,  3.63637958]
[5.11147358,  5.97497864]
[5.11147358,  5.97497864, 6.35469013, 6.80623747,  6.99571917]
[6.35469013, 6.80623747,  6.99571917, 7.65215123,  7.86108352]
[7.65215123,  7.86108352, 8.52988247, 8.83068894]
[8.52988247, 8.83068894]
[10.07690977]
[10.07690977, 11.5386728]
[11.5386728, 12.01214112, 12.13307653]
[12.01214112, 12.13307653]

In general, the window length may be something other than 2.

How can this be done?

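One possible solution I can offer is "efficient" in the sense that it walks the input data only once and has no dependencies. The price, of course, is that it is written in pure Python (more optimized code may well exist), and that it introduces extra tracking variables to avoid repeated passes (so it is less Pythonic).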

def sliding_window(data, duration, start=0, overlap=1):
    result = []
    data_idx = 0
    result_idx = 0
    upper = start + duration
    lower = start
    next_lower = upper - overlap

    # inner helper to pad empty inner-lists up to our insert point and insert
    def pad_and_append(at):
        while len(result) <= at:
            result.append([])
        result[at].append(data[data_idx])

    # iterate through input data
    while data_idx < len(data):
        # is the datum within the current interval?
        if lower <= data[data_idx] < upper:
            pad_and_append(result_idx)
            # is it within the overlap to the next interval?
            if next_lower <= data[data_idx]:
                pad_and_append(result_idx + 1)
            # next datum
            data_idx = data_idx + 1
        else:
            # we captured all items within the interval and
            # the overlap to the next. let's set up the next interval
            result_idx = result_idx + 1
            lower = next_lower
            upper = lower + duration
            next_lower = upper - overlap
    return result
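Note that the function above copies each datum into the current window and the next one only, so it fits cases like a 2-second window with a 1-second overlap, where a value belongs to at most two windows. As a minimal sketch of the same single-pass idea for other settings, two forward-moving indices can mark where each window begins and ends. The name `sliding_windows_two_pointer` and its `width`, `step`, and `start` parameters are hypothetical, and the sketch assumes the input is sorted and that windows are half-open, `[t, t + width)`.

```python
def sliding_windows_two_pointer(times, width=2, step=1, start=0):
    """Return one list per half-open window [t, t + width), t = start + k * step.

    Assumes `times` is sorted in ascending order (an assumption of this sketch).
    """
    result = []
    n = len(times)
    lo = hi = 0  # both indices only ever move forward
    k = 0
    while times and start + k * step <= times[-1]:
        t = start + k * step
        # skip values that fall before the window
        while lo < n and times[lo] < t:
            lo += 1
        # extend up to (but not including) the first value >= t + width
        while hi < n and times[hi] < t + width:
            hi += 1
        result.append(times[lo:hi])
        k += 1
    return result


L = [0.10218048, 1.20851996, 1.46800021, 1.73429061, 2.71525848,
     3.14781922, 3.63637958, 5.11147358, 5.97497864, 6.35469013,
     6.80623747, 6.99571917, 7.65215123, 7.86108352, 8.52988247,
     8.83068894, 10.07690977, 11.53867284, 12.01214112, 12.13307653]

windows = sliding_windows_two_pointer(L, width=2, step=1)
wide_windows = sliding_windows_two_pointer(L, width=3, step=1)
```

Because neither index moves backwards, the total work is proportional to the number of data points plus the number of windows, whatever the overlap.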

Here is a solution that uses simple loops.

import math
from collections import defaultdict
L = [ 0.10218048,  1.20851996,  1.46800021,  1.73429061,  2.71525848,
3.14781922,  3.63637958,  5.11147358,  5.97497864,  6.35469013,
6.80623747,  6.99571917,  7.65215123,  7.86108352,  8.52988247,
8.83068894, 10.07690977, 11.53867284, 12.01214112, 12.13307653]
binned = defaultdict(list)
n = 2  # window size
for a in range(math.ceil(max(L))):
    b = a + n
    k = f'{a}:{b}'
    for x in L:  # assuming L is sorted
        if x >= a:  # >= so a value exactly on a boundary is not dropped
            if x < b:
                binned[k].append(x)
            else:
                break
binned

defaultdict(list,
{'0:2': [0.10218048, 1.20851996, 1.46800021, 1.73429061],
'1:3': [1.20851996, 1.46800021, 1.73429061, 2.71525848],
'2:4': [2.71525848, 3.14781922, 3.63637958],
'3:5': [3.14781922, 3.63637958],
'4:6': [5.11147358, 5.97497864],
'5:7': [5.11147358, 5.97497864, 6.35469013, 6.80623747, 6.99571917],
'6:8': [6.35469013, 6.80623747, 6.99571917, 7.65215123, 7.86108352],
'7:9': [7.65215123, 7.86108352, 8.52988247, 8.83068894],
'8:10': [8.52988247, 8.83068894],
'9:11': [10.07690977],
'10:12': [10.07690977, 11.53867284],
'11:13': [11.53867284, 12.01214112, 12.13307653],
'12:14': [12.01214112, 12.13307653]})

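I hope I understood you correctly: you basically want to slice the data L into 2-second time windows with a 1-second overlap? Then this might be an option: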

import numpy as np
L = [0.10218048,  1.20851996,  1.46800021,  1.73429061,  2.71525848,
3.14781922,  3.63637958,  5.11147358,  5.97497864,  6.35469013,
6.80623747,  6.99571917,  7.65215123,  7.86108352,  8.52988247,
8.83068894, 10.07690977, 11.53867284, 12.01214112, 12.13307653]
L = np.array(L)
lim = []
for i in range(0, int(np.ceil(L[-1])), 1):
    # change 1st range param for other t0
    # change 3rd range param for other t step
    lim += [[i, i+2]]  # change the '+2' to your desired dt
for l in lim:
    print(L[(L >= l[0]) & (L < l[1])])
# in case you don't need the limits array, just simplify to
# for i in range(0, int(np.ceil(L[-1])), 1):
#    print(L[(L>=i) & (L<i+2)])

This prints:

[0.10218048,  1.20851996,  1.46800021,  1.73429061]
[1.20851996,  1.46800021,  1.73429061, 2.71525848]
[2.71525848, 3.14781922,  3.63637958]
[3.14781922,  3.63637958]
[5.11147358,  5.97497864]
[5.11147358,  5.97497864, 6.35469013, 6.80623747,  6.99571917]
[6.35469013, 6.80623747,  6.99571917, 7.65215123,  7.86108352]
[7.65215123,  7.86108352, 8.52988247, 8.83068894]
[8.52988247, 8.83068894]
[10.07690977]
[10.07690977, 11.5386728]
[11.5386728, 12.01214112, 12.13307653]
[12.01214112, 12.13307653]

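Note: I'm not sure this is really efficient, because the full array L is checked on every pass through the loop. But I suppose numpy conditional slicing is not too bad. It would be interesting to see some timeit comparisons here.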
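If L is sorted, one way to avoid checking the whole list for every window is a binary search for each window's two boundaries. The following is only a sketch using the standard-library `bisect` module (`np.searchsorted` plays the same role for NumPy arrays). The function name `windows_bisect` and its parameters are made up for illustration, and windows are assumed to be half-open, `[t, t + width)`. No timings are claimed here.

```python
import math
from bisect import bisect_left


def windows_bisect(times, width=2, step=1, start=0):
    """Return one list per half-open window [t, t + width), t = start + k * step.

    Assumes `times` is a list sorted in ascending order.
    """
    if not times:
        return []
    # number of window starts that are <= the largest time
    n_windows = math.floor((times[-1] - start) / step) + 1
    result = []
    for k in range(n_windows):
        t = start + k * step
        lo = bisect_left(times, t)          # first index with times[i] >= t
        hi = bisect_left(times, t + width)  # first index with times[i] >= t + width
        result.append(times[lo:hi])
    return result


L = [0.10218048, 1.20851996, 1.46800021, 1.73429061, 2.71525848,
     3.14781922, 3.63637958, 5.11147358, 5.97497864, 6.35469013,
     6.80623747, 6.99571917, 7.65215123, 7.86108352, 8.52988247,
     8.83068894, 10.07690977, 11.53867284, 12.01214112, 12.13307653]

for window in windows_bisect(L, width=2, step=1):
    print(window)
```

Each window then costs two binary searches plus the slice copy, instead of a comparison against every element.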

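Going by "each 2-second window that starts on a whole-second boundary", I think you mean a fixed increment between window starts rather than a fixed overlap. For two-second intervals the two are the same, but since you want to vary the length they differ: a one-second overlap gives 0-3, 2-5, 4-7, whereas a one-second increment gives 0-3, 1-4, 2-5. Just in case, though, it is interesting to work out a solution for both.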
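The difference between the two readings can be made concrete with a small sketch. The helper `window_bounds` and its parameters are hypothetical names used only for this illustration. A fixed overlap is treated as a step of `width - overlap` between window starts.

```python
def window_bounds(width, step, count, start=0):
    """Return `count` (lower, upper) pairs, each `width` long and `step` apart."""
    return [(start + k * step, start + k * step + width) for k in range(count)]


width = 3

# Reading 1: window starts advance by a fixed increment of 1 second.
by_increment = window_bounds(width, step=1, count=3)

# Reading 2: consecutive windows share a fixed overlap of 1 second,
# so the step between starts is width - overlap.
overlap = 1
by_overlap = window_bounds(width, step=width - overlap, count=3)

print(by_increment)  # [(0, 3), (1, 4), (2, 5)]
print(by_overlap)    # [(0, 3), (2, 5), (4, 7)]
```

With `width = 2` both calls give the same bounds, which is why the question's example cannot distinguish between the two readings.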

Assuming L is sorted, all elements are positive, and every interval starts on a whole second, we can use this approach:

import math
from collections import defaultdict
L = [ 0.10218048,  1.20851996,  1.46800021,  1.73429061,  2.71525848,
3.14781922,  3.63637958,  5.11147358,  5.97497864,  6.35469013,
6.80623747,  6.99571917,  7.65215123,  7.86108352,  8.52988247,
8.83068894, 10.07690977, 11.53867284, 12.01214112, 12.13307653]
my_ranges = defaultdict(list)
interval_width = 2
for x in L:
    upper_bound = math.ceil(x)
    lower_bound = upper_bound - interval_width
    lower_bound = max(0, lower_bound)
    for y in range(lower_bound, upper_bound):
        my_ranges[y].append(x)
for a in sorted(my_ranges):
    print(my_ranges[a])

I don't know whether you want to see empty ranges as well. If you do, the defaultdict will print those too. Use this line instead of the "for a in sorted" line:

for a in range(min(my_ranges), max(my_ranges) + 1):
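The reason this works is that looking up a missing key in a `defaultdict(list)` inserts and returns an empty list. A tiny sketch with made-up values (not taken from L) shows the effect:

```python
from collections import defaultdict

# Hypothetical data: only windows 0 and 3 received any values.
my_ranges = defaultdict(list)
my_ranges[0].append(0.5)
my_ranges[3].append(3.2)

# range(...) is evaluated once, before the loop, so the lookups below
# can safely add the missing keys 1 and 2 to the dict.
for a in range(min(my_ranges), max(my_ranges) + 1):
    print(a, my_ranges[a])
```

A side effect is that the empty keys remain in `my_ranges` afterwards. Using `my_ranges.get(a, [])` instead would leave the dict unchanged.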

If you want the ranges 0-3, 2-5, 4-7, this works:

import math
from collections import defaultdict
L = [ 0.10218048,  1.20851996,  1.46800021,  1.73429061,  2.71525848,
3.14781922,  3.63637958,  5.11147358,  5.97497864,  6.35469013,
6.80623747,  6.99571917,  7.65215123,  7.86108352,  8.52988247,
8.83068894, 10.07690977, 11.53867284, 12.01214112, 12.13307653]
interval_width = 2
my_ranges_2 = defaultdict(list)
for x in L:
    # largest multiple of (interval_width - 1) at or below x; x is always in the window starting there
    definitely_in = (x // (interval_width - 1)) * (interval_width - 1)
    #print("Adding", x, "to", definitely_in)
    my_ranges_2[definitely_in].append(x)
    # For instance, if x is 2.3 and we have 0-3, 2-5 etc., we need to catch
    # this duplicate case. I am assuming the window lengths are integers; if
    # not, we have a lot more to do, because the number may go in more than
    # one array. Perhaps we could have a while loop, incrementing by
    # (interval_width - 1).
    if x < definitely_in + 1 and definitely_in - interval_width >= 0:
        #print("++Adding", x, "to", definitely_in - interval_width + 1)
        my_ranges_2[definitely_in - interval_width + 1].append(x)
for a in sorted(my_ranges_2):
    print(a, my_ranges_2[a])
#    print(my_ranges_2[a])

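I suspect I have forgotten some details, but hopefully you can adjust interval_width as needed to check that my code does what you want, and let me know exactly what you need.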
