Slicing an array into multiple segments



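>Suppose I have an array [1,2,3,4,5,6,7,8] made up of two samples, [1,2,3,4] and [5,6,7,8]. For each sample, I want to build a sliding window of size n. If there are not enough elements, pad the result with the last element. Each row of the return value should be the sliding window that starts at the corresponding element.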

For example, if n=3, the result should be:

[[1,2,3],
[2,3,4],
[3,4,4],
[4,4,4],
[5,6,7],
[6,7,8],
[7,8,8],
[8,8,8]]

How can I do this with efficient slicing instead of a for loop? Thanks.

An approach similar to @hpaulj's, using some NumPy built-ins:

import numpy as np

samples = [[1,2,3,4],[5,6,7,8]]
ws = 3  # window size
# pad each sample with ws-1 copies of its last element
samples = [s + [s[-1]]*(ws-1) for s in samples]

# rolling window function for arrays
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1]-window+1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# build the windows for each sample, then concatenate them into one list
result = sum([rolling_window(np.array(s), ws).tolist() for s in samples], [])
result
[[1, 2, 3],
[2, 3, 4],
[3, 4, 4],
[4, 4, 4],
[5, 6, 7],
[6, 7, 8],
[7, 8, 8],
[8, 8, 8]]
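
For what it's worth, NumPy 1.20 and later ship `sliding_window_view`, which can stand in for the hand-written `as_strided` helper. The following is only a sketch of that alternative, not the code from the answer above; it assumes a recent enough NumPy and uses `np.pad` with `mode="edge"` to repeat the last element:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

samples = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
ws = 3  # window size

# Repeat each row's last element ws - 1 times on the right.
padded = np.pad(samples, ((0, 0), (0, ws - 1)), mode="edge")

# Shape (2, 4, 3): one window per starting position in each sample.
windows = sliding_window_view(padded, ws, axis=1)

# Merge the sample axis so that each row is one window: shape (8, 3).
result = windows.reshape(-1, ws)
```

The view returned by `sliding_window_view` is read-only, but `reshape` has to copy here, so `result` is an ordinary array.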

A Python list approach:

In [201]: order = [1,3,2,3,5,8]                                                                  
In [202]: samples = [[1,2,3,4],[5,6,7,8]]

Expand the samples to handle the padding (n = 3 here, the window size):

In [203]: samples = [row+([row[-1]]*n) for row in samples]                                       
In [204]: samples                                                                                
Out[204]: [[1, 2, 3, 4, 4, 4, 4], [5, 6, 7, 8, 8, 8, 8]]

Define a function:

def foo(i, samples):
    for row in samples:
        try:
            j = row.index(i)
        except ValueError:
            continue
        return row[j:j+n]
In [207]: foo(3,samples)                                                                         
Out[207]: [3, 4, 4]
In [208]: foo(9,samples)  # non-found case isn't handled well

For all elements of order:

In [209]: [foo(i,samples) for i in order]                                                        
Out[209]: [[1, 2, 3], [3, 4, 4], [2, 3, 4], [3, 4, 4], [5, 6, 7], [8, 8, 8]]
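
The lookup above silently returns None for a missing value and depends on a module-level n. Here is a rough sketch of one way to tighten that. `window_from` is a hypothetical name, not part of the answer, and the sketch assumes each value occurs at most once across the samples, since `list.index` only finds the first match:

```python
def window_from(value, samples, n):
    """Return the n-element window starting at `value`, or None if absent."""
    for row in samples:
        if value in row:
            j = row.index(value)
            # Pad with copies of the last element so the slice is never short.
            padded = row + [row[-1]] * (n - 1)
            return padded[j:j + n]
    return None  # explicit "not found" result


samples = [[1, 2, 3, 4], [5, 6, 7, 8]]
order = [1, 3, 2, 3, 5, 8]

windows = [window_from(i, samples, 3) for i in order]
missing = window_from(9, samples, 3)  # value not present in any sample
```

Passing n as an argument and padding inside the function keeps the original samples unmodified, at the cost of building a padded copy on each call.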

I have a simple one-liner:

import numpy as np 
samples = np.array([[1,2,3,4],[5,6,7,8]]) 
n,d = samples.shape 
ws = 3
result = samples[:,np.minimum(np.arange(d)[:,None]+np.arange(ws)[None,:],d-1)]

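No loops, only broadcasting, which probably makes it the most efficient approach here. The shape of the output is not exactly what you asked for, but that is easily fixed with a simple np.reshape.

The output is: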

[[[1 2 3]
  [2 3 4]
  [3 4 4]
  [4 4 4]]

 [[5 6 7]
  [6 7 8]
  [7 8 8]
  [8 8 8]]]
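
The np.reshape step mentioned above might look like the following. This is a sketch built on the same variables as the one-liner; `idx` and `flat` are names introduced here for illustration:

```python
import numpy as np

samples = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
n, d = samples.shape
ws = 3

# Window start offsets plus in-window offsets, clipped to the last valid column.
idx = np.minimum(np.arange(d)[:, None] + np.arange(ws)[None, :], d - 1)

result = samples[:, idx]          # shape (n, d, ws)
flat = result.reshape(n * d, ws)  # shape (n * d, ws), one window per row
```

Clipping the index with `np.minimum` is what does the padding here: any position past the end of a sample is mapped back to its last element.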
