Is it possible to replace a for loop with numpy to improve performance?



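Is there a way to improve performance using numpy or plain Python? The goal is to build a training set. features holds the raw data. I want to "enrich" the data using a moving window with a stride of 1. Finally, I want to reshape the data from a 2D array into a 3D array, because a single training input has shape (windowSize, features.shape[1]).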

import numpy as np
windowSize = 4
features = np.array([[1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14],[15,16],[17,18],[19,20]])
featuresReshaped = features[:windowSize]
for i in range(1, features.shape[0], 1):
    featuresReshaped = np.vstack((featuresReshaped, features[i:i+windowSize]))
maxindex = int(featuresReshaped.shape[0]/windowSize) * windowSize
featuresReshaped = featuresReshaped[:maxindex]
featuresReshaped = featuresReshaped.reshape(int(featuresReshaped.shape[0]/windowSize), windowSize, featuresReshaped.shape[1])

This solution avoids the loop entirely by using NumPy integer-array indexing.

import numpy as np
windowSize = 4
features = np.array(
[[ 1,  2], 
[ 3,  4],
[ 5,  6],
[ 7,  8],
[ 9, 10],
[11, 12],
[13, 14],
[15, 16],
[17, 18],
[19, 20]]
)
indices = np.add.outer(np.arange(len(features) - windowSize + 1), np.arange(windowSize))
# indices:
# [[0 1 2 3]
#  [1 2 3 4]
#  [2 3 4 5]
#  [3 4 5 6]
#  [4 5 6 7]
#  [5 6 7 8]
#  [6 7 8 9]]
features[indices] # indices must be of type np.ndarray or this won't work
# features[indices]:
# [[[ 1  2]
#   [ 3  4]
#   [ 5  6]
#   [ 7  8]]
#  [[ 3  4]
#   [ 5  6]
#   [ 7  8]
#   [ 9 10]]
#  [[ 5  6]
#   [ 7  8]
#   [ 9 10]
#   [11 12]]
#  [[ 7  8]
#   [ 9 10]
#   [11 12]
#   [13 14]]
#  [[ 9 10]
#   [11 12]
#   [13 14]
#   [15 16]]
#  [[11 12]
#   [13 14]
#   [15 16]
#   [17 18]]
#  [[13 14]
#   [15 16]
#   [17 18]
#   [19 20]]]
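
If your NumPy version is 1.20 or newer, the same windows can likely be obtained without building an index array at all. The following is a minimal sketch, not part of the solution above: it assumes np.lib.stride_tricks.sliding_window_view is available, and the name windows is only illustrative. Note that sliding_window_view appends the window axis last, so the axes are swapped to get (numWindows, windowSize, numFeatures).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

windowSize = 4
features = np.arange(1, 21).reshape(10, 2)  # same values as the example above

# Reference result from the index-based approach
indices = np.add.outer(np.arange(len(features) - windowSize + 1), np.arange(windowSize))

# Sliding along axis 0 yields shape (7, 2, 4); swap the last two axes to get (7, 4, 2)
windows = sliding_window_view(features, window_shape=windowSize, axis=0).swapaxes(1, 2)

print(windows.shape)
print(np.array_equal(windows, features[indices]))
```

The result is a read-only view onto features rather than a copy, so call windows.copy() if you need to modify it.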
Note that the output of your code differs from mine. I think yours may contain a bug, because your last slice is:
print(featuresReshaped[-1])
# [[15 16]
#  [17 18]
#  [19 20]
#  [17 18]]

This is not consistent with the "moving window" you described.
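
If you would rather keep an explicit loop, one way to make it consistent with a stride-1 moving window is to stop at the last start index where a full window still fits. This is only a sketch under the assumption that partial windows at the end should be dropped; the list name windowList is illustrative.

```python
import numpy as np

windowSize = 4
features = np.arange(1, 21).reshape(10, 2)  # same values as the example above

# Valid start indices are 0 .. len(features) - windowSize, so every slice is a full window
numWindows = features.shape[0] - windowSize + 1
windowList = [features[i:i + windowSize] for i in range(numWindows)]

# Stack once at the end instead of calling np.vstack inside the loop
featuresReshaped = np.stack(windowList)

print(featuresReshaped.shape)
print(featuresReshaped[-1])
```

Collecting the slices in a list and stacking once avoids reallocating the growing array on every iteration, although the indexing approach is still likely to be faster for large inputs.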
