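Is there a way to make this faster in NumPy or plain Python? The goal is to build a training set, where features is the raw data. I want to "enrich" the data with a moving window of stride 1. Finally, I want to reshape the data from a 2D array to a 3D array, because a single training input has shape (windowSize, features.shape[1]).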
import numpy as np
windowSize = 4
features = np.array([[1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14],[15,16],[17,18],[19,20]])
featuresReshaped = features[:windowSize]
for i in range(1, features.shape[0], 1):
    featuresReshaped = np.vstack((featuresReshaped, features[i:i+windowSize]))
maxindex = int(featuresReshaped.shape[0]/windowSize) * windowSize
featuresReshaped = featuresReshaped[:maxindex]
featuresReshaped = featuresReshaped.reshape(int(featuresReshaped.shape[0]/windowSize), windowSize, featuresReshaped.shape[1])
This solution avoids the Python loop entirely by using NumPy integer-array indexing to build all the windows at once.
import numpy as np
windowSize = 4
features = np.array(
[[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10],
[11, 12],
[13, 14],
[15, 16],
[17, 18],
[19, 20]]
)
indices = np.add.outer(np.arange(len(features) - windowSize + 1), np.arange(windowSize))
# indices:
# [[0 1 2 3]
# [1 2 3 4]
# [2 3 4 5]
# [3 4 5 6]
# [4 5 6 7]
# [5 6 7 8]
# [6 7 8 9]]
features[indices] # indices must be of type np.ndarray or this won't work
# features[indices]:
# [[[ 1 2]
# [ 3 4]
# [ 5 6]
# [ 7 8]]
# [[ 3 4]
# [ 5 6]
# [ 7 8]
# [ 9 10]]
# [[ 5 6]
# [ 7 8]
# [ 9 10]
# [11 12]]
# [[ 7 8]
# [ 9 10]
# [11 12]
# [13 14]]
# [[ 9 10]
# [11 12]
# [13 14]
# [15 16]]
# [[11 12]
# [13 14]
# [15 16]
# [17 18]]
# [[13 14]
# [15 16]
# [17 18]
# [19 20]]]
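If your NumPy is version 1.20 or newer, np.lib.stride_tricks.sliding_window_view may be another option. This is not part of the approach above, just a sketch under that version assumption. It returns a read-only view rather than a copy, and it appends the window axis last, so the last two axes have to be swapped to get (windowSize, features.shape[1]) per sample.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

windowSize = 4
features = np.arange(1, 21).reshape(10, 2)  # same values as the example above

# Slide along axis 0; the window axis is appended last, giving shape (7, 2, 4)
windows = sliding_window_view(features, window_shape=windowSize, axis=0)

# Swap the last two axes so each sample is (windowSize, n_features): shape (7, 4, 2)
windows = windows.swapaxes(1, 2)

# Compare against the index-based version
indices = np.add.outer(np.arange(len(features) - windowSize + 1), np.arange(windowSize))
print(windows.shape)                               # (7, 4, 2)
print(np.array_equal(windows, features[indices]))  # True
```

Because the result is a view, it uses no extra memory for the windows, but you would need to call .copy() on it before writing into it.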
Note that your code's output differs from mine. I think this may be a bug in your code, because your last slice is:
print(featuresReshaped[-1])
# [[15 16]
# [17 18]
# [19 20]
# [17 18]]
which is not consistent with the "moving window" description you gave.
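My guess at the cause, offered as a sketch rather than a definitive diagnosis: the loop runs i all the way to features.shape[0] - 1, so the last few slices features[i:i+windowSize] are shorter than windowSize, and the final reshape then glues those leftover rows into a spurious window. Stopping the loop at the last full window might look like this (the names numWindows and windows are mine, not from your code):

```python
import numpy as np

windowSize = 4
features = np.arange(1, 21).reshape(10, 2)  # same values as in the question

# Only use start positions that leave room for a full window
numWindows = features.shape[0] - windowSize + 1

# Collect the full windows in a list and stack once at the end
windows = [features[i:i + windowSize] for i in range(numWindows)]
featuresReshaped = np.stack(windows)  # shape (numWindows, windowSize, n_features)

print(featuresReshaped.shape)  # (7, 4, 2)
print(featuresReshaped[-1])
```

Stacking once at the end also avoids the repeated copying that np.vstack does on every iteration, which is likely where much of the time goes for large inputs.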