问题很简单:这是我目前的算法。由于数组上的循环,这非常慢。是否有一种方法来改变它,以避免循环和利用NumPy数组类型?
import numpy as np
def loopingFunction(listOfVector1, listOfVector2):
resultArray = []
for vector1 in listOfVector1:
result = 0
for vector2 in listOfVector2:
result += np.dot(vector1, vector2) * vector2[2]
resultArray.append(result)
return np.array(resultArray)
listOfVector1x = np.linspace(0,0.33,1000)
listOfVector1y = np.linspace(0.33,0.66,1000)
listOfVector1z = np.linspace(0.66,1,1000)
listOfVector1 = np.column_stack((listOfVector1x, listOfVector1y, listOfVector1z))
listOfVector2x = np.linspace(0.33,0.66,1000)
listOfVector2y = np.linspace(0.66,1,1000)
listOfVector2z = np.linspace(0, 0.33, 1000)
listOfVector2 = np.column_stack((listOfVector2x, listOfVector2y, listOfVector2z))
result = loopingFunction(listOfVector1, listOfVector2)
我应该处理非常大的数组,每个数组中有超过1000个向量。所以,如果你有什么建议,我会接受的。
必填项np.einsum
基准测试
r2 = np.einsum('ij, kj, k->i', listOfVector1, listOfVector2, listOfVector2[:,2], optimize=['einsum_path', (1, 2), (0, 1)])
#%timeit result: 10000 loops, best of 5: 116 µs per loop
np.testing.assert_allclose(result, r2)
只是为了好玩,我编写了一个优化的Numba实现,优于所有其他。它是基于@MichaelSzczesny答案的einsum
优化。
import numpy as np
import numba as nb
# This decorator ask Numba to eagerly compile the code using
# the provided signature string (containing the parameter types).
@nb.njit('(float64[:,::1], float64[:,::1])')
def loopingFunction_numba(listOfVector1, listOfVector2):
n, m = listOfVector1.shape
assert m == 3
result = np.empty(n)
s1 = s2 = s3 = 0.0
for i in range(n):
factor = listOfVector2[i, 2]
s1 += listOfVector2[i, 0] * factor
s2 += listOfVector2[i, 1] * factor
s3 += listOfVector2[i, 2] * factor
for i in range(n):
result[i] = listOfVector1[i, 0] * s1 + listOfVector1[i, 1] * s2 + listOfVector1[i, 2] * s3
return result
result = loopingFunction_numba(listOfVector1, listOfVector2)
以下是我的i5-9600KF处理器的时序:
Initial: 1052.0 ms
ymmx: 5.121 ms
MichaelSzczesny: 75.40 us
MechanicPig: 3.36 us
Numba: 2.74 us
Optimal lower bound: 0.66 us
这个解决方案~384_000倍快比原来的那个好。请注意,它甚至没有使用处理器的SIMD指令,这将导致我的机器上的速度提高4倍。这只能通过拥有比当前输入更适合simd的转置输入来实现。换位也可能加速其他答案,如MechanicPig,因为BLAS经常从中受益。结果代码将达到象征性的1_000_000加速因子!
您至少可以删除两个forloop以节省大量时间,直接使用矩阵计算
import time
import numpy as np
def loopingFunction(listOfVector1, listOfVector2):
resultArray = []
for vector1 in listOfVector1:
result = 0
for vector2 in listOfVector2:
result += np.dot(vector1, vector2) * vector2[2]
resultArray.append(result)
return np.array(resultArray)
def loopingFunction2(listOfVector1, listOfVector2):
resultArray = np.sum(np.dot(listOfVector1, listOfVector2.T) * listOfVector2[:,2], axis=1)
return resultArray
listOfVector1x = np.linspace(0,0.33,1000)
listOfVector1y = np.linspace(0.33,0.66,1000)
listOfVector1z = np.linspace(0.66,1,1000)
listOfVector1 = np.column_stack((listOfVector1x, listOfVector1y, listOfVector1z))
listOfVector2x = np.linspace(0.33,0.66,1000)
listOfVector2y = np.linspace(0.66,1,1000)
listOfVector2z = np.linspace(0, 0.33, 1000)
listOfVector2 = np.column_stack((listOfVector2x, listOfVector2y, listOfVector2z))
import time
t0 = time.time()
result = loopingFunction(listOfVector1, listOfVector2)
print('time old version',time.time() - t0)
t0 = time.time()
result2 = loopingFunction2(listOfVector1, listOfVector2)
print('time matrix computation version',time.time() - t0)
print('Are results are the same',np.allclose(result,result2))
,
time old version 1.174513578414917
time matrix computation version 0.011968612670898438
Are results are the same True
基本上,循环越少越好。
避免嵌套循环,调整计算顺序,比优化后的np.einsum
快20倍,比原程序快近400_000倍:
>>> out = listOfVector1.dot(listOfVector2[:, 2].dot(listOfVector2))
>>> np.allclose(out, loopingFunction(listOfVector1, listOfVector2))
True
测试:
>>> timeit(lambda: loopingFunction(listOfVector1, listOfVector2), number=1)
1.4389081999834161
>>> timeit(lambda: listOfVector1.dot(listOfVector2[:, 2].dot(listOfVector2)), number=400_000)
1.3162514999858104
>>> timeit(lambda: np.einsum('ij, kj, k->i', listOfVector1, listOfVector2, listOfVector2[:, 2], optimize=['einsum_path', (1, 2), (0, 1)]), number=18_000)
1.3501517999975476