I'm trying to compute a Khatri-Rao-like sum (i.e. a pairwise row sum) and came up with the following solution:
In [15]: arr1
Out[15]:
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])
In [16]: arr2
Out[16]:
array([[11, 12, 13],
       [12, 13, 14],
       [13, 14, 15]])
# for every row in `arr1`, sum it with all rows in `arr2` (in pairwise manner)
In [17]: np.repeat(arr1, arr2.shape[0], 0) + np.tile(arr2, (arr1.shape[0], 1))
Out[17]:
array([[12, 14, 16],
       [13, 15, 17],
       [14, 16, 18],
       [13, 15, 17],
       [14, 16, 18],
       [15, 17, 19],
       [14, 16, 18],
       [15, 17, 19],
       [16, 18, 20]])
# thus `axis0` in the result will become `arr1.shape[0] * arr2.shape[0]`
In [18]: (np.repeat(arr1, arr2.shape[0], 0) + np.tile(arr2, (arr1.shape[0], 1))).shape
Out[18]: (9, 3)
This works fine. However, I'd like to know whether it is an optimized way to do this computation. I also timed it on a (fairly) large array:
# inputs
In [69]: arr1 = np.arange(9000).reshape(100, 90)
In [70]: arr2 = np.arange(45000).reshape(500, 90)
In [71]: (np.repeat(arr1, arr2.shape[0], 0) + np.tile(arr2, (arr1.shape[0], 1))).shape
Out[71]: (50000, 90)
In [72]: %timeit np.repeat(arr1, arr2.shape[0], 0) + np.tile(arr2, (arr1.shape[0], 1))
22.5 ms ± 420 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
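Is it possible to optimize this further, perhaps with a more sophisticated approach?
Also, I'm not entirely sure whether numpy.einsum() can be leveraged here, because, as far as I understand, it can't be used to increase the shape of the resulting array, which is what is happening here. I welcome corrections, suggestions, and improvements to my solution :)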
We can leverage broadcasting -
(arr1[:,None] + arr2).reshape(-1,arr1.shape[1])
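To see why this matches the repeat/tile version, here is a minimal sketch using the small arrays from the question. The names pairwise_3d, via_broadcast and via_repeat_tile are only illustrative and are not part of the original post.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
arr2 = np.array([[11, 12, 13], [12, 13, 14], [13, 14, 15]])

# arr1[:, None] has shape (3, 1, 3); adding arr2 of shape (3, 3)
# broadcasts to (3, 3, 3), where entry [i, j] is arr1[i] + arr2[j].
pairwise_3d = arr1[:, None] + arr2

# Collapsing the first two axes gives the (9, 3) row-pair layout.
via_broadcast = pairwise_3d.reshape(-1, arr1.shape[1])

# The repeat/tile formulation from the question, for comparison.
via_repeat_tile = (np.repeat(arr1, arr2.shape[0], 0)
                   + np.tile(arr2, (arr1.shape[0], 1)))

assert np.array_equal(via_broadcast, via_repeat_tile)
```

The broadcasted form avoids materializing the two intermediate (9, 3) copies that np.repeat and np.tile create before the addition, which is presumably where most of the saving comes from.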
For large arrays, we can offload the broadcasting part to numexpr for a further speedup -
import numexpr as ne
arr1_3D = arr1[:,None]
out = ne.evaluate('arr1_3D + arr2').reshape(-1,arr1.shape[1])
Runtime test -
In [545]: arr1 = np.random.rand(500,500)
In [546]: arr2 = np.random.rand(500,500)
In [547]: %timeit (arr1[:,None] + arr2).reshape(-1,arr1.shape[1])
1 loop, best of 3: 215 ms per loop
In [548]: %%timeit
...: arr1_3D = arr1[:,None]
...: out = ne.evaluate('arr1_3D + arr2').reshape(-1,arr1.shape[1])
10 loops, best of 3: 174 ms per loop