I have a numeric array a:
rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
# array([-11.36680112, -15.83791699, -39.86827287, -33.81273354,
#        -19.55547753])
I want to find the difference between the array and every element of that same array. Example output would be:
[array([ 0. , 4.47111586, 28.50147174, 22.44593241, 8.18867641]),
array([-4.47111586, 0. , 24.03035588, 17.97481655, 3.71756054]),
array([-28.50147174, -24.03035588, 0. , -6.05553933,
-20.31279534]),
array([-22.44593241, -17.97481655, 6.05553933, 0. ,
-14.25725601]),
array([-8.18867641, -3.71756054, 20.31279534, 14.25725601, 0. ])]
My first approach was a list comprehension: [i - a for i in a]. However, my actual array a is very large, and I have thousands of such arrays that need the same operation, so the whole process becomes so slow and memory-hungry that the Jupyter kernel dies.
Is there any way to speed this up?
The simplest way is to use broadcasting:
import numpy as np
rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
a[:, None] - a
Output:
array([[ 0. , 4.47111586, 28.50147174, 22.44593241,
8.18867641],
[ -4.47111586, 0. , 24.03035588, 17.97481655,
3.71756054],
[-28.50147174, -24.03035588, 0. , -6.05553933,
-20.31279534],
[-22.44593241, -17.97481655, 6.05553933, 0. ,
-14.25725601],
[ -8.18867641, -3.71756054, 20.31279534, 14.25725601,
0. ]])
There are two ways to do this:
- Pure NumPy with broadcasting: a[:, None] - a. This is memory-inefficient, but if the array is small, NumPy is faster here.
- Numba + NumPy: Numba compiles through LLVM, so it can work magic on speed, and the parallel=True option can multiply your throughput further. For very large arrays this (or C++) should be the go-to.
For size 40000 this finishes in about 3 seconds without parallelism, and in about 0.6 seconds with parallelism on my 12-core machine:
import numpy as np
import numba as nb

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# return type: nb.float64[:, :]
# input argument type: nb.float64[:]
# Specifying these enables eager compilation instead of lazy.
# You can also add parallel=True, cache=True;
# if you are using Python threading, nogil=True.
# You can do lots of stuff.
# Numba has SIMD vectorization, which means it should not lose
# to NumPy on performance grounds if coded properly.
@nb.njit(nb.float64[:, :](nb.float64[:]))
def speed(a):
    # np.empty to prevent unnecessary initialization
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    # nb.prange tells Numba this loop can be parallelized
    # (it only takes effect when parallel=True is set)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

speed(a)
import numpy as np
import numba as nb
import sys
import time

@nb.njit(nb.float64[:, :](nb.float64[:]))
def f1(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

@nb.njit(nb.float64[:, :](nb.float64[:]), parallel=True, cache=True)
def f2(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

def f3(a):
    return a[:, None] - a

if __name__ == '__main__':
    s0 = time.time()
    rnd = np.random.default_rng(12345)
    a = rnd.uniform(0, -50, int(sys.argv[2]))
    b = eval(sys.argv[1] + '(a)')
    print(time.time() - s0)
(base) xxx:~$ python test.py f1 40000
3.0324509143829346
(base) xxx:~$ python test.py f2 40000
0.6196465492248535
(base) xxx:~$ python test.py f3 40000
2.4126882553100586
I faced a similar constraint and needed something fast. By tackling the memory usage and using Numba, I got roughly a 50x speedup even without parallelism. Why are NumPy ufunc methods like np.subtract.outer so fast?