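Suppose I have a function defined on two float values; it is fairly complex and not easy to modify. I also have two 2D arrays of the same shape, say $X_{n\times m}$ and $Y_{n\times m}$, and I need to apply the function to every pair of corresponding elements $X_{ij}, Y_{ij}$. How can I speed this up compared with two nested for loops?
Here is the general code, with the function simplified to a sum: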
import numpy as np

def func(x, y):
    return x + y

X = np.random.rand(100, 100)
Y = np.random.rand(100, 100)
Z = np.zeros((100, 100))
# Apply func to each pair of corresponding elements
for i in range(100):
    for j in range(100):
        z = func(X[i, j], Y[i, j])
        Z[i, j] = z
Approaches
- Use np.vectorize
- Use NumPy functions
Approach 1: np.vectorize
Gives about a 4x speedup (3.14 ms vs. 12.1 ms).
%%timeit
Z = np.zeros((100, 100))
for i in range(100):
    for j in range(100):
        z = func(X[i, j], Y[i, j])
        Z[i, j] = z
12.2 ms ± 901 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Using np.vectorize:
%%timeit
vec_func = np.vectorize(func) # vectorized version of function
Z2 = vec_func(X, Y) # use vectorized version on X, Y
3.14 ms ± 192 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
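As a rough illustration of the same approach on a function that cannot simply be applied to whole arrays, here is a minimal sketch. The function `clipped_ratio` is a hypothetical stand-in for the "complex" function, not something from the question. Note that the NumPy documentation describes `np.vectorize` as a convenience whose implementation is essentially a for loop, so the gain over explicit loops is modest.

```python
import numpy as np

def clipped_ratio(x, y):
    # Hypothetical scalar-only function: the branch means it cannot be
    # called on whole arrays directly.
    if y > 0.5:
        return x / y
    return x * y

rng = np.random.default_rng(0)
X = rng.random((4, 5))
Y = rng.random((4, 5))

# Passing otypes fixes the output dtype, so NumPy does not need a trial
# call on the first element to infer it.
vec_ratio = np.vectorize(clipped_ratio, otypes=[float])
Z = vec_ratio(X, Y)

# Reference result from explicit loops, for comparison.
Z_loop = np.empty_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Z_loop[i, j] = clipped_ratio(X[i, j], Y[i, j])

print(np.allclose(Z, Z_loop))
```

The wrapped function accepts arrays of any matching (or broadcastable) shape, so the hard-coded `range(100)` loops are no longer needed.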
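Approach 2: use NumPy's vectorized functions
This applies if the more complex function can be expressed with NumPy functions such as addition, subtraction, sqrt, exp, and so on.
- Gives about a 775x speedup for the simple example function (15.5 µs vs. 12.1 ms)
Code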
def func(X, Y):
    return np.add(X, Y)
%timeit Z3 = func(X, Y)
15.5 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
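Rewriting with NumPy functions is sometimes possible even when the scalar function contains a branch. The sketch below assumes a hypothetical function `clipped_ratio_scalar` (not from the question) and expresses it with `np.where`; whether the real function can be rewritten this way depends on what it does.

```python
import numpy as np

def clipped_ratio_scalar(x, y):
    # Hypothetical scalar function with a branch.
    if y > 0.5:
        return x / y
    return x * y

def clipped_ratio_arrays(X, Y):
    # np.where computes both candidate results for every element and then
    # selects per element, so both branches must be safe to evaluate.
    return np.where(Y > 0.5, X / Y, X * Y)

rng = np.random.default_rng(0)
X = rng.random((100, 100))
Y = rng.random((100, 100))

Z_arrays = clipped_ratio_arrays(X, Y)

# Check a few entries against the scalar version.
checks = [
    np.isclose(Z_arrays[i, j], clipped_ratio_scalar(X[i, j], Y[i, j]))
    for i, j in [(0, 0), (10, 20), (99, 99)]
]
print(all(checks))
```

Because both branches are evaluated everywhere, a branch that is invalid for some inputs (for example, division by zero) may emit warnings even though its result is discarded for those elements.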
You can use np.vectorize,
or use Python lists and floats instead of NumPy:
4.82 ms original
1.78 ms vectorized
2.02 ms mapped
1.04 ms mapped2
Code:
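from timeit import repeat
import numpy as np

def original(func, X, Y):
    # Baseline: two nested loops over NumPy array elements.
    Z = np.zeros((100, 100))
    for i in range(100):
        for j in range(100):
            z = func(X[i, j], Y[i, j])
            Z[i, j] = z
    return Z

def vectorized(func, X, Y):
    return np.vectorize(func)(X, Y)

def mapped(func, X, Y):
    # Convert to nested lists of Python floats, map row by row,
    # then convert the result back to an array.
    return np.array([
        [*map(func, x, y)]
        for x, y in zip(X.tolist(), Y.tolist())
    ])

def mapped2(func, X, Y):
    # Uses the pre-converted global lists X2, Y2 (the X, Y arguments are
    # ignored) and returns a nested list, so no conversion is timed.
    return [
        [*map(func, x, y)]
        for x, y in zip(X2, Y2)
    ]

fs = original, vectorized, mapped, mapped2

def func(x, y):
    return x + y

X = np.random.rand(100, 100)
Y = np.random.rand(100, 100)
X2 = X.tolist()
Y2 = Y.tolist()

# Check that every variant matches the baseline result.
expect = fs[0](func, X, Y)
for f in fs:
    print((f(func, X, Y) == expect).all())

# Time each variant: best of several repeats, 10 calls per repeat.
for _ in range(3):
    for f in fs:
        t = min(repeat(lambda: f(func, X, Y), number=10)) / 10
        print('%.2f ms ' % (t * 1e3), f.__name__)
    print()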
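A likely reason the list-based variants are faster is that indexing a NumPy array element by element produces NumPy scalar objects, while `tolist()` yields plain Python floats, which are cheaper for a pure-Python function to work with. The short sketch below only shows the type difference; it is an illustration, not a benchmark, and the variable names are made up for the example.

```python
import numpy as np

X = np.random.rand(3, 3)

elem_from_array = X[0, 0]          # NumPy scalar (np.float64)
elem_from_list = X.tolist()[0][0]  # plain Python float

print(type(elem_from_array).__name__)
print(type(elem_from_list).__name__)

# The values are identical; only the wrapper type differs.
print(elem_from_array == elem_from_list)
```

If the result is needed as an array again, the nested list returned by a variant like `mapped2` can be converted once at the end with `np.array(...)`.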