Python: are numpy methods/attributes faster than numpy functions?



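I recently noticed that some numpy array attributes/methods seem to be much faster than the corresponding numpy functions. An example with np.conj(x) vs. x.conjugate():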

import numpy as np
import time
np.random.seed(100)
t0_1 = 0
t0_2 = 0
for i in range(1000):
    a = np.random.rand(10000)
    t0 = time.time()
    b = np.conjugate(a)
    t0_1 += time.time() - t0; t0 = time.time()
    c = a.conjugate()
    t0_2 += time.time() - t0; t0 = time.time()
print(t0_1, t0_2)
# example output times: 0.01222848892211914 0.0008714199066162109

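Even without proper benchmarking, this looks like a performance gain of more than 10x. Similarly, x.real, x.imag, x.max() and other basic methods seem to be faster than the corresponding functions np.real(x), np.imag(x), np.max(x), etc.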
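For a steadier comparison than accumulating time.time() differences by hand, the standard-library timeit module can repeat each call many times and report the best run. The following is only a sketch: the helper name best_time and the repeat counts are illustrative choices, not part of the original measurement, and the lambda wrappers add a small constant cost to both sides.

```python
import timeit

import numpy as np

rng = np.random.default_rng(100)
a = rng.random(10_000)


def best_time(stmt, number=1000, repeat=5):
    """Return the best observed time per call, in seconds, for a callable."""
    timer = timeit.Timer(stmt)
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Time the function form and the method form on the same array.
t_func = best_time(lambda: np.conjugate(a))
t_meth = best_time(lambda: a.conjugate())
print(f"np.conjugate(a): {t_func * 1e6:.3f} µs per call")
print(f"a.conjugate():   {t_meth * 1e6:.3f} µs per call")
```

Taking the minimum over several repeats reduces the influence of background noise on the machine, which matters when the quantities being compared are in the nanosecond to microsecond range.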

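Can anyone explain where the time savings come from? Is it related to in-place operation versus creation of a new array? Are there specific checks that the numpy functions perform and the array methods skip? Thanks in advance!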

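Update: below is a simple comparison of computation times for several common numpy functions/methods, for float, complex and boolean arrays. The largest speed gain factors of methods over functions (float/complex/bool) appear to be for a.real (12/15/12), a.imag (70/15/26) and a.conj (80/1/33), as explained in @hpaulj's post (imag and conj are pointless for real arrays), as well as for a.sort (5/5/1.5) (my guess is that this is due to the in-place operation) and a.max/a.min (1.6 for bool) (again, max and min are pointless for bool arrays). Other speed gains are typically between 1.1 and 1.4. For a.argsort, a.std and a.__len__ the factor is usually around 1, and for a.__abs__ it is even below 1.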

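So it seems that, apart from a.real, a.imag and a.sort, the speed gain is usually not very large, around 1.2 or so. However, this may depend on the array size, whether the array is (partially) sorted, and so on.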
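The guess about in-place operation can be checked directly: np.sort returns a sorted copy, while the sort method reorders the array itself and returns None. A minimal sketch, with illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(1000)
a = rng.random(10_000)
original = a.copy()

# np.sort returns a sorted copy and leaves its input untouched.
sorted_copy = np.sort(a)
input_unchanged = np.array_equal(a, original)

# ndarray.sort sorts in place and returns None.
result = a.sort()
now_sorted = np.array_equal(a, sorted_copy)

print(input_unchanged, result, now_sorted)
```

One consequence for any repeated timing of a.sort() on the same array is that every call after the first operates on data that is already sorted, so part of the measured gap may reflect the input order rather than the cost of the copy.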

import numpy as np
from IPython import get_ipython
# Must be run inside an IPython session so that the timeit magic is available.
ipython = get_ipython()
np.random.seed(1000)
asize = 10000
dtype_list = ['float', 'complex', 'bool']
for i in range(3):
    print(dtype_list[i])
    print('-----------------')
    if i == 0:
        a = np.random.rand(asize)
    elif i == 1:
        a = np.random.rand(asize) + 1j*np.random.rand(asize)
    elif i == 2:
        a = np.random.randint(2,size=asize).astype(bool)

    function_list = [np.real, np.imag, np.conj, np.sum, np.cumsum, np.prod, np.cumprod,
                     np.max, np.min, np.argmax, np.argmin, np.mean, np.var, np.std,
                     np.sort, np.argsort, np.all, np.any, np.abs, len]
    # Bound methods (and attribute values) of a; note that a.sort sorts a in place.
    methatt_list = [a.real, a.imag, a.conj, a.sum, a.cumsum, a.prod, a.cumprod,
                    a.max, a.min, a.argmax, a.argmin, a.mean, a.var, a.std,
                    a.sort, a.argsort, a.all, a.any, a.__abs__, a.__len__]
    for j in range(len(function_list)):
        print(function_list[j].__name__)
        ipython.magic('timeit function_list[j](a)')
        if callable(methatt_list[j]):
            ipython.magic('timeit methatt_list[j]()')
        else:
            # a.real and a.imag were already evaluated when the list was built,
            # so this times only the list lookup, not the attribute access.
            ipython.magic('timeit methatt_list[j]')
        print('')
# float
# -----------------
# real
# 740 ns ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 60.7 ns ± 0.226 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# imag
# 4.45 µs ± 36.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 60.9 ns ± 0.353 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# conjugate
# 9.64 µs ± 40.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 124 ns ± 0.238 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# sum
# 15.8 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 11.8 µs ± 82.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# cumsum
# 42.4 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 37.7 µs ± 38.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# prod
# 32.7 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 29 µs ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# cumprod
# 51.5 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 47.1 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amax
# 14.5 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 10.7 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# amin
# 14.6 µs ± 90.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 10.7 µs ± 45.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# argmax
# 11.1 µs ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 8.62 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# argmin
# 11.5 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 8.76 µs ± 37 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# mean
# 23.5 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 19.6 µs ± 569 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# var
# 78.6 µs ± 381 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 73.3 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# std
# 86.7 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 81.9 µs ± 663 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# sort
# 659 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 141 µs ± 682 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argsort
# 156 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 151 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# all
# 23.4 µs ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 17.7 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# any
# 23.4 µs ± 72.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 17.3 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# absolute
# 7.1 µs ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 7.25 µs ± 20.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# len
# 125 ns ± 0.17 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# 117 ns ± 0.463 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# complex
# -----------------
# real
# 920 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 61.1 ns ± 0.0517 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# imag
# 898 ns ± 0.792 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 61.3 ns ± 0.178 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# conjugate
# 18.1 µs ± 45.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 18.6 µs ± 7.75 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# sum
# 24 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 18.7 µs ± 97 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# cumsum
# 44.8 µs ± 80.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 39.4 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# prod
# 99.6 µs ± 195 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 95.4 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# cumprod
# 94.9 µs ± 245 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 89.7 µs ± 284 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amax
# 41.3 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 37 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amin
# 41.7 µs ± 65.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 37.1 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argmax
# 27.4 µs ± 47.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 24.5 µs ± 77.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argmin
# 28.8 µs ± 28.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 25.5 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# mean
# 32.2 µs ± 43.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 27.6 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# var
# 139 µs ± 844 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 135 µs ± 476 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# std
# 147 µs ± 195 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 145 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# sort
# 774 µs ± 3.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 201 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# argsort
# 277 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 271 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# all
# 37.9 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 31 µs ± 252 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# any
# 37.5 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 30.2 µs ± 11.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# absolute
# 217 µs ± 2.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 216 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# len
# 121 ns ± 0.38 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# 117 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# bool
# -----------------
# real
# 726 ns ± 4.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 60.5 ns ± 0.0926 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# imag
# 1.55 µs ± 2.44 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 60.7 ns ± 0.123 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# conjugate
# 4.16 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 125 ns ± 0.339 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# sum
# 24.2 µs ± 82.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 19.3 µs ± 82.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# cumsum
# 48.2 µs ± 428 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 41.2 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# prod
# 29.2 µs ± 73.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 25.3 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# cumprod
# 53.7 µs ± 83.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 46.6 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amax
# 9.37 µs ± 93 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 5.81 µs ± 21.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# amin
# 9.16 µs ± 15.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 5.75 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# argmax
# 2.93 µs ± 8.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 589 ns ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# argmin
# 3.07 µs ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 622 ns ± 4.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# mean
# 33.5 µs ± 27.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 29.1 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# var
# 111 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 105 µs ± 735 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# std
# 117 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 113 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# sort
# 157 µs ± 407 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 105 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argsort
# 115 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 112 µs ± 925 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# all
# 8.26 µs ± 9.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 3.86 µs ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# any
# 8.49 µs ± 23 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 4 µs ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# absolute
# 1.52 µs ± 3.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 1.72 µs ± 2.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# len
# 122 ns ± 0.24 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# 117 ns ± 0.279 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

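The numpy functions often delegate the action to the method, if one exists. But they also have to check that the argument is an array, and so on. ufuncs also carry some extra "baggage" to handle arguments such as out and where. So the time difference does not (necessarily) scale with the array size.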
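One way to probe the claim that the time difference does not necessarily scale with array size is to measure the gap between np.max(a) and a.max() at several sizes. This is a rough sketch only: the helper per_call, the chosen sizes and the repeat counts are assumptions made for illustration, and the absolute numbers will vary by machine and numpy version.

```python
import timeit

import numpy as np


def per_call(func, number=1000, repeat=5):
    """Best observed time per call, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


rng = np.random.default_rng(0)
gaps = {}
for size in (10, 1_000, 100_000):
    a = rng.random(size)
    t_func = per_call(lambda: np.max(a))
    t_meth = per_call(lambda: a.max())
    # If the extra cost is wrapper overhead, this gap should stay
    # roughly constant while both absolute times grow with size.
    gaps[size] = t_func - t_meth
    print(f"size={size:>7}: np.max {t_func * 1e6:8.2f} µs, "
          f"a.max {t_meth * 1e6:8.2f} µs, gap {gaps[size] * 1e6:6.2f} µs")
```

If the gap is dominated by fixed per-call work, the ratio between function and method should approach 1 as the array grows, even though the absolute difference remains.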

In [400]: a = np.random.rand(10000)

Comparing conjugate:

In [404]: timeit np.conjugate(a)
10 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [405]: timeit a.conjugate()
94.2 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The ns time suggests that the method is taking some sort of shortcut. (I'll explore that later.)

For max the time difference is less pronounced, and I can attribute it to function overhead:

In [406]: timeit np.max(a)
13.2 µs ± 16.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [407]: timeit a.max()
9.46 µs ± 79.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But let's test with a complex array, where conjugate is not trivial:

In [408]: ac = a+1j*a

Now the method and function times are the same:

In [409]: timeit np.conjugate(ac)
18.2 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [410]: timeit ac.conjugate()
18.3 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

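The real attribute is still much faster. Looking at the Python code for np.real, I think the time difference is just due to the function wrapper.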

In [411]: timeit np.real(ac)
743 ns ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [413]: timeit ac.real
129 ns ± 4.93 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The conjugate method of a float array just returns a view (or possibly the array itself). That accounts for its speed:

In [418]: a.__array_interface__['data']
Out[418]: (84672384, False)
In [419]: a.conjugate().__array_interface__['data']
Out[419]: (84672384, False)
In [420]: ac.__array_interface__['data']
Out[420]: (84992432, False)
In [421]: ac.conjugate().__array_interface__['data']
Out[421]: (85165216, False)

It is the array itself:

In [422]: id(a)
Out[422]: 140673862490512
In [423]: id(a.conjugate())
Out[423]: 140673862490512
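The same observation can be made without comparing raw addresses by asking numpy whether two arrays overlap in memory. This sketch assumes the behaviour shown in the session above (the method on a real array hands back the existing data) still holds in the installed numpy version:

```python
import numpy as np

a = np.random.rand(10_000)   # real (float64) array
ac = a + 1j * a              # complex array

# Method on a real array: no new data buffer is allocated.
method_shares_real = np.shares_memory(a, a.conjugate())
# Function on a real array: the ufunc writes into a fresh output array.
func_shares_real = np.shares_memory(a, np.conjugate(a))
# Method on a complex array: the values change, so a new array is needed.
method_shares_complex = np.shares_memory(ac, ac.conjugate())

print(method_shares_real, func_shares_real, method_shares_complex)
```

A practical consequence is that modifying the result of a.conjugate() on a real array would also modify a, which is not the case for np.conjugate(a).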

The np.real code:

def real(val):
    try:
        return val.real
    except AttributeError:
        return asanyarray(val).real
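Both branches of that wrapper can be exercised from user code. The sketch below assumes np.real still follows the listing above; any additional dispatch machinery in a given numpy version is not shown here.

```python
import numpy as np

ac = np.array([1 + 2j, 3 - 4j])

# An ndarray has a .real attribute, so the try branch returns it directly.
# For a complex array the result is a view onto the same buffer.
real_view = np.real(ac)
is_view = np.shares_memory(ac, real_view)

# A plain list has no .real attribute, so the except branch converts it
# with asanyarray before taking .real.
from_list = np.real([1 + 2j, 3 - 4j])

print(real_view, is_view, from_list)
```

Since the function does nothing beyond looking up the attribute in the common case, the extra time relative to ac.real is the cost of the Python-level call itself.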
