Why is numpy ma.average 24 times slower than arr.mean?



I found something interesting in Python's numpy: ma.average is a lot slower than arr.mean (arr is an array).

>>> arr = np.full((3, 3), -9999, dtype=float)
array([[-9999., -9999., -9999.],
[-9999., -9999., -9999.],
[-9999., -9999., -9999.]])
%timeit np.ma.average(arr, axis=0)
The slowest run took 49.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 191 µs per loop
%timeit arr.mean(axis=0)
The slowest run took 6.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.41 µs per loop

With random numbers:

arr = np.random.random((3,3))
%timeit arr.mean(axis=0)
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.78 µs per loop
%timeit np.ma.average(arr, axis=0)
1000 loops, best of 3: 186 µs per loop

--> That is almost 24 times slower.

Documentation

numpy.ma.average(a, axis=None, weights=None, returned=False)

Return the weighted average of the array over the given axis.

numpy.mean(a, axis=None, dtype=None, out=None, keepdims)

Compute the arithmetic mean along the specified axis.


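Why is ma.average so much slower than arr.mean? Mathematically they are the same (correct me if I'm wrong). My guess is that it has something to do with the weights option of ma.average, but shouldn't there be a fallback when no weights are passed?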
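As a quick check of the "mathematically the same" claim, here is a minimal sketch that compares both results on an unmasked array. The seed and the variable names `plain` and `masked` are only illustrative choices, not part of the original question.

```python
import numpy as np

# Small random array without any masked entries (seed chosen arbitrarily).
arr = np.random.default_rng(0).random((3, 3))

plain = arr.mean(axis=0)             # regular ndarray reduction
masked = np.ma.average(arr, axis=0)  # masked-array code path, no weights given

# Print the container types so the difference in return type is visible.
print(type(plain).__name__, type(masked).__name__)

# Compare the numeric values only, ignoring the container type.
same_values = np.allclose(plain, np.ma.getdata(masked))
print("same values:", same_values)
```

With no weights and no masked entries, both calls should agree on the numbers, so the difference is purely one of overhead.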

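A good way to find out why something is slow is to profile it. I'll use the third-party library line_profiler and the IPython command %lprun here (see for example this blog):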

%load_ext line_profiler
import numpy as np
arr = np.full((3, 3), -9999, dtype=float)
%lprun -f np.ma.average np.ma.average(arr, axis=0)
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
519                                           def average(a, axis=None, weights=None, returned=False):
...
570         1         1810   1810.0     30.5      a = asarray(a)
571         1           15     15.0      0.3      m = getmask(a)
572                                           
573                                               # inspired by 'average' in numpy/lib/function_base.py
574                                           
575         1            5      5.0      0.1      if weights is None:
576         1         3500   3500.0     59.0          avg = a.mean(axis)
577         1          591    591.0     10.0          scl = avg.dtype.type(a.count(axis))
578                                               else: 
...
608                                           
609         1            7      7.0      0.1      if returned:
610                                                   if scl.shape != avg.shape:
611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
612                                                   return avg, scl
613                                               else:
614         1            5      5.0      0.1          return avg

I removed some irrelevant lines.

So about 30% of the time is actually spent in np.ma.asarray (something that arr.mean doesn't have to do!).
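To make that conversion step concrete, the following small sketch shows what np.ma.asarray does to a plain array. The names `marr` and `has_no_mask` are illustrative only.

```python
import numpy as np

arr = np.full((3, 3), -9999, dtype=float)

# np.ma.asarray wraps the plain ndarray in a MaskedArray. This is the step
# that showed up as roughly 30% of the runtime in the profile above.
marr = np.ma.asarray(arr)
print(type(arr).__name__, "->", type(marr).__name__)

# Nothing is masked, so the wrapper carries the special "nomask" constant.
has_no_mask = np.ma.getmask(marr) is np.ma.nomask
print("no mask attached:", has_no_mask)
```

arr.mean works directly on the ndarray and never pays for this wrapping.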

However, the relative times change drastically if you use a bigger array:

arr = np.full((1000, 1000), -9999, dtype=float)
%lprun -f np.ma.average np.ma.average(arr, axis=0)
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
519                                           def average(a, axis=None, weights=None, returned=False):
...
570         1          609    609.0      7.6      a = asarray(a)
571         1           14     14.0      0.2      m = getmask(a)
572                                           
573                                               # inspired by 'average' in numpy/lib/function_base.py
574                                           
575         1            7      7.0      0.1      if weights is None:
576         1         6924   6924.0     86.9          avg = a.mean(axis)
577         1          404    404.0      5.1          scl = avg.dtype.type(a.count(axis))
578                                               else:
...
609         1            6      6.0      0.1      if returned:
610                                                   if scl.shape != avg.shape:
611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
612                                                   return avg, scl
613                                               else:
614         1            6      6.0      0.1          return avg

This time the np.ma.MaskedArray.mean function takes almost 90% of the time.

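Note: You could also dig deeper and look into np.ma.asarray, np.ma.MaskedArray.count, or np.ma.MaskedArray.mean and check their line profiles. But I just wanted to show that there are a lot of called functions that add to the overhead.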
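If line_profiler is not available, a rough picture of the nested calls can also be obtained with the standard-library cProfile module. This is a different tool from the %lprun approach used above (function-level rather than line-level), and the loop count and the number of rows printed are arbitrary choices for this sketch.

```python
import cProfile
import io
import pstats

import numpy as np

arr = np.full((3, 3), -9999, dtype=float)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(200):  # repeat so the tiny call is measurable
    np.ma.average(arr, axis=0)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(15)
report = stream.getvalue()
print(report)
```

The report lists the functions called underneath np.ma.average, which is where the extra overhead accumulates.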

So the next question is: did the relative times between np.ndarray.mean and np.ma.average also change? At least on my computer the difference is much smaller now:

%timeit np.ma.average(arr, axis=0)
# 2.96 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit arr.mean(axis=0)
# 1.84 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This time it's not even 2 times slower. I assume the difference will get even smaller for bigger arrays.
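Outside IPython, the same comparison can be sketched with the standard-library timeit module. The helper `time_call`, the chosen sizes, and the repeat counts are assumptions made for this example. Absolute numbers will differ from machine to machine, so no expected output is shown.

```python
import timeit

import numpy as np


def time_call(func, repeats=5, number=20):
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeats, number=number)) / number


ratios = {}
for n in (3, 100, 1000):
    arr = np.full((n, n), -9999, dtype=float)
    t_mean = time_call(lambda: arr.mean(axis=0))
    t_ma = time_call(lambda: np.ma.average(arr, axis=0))
    ratios[n] = t_ma / t_mean
    print(f"{n}x{n}: arr.mean {t_mean * 1e6:.1f} µs, "
          f"np.ma.average {t_ma * 1e6:.1f} µs, ratio {ratios[n]:.1f}")
```

Printing the ratio per size makes it easy to see whether the gap narrows on a given machine as the array grows.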


This is also something that is quite common with NumPy:

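The constant factors are quite high even for plain numpy functions (see for example my answer to the question "Performance of different vectorization methods in numpy"). For np.ma these constant factors are even bigger, especially if you don't pass a np.ma.MaskedArray as input. But even though the constant factors may be high, these functions excel on big arrays.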
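The remark about the input type can be tried out with a small sketch that times np.ma.average once on a plain ndarray and once on an array that was converted to a MaskedArray beforehand. The names `marr`, `t_ndarray`, and `t_masked` and the loop counts are illustrative; the timings depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

arr = np.full((3, 3), -9999, dtype=float)
marr = np.ma.asarray(arr)  # converted once, outside the timed call

number = 500
# Input is a plain ndarray: the conversion happens inside every call.
t_ndarray = min(timeit.repeat(lambda: np.ma.average(arr, axis=0),
                              repeat=5, number=number)) / number
# Input is already a MaskedArray.
t_masked = min(timeit.repeat(lambda: np.ma.average(marr, axis=0),
                             repeat=5, number=number)) / number

print(f"ndarray input:     {t_ndarray * 1e6:.1f} µs per call")
print(f"MaskedArray input: {t_masked * 1e6:.1f} µs per call")
```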

Thanks to @WillemVanOnsem and @sascha in the comments above.

Edit: this applies to small arrays; see the accepted answer for details.

  • Try to avoid masked operations; they are slow:

    mask = self.local_pos_history[:, 0] > -9
    local_pos_hist_masked = self.local_pos_history[mask]
    avg = local_pos_hist_masked.mean(axis=0)
    

    The old version, with a masked array:

    mask = np.ma.masked_where(self.local_pos_history > -9, self.local_pos_history)
    local_pos_hist_mask = self.local_pos_history[mask].reshape(len(self.local_pos_history) // 3, 3)
    avg_pos = self.local_pos_history
    
  • np.average is nearly equal to arr.mean:

    %timeit np.average(arr, axis=0)
    The slowest run took 5.81 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 9.89 µs per loop
    %timeit np.mean(arr, axis=0)
    The slowest run took 6.44 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 8.74 µs per loop
    

Just to clarify: these are still tests on small arrays.
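The advice to avoid masked operations can be illustrated with a self-contained sketch. Here `history` is a hypothetical stand-in for `self.local_pos_history`, and the -9999 sentinel rows and the `<= -9` masking condition are assumptions made for this example. It filters invalid rows with a plain boolean index and checks the result against a masked-array mean.

```python
import numpy as np

# Hypothetical position history; rows of -9999 mark invalid samples.
history = np.array([
    [1.0, 2.0, 3.0],
    [-9999.0, -9999.0, -9999.0],
    [3.0, 4.0, 5.0],
])

# Plain ndarray route: keep only rows whose first column is valid.
valid_rows = history[:, 0] > -9
avg_plain = history[valid_rows].mean(axis=0)

# Masked-array route: mask the invalid entries and let np.ma skip them.
masked = np.ma.masked_where(history <= -9, history)
avg_masked = masked.mean(axis=0)

same = np.allclose(avg_plain, np.ma.getdata(avg_masked))
print(avg_plain, same)
```

Both routes give the same averages here, but the boolean-index version stays on plain ndarrays and so avoids the np.ma overhead discussed above.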