计算矩阵元素加权和的最快方法是什么

我目前有点惊讶这需要这么长时间。我想计算矩阵元素的总和，按它们到对角线的距离进行加权。方阵仅包含非负整数元素。

我的代码

#!/usr/bin/env python
"""Calculate a score for a square matrix."""
import random
random.seed(0)

def calculate_score(cm):
    """
    Calculate a score how close big elements of cm are to the diagonal.
    Examples
    --------
    >>> cm = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    >>> calculate_score(cm)
    32
    """
    score = 0
    for i, line in enumerate(cm):
        for j, el in enumerate(line):
            score += el * abs(i - j)
    return score

def main(n):
    import time
    import numpy as np
    score_calculations = 10**3
    t = 0
    for step in range(score_calculations):
        cm = np.random.randint(0, 150000, size=(n, n))
        t0 = time.time()
        calculate_score(cm)
        t1 = time.time()
        t += (t1 - t0)
    print("{:0.2f} scores / sec".format(score_calculations / t))
if __name__ == '__main__':
    main(369)

分析

当前代码仅给出 32.47 分/秒。 kernprof -l -v main.py给出以下结果：

我试图循环元素本身(range(n)循环(，但这将速度降低到只有 20.02 分数/秒。

Wrote profile results to main.py.lprof
Timer unit: 1e-06 s
Total time: 109.124 s
File: main.py
Function: calculate_score at line 9
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           @profile
    10                                           def calculate_score(cm):
    11                                               """
    12                                               Calculate a score how close big elements of cm are to the diagonal.
    13                                           
    14                                               Examples
    15                                               --------
    16                                               >>> cm = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    17                                               >>> calculate_score(cm)
    18                                               32
    19                                               """
    20      1000          619      0.6      0.0      score = 0
    21    370000       180693      0.5      0.2      for i, line in enumerate(cm):
    22 136530000     43691655      0.3     40.0          for j, el in enumerate(line):
    23 136161000     65250190      0.5     59.8              score += el * abs(i - j)
    24      1000          386      0.4      0.0      return score

我不确定是否有任何东西可以使其更快，因为代码似乎非常简单。

这是一种矢量化方法，使用 broadcasting 来计算这些weights，然后对这些sum-reductions使用带有np.tensordot的matrix-multiplication -

def calculate_score_vectorized(cm):    
    m,n = cm.shape 
    wghts = np.abs(np.arange(n) - np.arange(m)[:,None])
    return np.tensordot(cm,wghts, axes=((0,1),(0,1)))

sum-reduction的最后一步也可以用np.einsum计算 -

np.einsum('ij,ij',cm,wghts)

也只是通过逐元素乘法和求和 -

(cm*wghts).sum()

运行时测试 -

In [104]: n = 369
In [105]: cm = np.random.randint(0, 150000, size=(n, n))
In [106]: calculate_score(cm)
Out[106]: 1257948732168
In [107]: calculate_score_vectorized(cm)
Out[107]: array(1257948732168)
In [108]: %timeit calculate_score(cm)
10 loops, best of 3: 31.4 ms per loop
In [109]: %timeit calculate_score_vectorized(cm)
1000 loops, best of 3: 675 µs per loop
In [110]: 31400/675.0
Out[110]: 46.51851851851852

对于给定的数据集大小，46x+加速。

如注释中所述，如果输入数组的形状保持不变，我们可以保存权重wghts，并使用前面讨论的那些sum-reduction方法重用它们以进一步提升。

完整代码

#!/usr/bin/env python
"""Calculate a score for a square matrix."""
import random
random.seed(0)
import numpy as np

def calculate_score(cm, weights):
    """
    Calculate a score how close big elements of cm are to the diagonal.
    Examples
    --------
    >>> cm = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    >>> weights = calculate_weight_matrix(3)
    >>> calculate_score(cm, weights)
    32
    """
    return int(np.tensordot(cm, weights, axes=((0, 1), (0, 1))))

def calculate_weight_matrix(n):
    """
    Calculate the weights for each position.
    The weight is the distance to the diagonal.
    """
    weights = np.abs(np.arange(n) - np.arange(n)[:, None])
    return weights

def measure_time(n):
    """Measure the time of calculate_score for n x n matrices."""
    import time
    import numpy as np
    score_calculations = 10**3
    t = 0
    weights = calculate_weight_matrix(n)
    for step in range(score_calculations):
        cm = np.random.randint(0, 150000, size=(n, n))
        t0 = time.time()
        calculate_score(cm, weights)
        t1 = time.time()
        t += (t1 - t0)
    print("{:0.2f} scores / sec".format(score_calculations / t))
if __name__ == '__main__':
    import doctest
    doctest.testmod()
    measure_time(369)

这给出了10044.31 scores / sec - 10381.71 scores / sec(之前：32.47 分数/秒(。这是 309× 的加速！

我的代码

分析

完整代码

相关内容

最新更新

热门标签：