Most efficient way to compute the element-wise minimum between a numpy array and a csr_matrix

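I have a numpy array V of shape (1, 1000). I also have a csr_matrix M of shape (100000, 1000). For each row m of M, I want to compute the element-wise minimum of V and m and store all the results in a new matrix, and I want to do this efficiently. The final result should also be a matrix of shape (100000, 1000).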
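
A tiny dense illustration of the requested operation, using made-up toy values rather than the question's data. It also shows why the sign of V matters: wherever M has an implicit zero, the result is min(V[0, j], 0), which becomes a nonzero entry whenever V[0, j] is negative.

```python
import numpy as np

# Toy values (assumed for illustration): result[i, j] = min(V[0, j], M[i, j]),
# with V broadcast across every row of M.
V = np.array([[1.0, 5.0, -3.0]])          # shape (1, 3)
M_dense = np.array([[0.0, 7.0, 2.0],
                    [4.0, 0.0, 0.0]])     # shape (2, 3); zeros would be implicit in a sparse M
result = np.minimum(V, M_dense)
# result == [[0, 5, -3], [1, 0, -3]]; the negative V entry fills column 2 in every row
```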

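Some approaches I have considered or tried:

Looping over each row of M with a for loop. This works, but it is slow.
Converting M to a dense array: numpy.minimum(V, M.toarray()). This needs a lot of memory (a dense 100000 × 1000 float64 array alone takes 800 MB).
numpy.minimum(V, M) does not work. I get an error saying: Comparing a sparse matrix with a scalar less than zero using >= is inefficient.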
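
For reference, the row-by-row loop from the first approach might look like the sketch below. The name rowmin_loop is hypothetical. It densifies only one row at a time, so peak memory stays at roughly one row, but the Python-level loop is what makes it slow; it is mainly useful for checking faster versions on small inputs.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def rowmin_loop(M, V):
    # Hypothetical reference helper: V may have shape (1, n) or (n,).
    v = np.asarray(V).ravel()
    rows = []
    for i in range(M.shape[0]):
        # M[i] is a 1 x n CSR row; densify only this row.
        dense_row = M[i].toarray().ravel()
        # Element-wise minimum for this row, stored back as a 1 x n sparse row.
        rows.append(csr_matrix(np.minimum(v, dense_row).reshape(1, -1)))
    # Stack the per-row results into one CSR matrix of shape M.shape.
    return vstack(rows, format="csr")
```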

What is a good way to do this without using too much memory or time?

If the values in v are nonnegative, here is a concise approach that should be much faster than looping over each row:

import numpy as np
from scipy.sparse import csr_matrix

def rowmin(M, v):
    # M must be a csr_matrix, and v must be a 1-d numpy array with
    # length M.shape[1] (for a V of shape (1, 1000), pass V.ravel()).
    # The values in v must be nonnegative.
    if np.any(v < 0):
        raise ValueError('v must not contain negative values.')
    # B is a CSR matrix with the same sparsity pattern as M, but its
    # data values are taken from v. shape=M.shape keeps B the same shape
    # as M even when M's trailing columns are all empty.
    B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)
    return M.minimum(B)
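
A small check of the nonnegative-v trick against the dense result, using random stand-ins (assumed sizes) for the question's V and M. Because B reuses M's indices and indptr, it has exactly M's sparsity pattern, and the stored value in column j is v[j]; with v >= 0, every implicit zero of M stays zero in the result, so comparing only the stored entries is enough.

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random

# Small stand-ins for V (1, 1000) and M (100000, 1000); sizes are assumptions.
M = sparse_random(6, 5, density=0.3, format="csr", random_state=1)
V = np.random.default_rng(1).uniform(0, 1, size=(1, 5))  # nonnegative values
v = V.ravel()  # flatten (1, n) to a 1-d array of length M.shape[1]

# Same construction as in rowmin: B shares M's sparsity pattern, data from v.
B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)
R = M.minimum(B)
```

With the question's arrays, the call would be rowmin(M, V.ravel()).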

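To allow negative values in v, the following modification works, but it produces a warning when v has negative values, because the sparsity pattern of B has to change when those negative values are copied into it. (The warning can be silenced with a few more lines of code.) Many negative values in v can significantly degrade performance, since each negative v[j] turns all of column j of B, and therefore of the result, into stored entries.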

def rowmin(M, v):
    # M must be a csr_matrix, and v must be a 1-d numpy array with
    # length M.shape[1].
    # B is a CSR matrix with the same sparsity pattern as M, but its
    # data values are taken from v:
    B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)
    # If there are negative values in v, include them in B. This fills
    # every row of those columns, which changes B's sparsity pattern.
    negmask = v < 0
    if np.any(negmask):
        negindices = negmask.nonzero()[0]
        B[:, negindices] = v[negindices]
    return M.minimum(B)
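
The answer mentions that the warning can be silenced with a few more lines. One way, sketched here under the assumption that only scipy's SparseEfficiencyWarning needs suppressing, is to wrap the column assignment in warnings.catch_warnings(). The name rowmin_quiet is hypothetical, and shape=M.shape is passed explicitly so B always matches M's shape.

```python
import warnings
import numpy as np
from scipy.sparse import csr_matrix, SparseEfficiencyWarning

def rowmin_quiet(M, v):
    # Hypothetical variant of the second rowmin: same steps, warning suppressed.
    B = csr_matrix((v[M.indices], M.indices, M.indptr), shape=M.shape)
    negindices = np.flatnonzero(v < 0)
    if negindices.size:
        with warnings.catch_warnings():
            # Ignore only the "changing the sparsity structure" efficiency warning.
            warnings.simplefilter("ignore", SparseEfficiencyWarning)
            B[:, negindices] = v[negindices]
    return M.minimum(B)
```

Suppressing the warning does not remove the cost it points to: each negative column still becomes fully stored in B and in the result.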
