Assign matrix from scipy.sparse.identity to csr_matrix



I want to assign a large scipy.sparse.identity to part of a scipy.sparse.csr_matrix, but I can't get it to work. In this case m = 25000000 and p = 3, and Tc_temp is a 25000000 x 75000000 csr_matrix:

Tc_temp = csr_matrix((m, p * m))
Tc_temp[0: m, np.arange(j, p * m + j, p)] = identity(m, format='csr')

The traceback I get is:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\_index.py", line 116, in __setitem__
    self._set_arrayXarray_sparse(i, j, x)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 816, in _set_arrayXarray_sparse
    self._zero_many(*self._swap((row, col)))
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 932, in _zero_many
    i, j, M, N = self._prepare_indices(i, j)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 882, in _prepare_indices
    i = np.array(i, dtype=self.indices.dtype, copy=False, ndmin=1).ravel()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 233. GiB for an array with shape (62500000000,) and data type int32

Somehow the sparse.identity gets converted to a dense matrix.

Let's check the operation with a smaller matrix:

The identity, in coo format:

In [67]: I = sparse.identity(10,format='coo')
In [68]: I.row
Out[68]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [69]: I.col
Out[69]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

A "blank" csr:

In [70]: M = sparse.csr_matrix((10,30))
In [71]: M.indptr
Out[71]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
In [72]: M.indices
Out[72]: array([], dtype=int32)

The assignment. I use slice notation here instead of your arange, but the effect is the same (even in timing):

In [73]: M[0:10, 0:30:3] = I
/usr/local/lib/python3.8/dist-packages/scipy/sparse/_index.py:116: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray_sparse(i, j, x)

The resulting matrix:

In [74]: M.indptr
Out[74]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=int32)
In [75]: M.indices
Out[75]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)
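Since each row holds exactly one nonzero, those indptr and indices arrays are simple enough to build by hand. A minimal sketch, assuming you want the general column offset j from the question; interleaved_identity_csr is a hypothetical helper name, and this skips both the sparse assignment and any COO-to-CSR conversion:

```python
import numpy as np
from scipy import sparse

def interleaved_identity_csr(m, p, j=0):
    """Hypothetical helper: CSR matrix of shape (m, p*m) with a 1 at (i, i*p + j).

    Row i has exactly one nonzero, so indptr is 0, 1, ..., m and
    indices is the strided column range j, j+p, j+2p, ...
    """
    indptr = np.arange(m + 1)
    indices = np.arange(j, p * m + j, p)
    data = np.ones(m)
    return sparse.csr_matrix((data, indices, indptr), shape=(m, p * m))

M2 = interleaved_identity_csr(10, 3)
print(M2.indptr)   # [ 0  1  2  3  4  5  6  7  8  9 10]
print(M2.indices)  # [ 0  3  6  9 12 15 18 21 24 27]
```

Because the arrays are already in CSR order, nothing needs to be sorted or summed.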

And look at the corresponding coo attributes:

In [76]: M.tocoo().row
Out[76]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [77]: M.tocoo().col
Out[77]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)

row is the same as I.row, and col is just your arange index:

In [78]: np.arange(0,30,3)
Out[78]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

So you could create the same matrix with:

M1 = sparse.csr_matrix((np.ones(10),(np.arange(10), np.arange(0,30,3))),(10,30))
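M1 hard-codes the offset j = 0. A sketch that generalizes it to the question's column offset j and checks it against the fancy-index assignment on a small case; interleaved_identity is a hypothetical name, and the warning filter only silences the SparseEfficiencyWarning seen above:

```python
import warnings
import numpy as np
from scipy import sparse

def interleaved_identity(m, p, j=0):
    """Hypothetical helper generalizing M1: ones at (i, i*p + j), shape (m, p*m)."""
    rows = np.arange(m)
    cols = np.arange(j, p * m + j, p)  # same columns as the question's arange
    return sparse.csr_matrix((np.ones(m), (rows, cols)), shape=(m, p * m))

# Reference result: the original fancy-index assignment, on a small matrix
m, p, j = 10, 3, 2
A = sparse.csr_matrix((m, p * m))
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # SparseEfficiencyWarning
    A[0:m, np.arange(j, p * m + j, p)] = sparse.identity(m, format='csr')

B = interleaved_identity(m, p, j)
print(np.array_equal(A.toarray(), B.toarray()))  # True
```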

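Assignment to a sparse matrix is not efficient. Before inserting, it clears every target position (the _zero_many frame in the traceback), and to do that it builds row and column index arrays covering the whole m x m block being assigned, not just the m nonzeros of the identity. At this scale that is clearly not feasible.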
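The size of that index array can be checked with plain NumPy. This is a sketch assuming SciPy broadcasts the row slice and the column array into a full grid, which is what the __setitem__ and _zero_many frames in the traceback point to; broadcast_index_cost is a hypothetical helper. np.broadcast_arrays itself returns views, and the memory is only allocated when the grid is flattened, as the .ravel() in _prepare_indices does:

```python
import numpy as np

def broadcast_index_cost(m, itemsize=4):
    """Hypothetical helper: entries and GiB in ONE flattened (m, m) int32 index array."""
    n_entries = m * m
    return n_entries, n_entries * itemsize / 2**30

# Small case: 0:m becomes a column vector, the column arange a row vector,
# and broadcasting them gives an (m, m) grid of target positions.
m, p, j = 4, 3, 0
rows = np.arange(m)[:, None]
cols = np.arange(j, p * m + j, p)
r_grid, c_grid = np.broadcast_arrays(rows, cols)
print(r_grid.shape)  # (4, 4): 16 positions for an identity with 4 nonzeros

# Flattening materializes the full grid as a real array
flat_rows = r_grid.ravel()
print(flat_rows.size)  # 16

n, gib = broadcast_index_cost(250_000)
print(n, round(gib, 1))  # 62500000000 232.8
```

An m x m block with m = 250,000 gives exactly the 62,500,000,000 int32 entries (about 233 GiB) reported in the traceback.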

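You can, however, get around this by working directly with the data arrays of a coordinate (COO) matrix, although that isn't efficient either.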

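from scipy.sparse import csr_matrix, identity
import numpy as np
m = 25000000
p = 3
j = 0  # column offset of the interleaved identity (0 <= j < p)
Tc_temp = csr_matrix((m, p * m)).tocoo()
Tc_identity = identity(m, format='coo')
# Target columns j, j + p, j + 2p, ... (same as np.arange(j, p * m + j, p))
target_col = Tc_identity.col * p + j
# If you know Tc_temp is already 0 where you want to do assignments, you can omit these two lines
# It's going to be slow if there's a lot of data in Tc_temp
Tc_zero_idx = np.isin(Tc_temp.row, Tc_identity.row) & np.isin(Tc_temp.col, target_col)
Tc_temp.data[Tc_zero_idx] = 0
# Append the identity entries to the COO arrays
Tc_temp.row = np.append(Tc_temp.row, Tc_identity.row)
Tc_temp.col = np.append(Tc_temp.col, target_col)
Tc_temp.data = np.append(Tc_temp.data, Tc_identity.data)
# tocsr() returns a new matrix (summing duplicate entries), so keep the result
Tc_temp = Tc_temp.tocsr()
# Optional: drop the explicit zeros left behind by the zeroing step
Tc_temp.eliminate_zeros()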
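To check that the COO approach reproduces the assignment even when the matrix already has data in the target columns, here is a small-scale sketch. assign_identity_via_coo is a hypothetical wrapper around the same steps; it builds a new coo_matrix from the concatenated arrays instead of setting .row/.col/.data in place, and the random test matrix is arbitrary:

```python
import warnings
import numpy as np
from scipy import sparse

def assign_identity_via_coo(T, p, j):
    """Hypothetical helper: same result as T[0:m, j::p] = identity(m), via COO arrays.

    Existing entries in the target columns are zeroed first, mirroring the
    assignment, which overwrites the whole m x m block.
    """
    C = T.tocoo()
    m = T.shape[0]
    I = sparse.identity(m, format='coo')
    target_col = I.col * p + j
    data = C.data.copy()
    data[np.isin(C.row, I.row) & np.isin(C.col, target_col)] = 0
    row = np.concatenate([C.row, I.row])
    col = np.concatenate([C.col, target_col])
    data = np.concatenate([data, I.data])
    # Duplicates are summed on conversion: 0 + 1 on the diagonal positions
    out = sparse.coo_matrix((data, (row, col)), shape=T.shape).tocsr()
    out.eliminate_zeros()
    return out

m, p, j = 6, 3, 1
rng = np.random.default_rng(0)
dense = rng.random((m, p * m))
dense[dense < 0.6] = 0  # arbitrary existing data
T = sparse.csr_matrix(dense)

expected = T.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # SparseEfficiencyWarning
    expected[0:m, np.arange(j, p * m + j, p)] = sparse.identity(m, format='csr')

result = assign_identity_via_coo(T, p, j)
print(np.array_equal(result.toarray(), expected.toarray()))  # True
```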

Normally I'd tell you to build it block by block, but if you're trying to interleave rows and columns, that isn't a great option for you.
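One structured construction that does handle this particular interleave, not taken from either answer, is a Kronecker product: kron(I_m, e_j), where e_j is a 1 x p row with a single 1 at column j, puts that 1 at (i, i*p + j) in block row i. A sketch using scipy.sparse.kron; interleaved_identity_kron is a hypothetical name:

```python
import numpy as np
from scipy import sparse

def interleaved_identity_kron(m, p, j=0):
    """Hypothetical helper: kron(I_m, e_j) with e_j a 1 x p row, 1 at column j.

    Block i of the product covers row i, columns i*p .. i*p + p - 1,
    so its single nonzero lands at (i, i*p + j).
    """
    e_j = sparse.csr_matrix(([1.0], ([0], [j])), shape=(1, p))
    return sparse.kron(sparse.identity(m, format='csr'), e_j, format='csr')

K = interleaved_identity_kron(5, 3, 1)
print(K.shape)  # (5, 15)
```

This only covers the pure identity pattern; it does not merge into existing data the way the COO approach does.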
