RuntimeError: nnz of the result is too large

Can someone explain why I get the error 'nnz of the result is too large' and how to fix it?

import numpy as np
from scipy.sparse import csc_matrix
row_idx = np.random.randint(0, 19380, 430097996, dtype= np.uint64)
col_idx = np.random.randint(0,  137000, 430097996, dtype= np.uint64)
values = np.ones(430097996, dtype= np.uint64)
random_p= csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64 )
shape1=(137000, 19380)
nnz1 =  700969
row_idx = np.random.randint(0, shape1[0], nnz1,  dtype= np.uint64)
col_idx = np.random.randint(0,  shape1[1], nnz1, dtype= np.uint64)
values = np.ones(nnz1, dtype= np.uint64)
random_tc= csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64)
random_tc*random_p

This code produces the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-86c7a6e80653> in <module>
12 values = np.ones(nnz1, dtype= np.uint64)
13 random_tc= csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64)
---> 14 random_tc*random_p
~/Library/Python/3.8/lib/python/site-packages/scipy/sparse/base.py in __mul__(self, other)
478             if self.shape[1] != other.shape[0]:
479                 raise ValueError('dimension mismatch')
--> 480             return self._mul_sparse_matrix(other)
481 
482         # If it's a list or whatever, treat it like a matrix
~/Library/Python/3.8/lib/python/site-packages/scipy/sparse/compressed.py in _mul_sparse_matrix(self, other)
503 
504         fn = getattr(_sparsetools, self.format + '_matmat_maxnnz')
--> 505         nnz = fn(M, N,
506                  np.asarray(self.indptr, dtype=idx_dtype),
507                  np.asarray(self.indices, dtype=idx_dtype),
RuntimeError: nnz of the result is too large

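One suggestion I found was to use an integer dtype, but that did not solve the problem. I also tried other sparse matrix types, such as bsr_matrix and coo_matrix, but the problem persists.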

# Create matrices as above, but with float32 data,
# since the MKL routines only handle floating point
import numpy as np
from scipy.sparse import csc_matrix
row_idx = np.random.randint(0, 19380, 430097996, dtype=np.int64)
col_idx = np.random.randint(0, 137000, 430097996, dtype=np.int64)
values = np.ones(430097996, dtype=np.float32)
random_p = csc_matrix((values, (row_idx, col_idx)), dtype=np.float32)
shape1 = (137000, 19380)
nnz1 = 700969
row_idx = np.random.randint(0, shape1[0], nnz1, dtype=np.int64)
col_idx = np.random.randint(0, shape1[1], nnz1, dtype=np.int64)
values = np.ones(nnz1, dtype=np.float32)
random_tc = csc_matrix((values, (row_idx, col_idx)), dtype=np.float32)

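I'm going to use the sparse_dot_mkl package to interface with the Intel Math Kernel Library (you will need to install MKL as well if you haven't already). The MKL sparse functions are really good, IMO, and using them in place of some scipy functions has some big advantages. The downside is that MKL only does floating-point arithmetic, not integer arithmetic.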
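Because of the floating-point restriction, integer-valued matrices have to be converted before they are handed over. The following is a minimal sketch using scipy only (it does not call MKL); the small matrix and the variable names are made up for illustration. One caveat worth keeping in mind: float32 represents integers exactly only up to 2**24 and float64 up to 2**53, so very large counts in the result may be rounded.

```python
import numpy as np
from scipy import sparse

# Small integer-valued matrix standing in for random_tc / random_p (illustrative).
int_mat = sparse.csc_matrix(np.array([[0, 3, 0],
                                      [7, 0, 0],
                                      [0, 0, 2]], dtype=np.int64))

# astype returns a copy with float data; the sparsity pattern is unchanged.
float_mat = int_mat.astype(np.float64)

print(float_mat.dtype)
print(float_mat.nnz == int_mat.nnz)

# float32 cannot hold 2**24 + 1 exactly; the value is rounded on conversion.
rounded = int(np.float32(2**24 + 1))
print(rounded, 2**24 + 1)
```

If the expected sums in the product stay well below 2**24, float32 halves the memory for the data array; otherwise float64 is the safer choice.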

# Set MKL interface layer to int64
# This must be set prior to importing the package
import os
os.environ["MKL_INTERFACE_LAYER"] = "ILP64"

Now just call the function.

# Import and multiply
from sparse_dot_mkl import dot_product_mkl
result = dot_product_mkl(random_tc, random_p)

In [14]: nnz=200
...: row_idx = np.random.randint(0, 19380, nnz, dtype= np.uint64)
...: col_idx = np.random.randint(0,  137000, nnz, dtype= np.uint64)
...: values = np.ones(nnz, dtype= np.uint64)
...: random_p= sparse.csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64 )
...: 
In [15]: random_p
Out[15]: 
<19171x136942 sparse matrix of type '<class 'numpy.uint64'>'
with 200 stored elements in Compressed Sparse Column format>
In [16]: random_p.indptr
Out[16]: array([  0,   0,   0, ..., 199, 199, 200], dtype=int32)

Specifying dtype only affects data. The dtype of indptr and indices is determined by the shape, not by the inputs (at least not with these coo-style inputs).
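A small script version of the same check may make this easier to reproduce. It is only a sketch: the seed, the explicit `shape` argument, and the use of `np.random.default_rng` are my choices rather than part of the session above, and which index type scipy picks may differ between versions.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
nnz = 200

# Same shape as random_p in the question, but with far fewer entries.
row_idx = rng.integers(0, 19380, nnz)
col_idx = rng.integers(0, 137000, nnz)
values = np.ones(nnz, dtype=np.uint64)

mat = sparse.csc_matrix((values, (row_idx, col_idx)),
                        shape=(19380, 137000), dtype=np.uint64)

# dtype= controls the stored values only.
print("data:   ", mat.data.dtype)

# The index arrays get whatever integer type scipy chooses for this shape and nnz.
print("indices:", mat.indices.dtype)
print("indptr: ", mat.indptr.dtype)

# Largest count or position that an int32 index array can hold.
print("int32 max:", np.iinfo(np.int32).max)
```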

It doesn't even help if I make the matrix from csc-style inputs:

mat = sparse.csc_matrix((random_p.data, random_p.indices.astype('int64'), random_p.indptr.astype('int64')))

The link makes the dtype change in place:

In [28]: random_p.indptr = random_p.indptr.astype('int64')
In [29]: random_p.indices = random_p.indices.astype('int64')

The multiplication works, though the result's index arrays are still int32:

In [32]: random_p@random_p.T
Out[32]: 
<19377x19377 sparse matrix of type '<class 'numpy.int64'>'
with 198 stored elements in Compressed Sparse Column format>
In [33]: _.indptr
Out[33]: array([  0,   0,   0, ..., 197, 197, 198], dtype=int32)

I'm not going to try this with a large nnz.

For what it's worth, my scipy is 1.15.2. I don't know whether it includes the fix mentioned in the link.
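Without building the large matrices, it is still possible to get a feel for whether a product could overflow int32 indices. The error message suggests that the result has more stored entries than the index type can count. The sketch below computes an upper bound on that number; `product_nnz_upper_bound` is a hypothetical helper written for this post, not a scipy function, and the bound can be much larger than the true nnz.

```python
import numpy as np
from scipy import sparse

INT32_MAX = np.iinfo(np.int32).max  # 2**31 - 1


def product_nnz_upper_bound(a, b):
    """Upper bound on the stored entries of a @ b (hypothetical helper).

    Each stored a[i, k] can add at most one entry per stored entry in
    row k of b, so the sum over k of
    (entries in column k of a) * (entries in row k of b)
    bounds the result. The result also cannot have more entries than cells.
    """
    a_col_counts = np.diff(a.tocsc().indptr).astype(np.int64)
    b_row_counts = np.diff(b.tocsr().indptr).astype(np.int64)
    pairs = int(a_col_counts @ b_row_counts)
    cells = int(a.shape[0]) * int(b.shape[1])
    return min(pairs, cells)


# Tiny example: identity times b keeps exactly the pattern of b.
a = sparse.identity(4, format="csc", dtype=np.int64)
b = sparse.csr_matrix(np.array([[1, 0, 0, 0],
                                [0, 1, 1, 0],
                                [0, 0, 0, 0],
                                [1, 1, 1, 1]]))
bound = product_nnz_upper_bound(a, b)
print(bound, (a @ b).nnz, bound > INT32_MAX)

# Shapes from the question: the result has 137000 x 137000 cells,
# several times more than an int32 index can address.
print(137000 * 137000, INT32_MAX)
```

For matrices as large as those in the question, the format conversions inside the helper copy the data, so it is better suited to scaled-down test cases than to the full problem.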
