Can anyone explain why I get the error 'nnz of the result is too large' and how to fix it?
import numpy as np
from scipy.sparse import csc_matrix
row_idx = np.random.randint(0, 19380, 430097996, dtype= np.uint64)
col_idx = np.random.randint(0, 137000, 430097996, dtype= np.uint64)
values = np.ones(430097996, dtype= np.uint64)
random_p= csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64 )
shape1=(137000, 19380)
nnz1 = 700969
row_idx = np.random.randint(0, shape1[0], nnz1, dtype= np.uint64)
col_idx = np.random.randint(0, shape1[1], nnz1, dtype= np.uint64)
values = np.ones(nnz1, dtype= np.uint64)
random_tc= csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64)
random_tc*random_p
This code produces the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-86c7a6e80653> in <module>
12 values = np.ones(nnz1, dtype= np.uint64)
13 random_tc= csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64)
---> 14 random_tc*random_p
~/Library/Python/3.8/lib/python/site-packages/scipy/sparse/base.py in __mul__(self, other)
478 if self.shape[1] != other.shape[0]:
479 raise ValueError('dimension mismatch')
--> 480 return self._mul_sparse_matrix(other)
481
482 # If it's a list or whatever, treat it like a matrix
~/Library/Python/3.8/lib/python/site-packages/scipy/sparse/compressed.py in _mul_sparse_matrix(self, other)
503
504 fn = getattr(_sparsetools, self.format + '_matmat_maxnnz')
--> 505 nnz = fn(M, N,
506 np.asarray(self.indptr, dtype=idx_dtype),
507 np.asarray(self.indices, dtype=idx_dtype),
RuntimeError: nnz of the result is too large
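The suggestion I found was to use Integer, but that did not solve the problem. I also tried other sparse matrix types, such as bsr_matrix and coo_matrix, but the problem persists.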
# Create matrices as above
import numpy as np
from scipy.sparse import csc_matrix
row_idx = np.random.randint(0, 19380, 430097996, dtype=np.int64)
col_idx = np.random.randint(0, 137000, 430097996, dtype=np.int64)
values = np.ones(430097996, dtype=np.float32)
# Keep the matrix dtype floating point to match values; MKL does not do integer math
random_p = csc_matrix((values, (row_idx, col_idx)), dtype=np.float32)
shape1 = (137000, 19380)
nnz1 = 700969
row_idx = np.random.randint(0, shape1[0], nnz1, dtype=np.int64)
col_idx = np.random.randint(0, shape1[1], nnz1, dtype=np.int64)
values = np.ones(nnz1, dtype=np.float32)
random_tc = csc_matrix((values, (row_idx, col_idx)), dtype=np.float32)
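I'm going to use the sparse_dot_mkl package to interface with the Intel Math Kernel Library (you will also need to install MKL if you haven't already). The MKL sparse functions are really nice IMO, and there are some major advantages to using them in place of some of the scipy functions. The downside is that it only does floating point math, not integer math.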
# Set MKL interface layer to int64
# This must be set prior to importing the package
import os
os.environ["MKL_INTERFACE_LAYER"] = "ILP64"
Now just call the function.
# Import and multiply
from sparse_dot_mkl import dot_product_mkl
result = dot_product_mkl(random_tc, random_p)
In [14]: nnz=200
...: row_idx = np.random.randint(0, 19380, nnz, dtype= np.uint64)
...: col_idx = np.random.randint(0, 137000, nnz, dtype= np.uint64)
...: values = np.ones(nnz, dtype= np.uint64)
...: random_p= sparse.csc_matrix((values, (row_idx, col_idx)), dtype=np.uint64 )
...:
In [15]: random_p
Out[15]:
<19171x136942 sparse matrix of type '<class 'numpy.uint64'>'
with 200 stored elements in Compressed Sparse Column format>
In [16]: random_p.indptr
Out[16]: array([ 0, 0, 0, ..., 199, 199, 200], dtype=int32)
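Specifying dtype only makes a difference to data. The dtypes of indptr and indices are determined by the shape, not by the inputs (at least not by these coo-style inputs).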
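A minimal sketch of that point, on a deliberately small matrix. The shape, the seed, and the variable names are illustrative assumptions, not taken from the session above. The coordinate arrays are passed as int64 on purpose, so the printed dtypes show what scipy chose for the index arrays rather than what was passed in.

```python
import numpy as np
from scipy import sparse

# Small coo-style inputs, deliberately given a 64-bit integer dtype.
rng = np.random.default_rng(0)
nnz = 200
row_idx = rng.integers(0, 1000, nnz, dtype=np.int64)
col_idx = rng.integers(0, 2000, nnz, dtype=np.int64)
values = np.ones(nnz, dtype=np.uint64)

mat = sparse.csc_matrix(
    (values, (row_idx, col_idx)), shape=(1000, 2000), dtype=np.uint64
)

# dtype= controls the stored values only.
print("data   :", mat.data.dtype)
# The index arrays get whatever integer type scipy selects for this matrix.
print("indices:", mat.indices.dtype)
print("indptr :", mat.indptr.dtype)
```

For a matrix this small I would expect the index arrays to come back narrower than the int64 inputs, but the exact choice may depend on the scipy version.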
It doesn't even help if I make the matrix from csc-style inputs:
mat = sparse.csc_matrix((random_p.data, random_p.indices.astype('int64'), random_p.indptr.astype('int64')))
link
Changing the index dtypes in place:
In [28]: random_p.indptr = random_p.indptr.astype('int64')
In [29]: random_p.indices = random_p.indices.astype('int64')
The multiplication works, though the index arrays of the result are still int32:
In [32]: random_p@random_p.T
Out[32]:
<19377x19377 sparse matrix of type '<class 'numpy.int64'>'
with 198 stored elements in Compressed Sparse Column format>
In [33]: _.indptr
Out[33]: array([ 0, 0, 0, ..., 197, 197, 198], dtype=int32)
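The in-place change can be wrapped in a small helper. This is only a sketch: `upcast_indices` is a hypothetical name, the matrix is tiny, and it just confirms that the product still agrees with the dense calculation after the index arrays are swapped. It does not show that the approach scales to the sizes in the question.

```python
import numpy as np
from scipy import sparse


def upcast_indices(mat):
    """Hypothetical helper: switch a CSR/CSC matrix's index arrays to int64 in place."""
    mat.indptr = mat.indptr.astype(np.int64)
    mat.indices = mat.indices.astype(np.int64)
    return mat


rng = np.random.default_rng(0)
# A small 0/1 matrix with roughly 10% of its cells filled.
dense = (rng.random((30, 50)) < 0.1).astype(np.int64)
mat = upcast_indices(sparse.csc_matrix(dense))

product = mat @ mat.T

# The values should match the dense product whatever the index dtype is.
same_values = np.array_equal(product.toarray(), dense @ dense.T)
print("input indices :", mat.indices.dtype)
print("result indptr :", product.indptr.dtype)
print("matches dense :", same_values)
```

The dtype printed for the result is whatever scipy picks for the output, which is the behaviour seen in the session above.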
I'm not going to try a large nnz.
For what it's worth, my scipy is 1.15.2. I don't know whether the fix mentioned in the link is present.
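As a rough sanity check on the sizes involved, the arithmetic below compares the number of cells in the product with the largest value a 32-bit index can hold. The shapes are approximate, because the question lets scipy infer them from the random indices. Reading the error as an index-capacity problem is my assumption, based on the `_matmat_maxnnz` call in the traceback. How many cells are actually nonzero depends on the sparsity pattern, so this is only an upper limit.

```python
import numpy as np

# Approximate shapes from the question: random_tc is about 137000 x 19380
# and random_p is about 19380 x 137000, so the product is about 137000 x 137000.
n_rows, inner, n_cols = 137000, 19380, 137000

result_cells = n_rows * n_cols           # every cell the product could fill
int32_max = int(np.iinfo(np.int32).max)  # largest value a 32-bit index can hold

print(f"cells in the product : {result_cells:,}")
print(f"int32 maximum        : {int32_max:,}")
print(f"ratio                : {result_cells / int32_max:.1f}")

# Upper limit on how full random_p is, before duplicate (row, col) pairs are merged.
p_fill = 430097996 / (inner * n_cols)
print(f"random_p fill (max)  : {p_fill:.3f}")
```

If the product is anywhere near full, its nnz cannot be counted with 32-bit indices, which fits the observation that the index dtype rather than the data dtype is what matters here.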