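[Short version]
Is there an equivalent of numpy.diagflat() in scipy.sparse? Or is there any way to "flatten" a sparse matrix that has been made dense?
[Long version]
I have a sparse matrix (mathematically a vector), x_f, that I need to diagonalize (i.e. create a square matrix with the values of the x_f vector on the diagonal).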
>x_f
Out[59]:
<35021x1 sparse matrix of type '<class 'numpy.float64'>'
with 47 stored elements in Compressed Sparse Row format>
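I've tried "diags" from the scipy.sparse module. (I also tried "spdiags", but it is just a fancier version of "diags", which I don't need.) I tried every combination of [CSR or CSC format], [original or transposed vector] and [.todense() or .toarray()], but I keep getting the error: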
ValueError: Different number of diagonals and offsets.
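With sparse.diags the default offset is 0, and all I want is to put the numbers on the main diagonal (which is the default), so getting this error means it is not doing what I want.
Here are examples of the original and the transposed vector, with .todense() and .toarray() respectively: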
x_f_original.todense()
Out[72]:
matrix([[ 0.00000000e+00],
[ 0.00000000e+00],
[ 0.00000000e+00],
...,
[ 0.00000000e+00],
[ 1.03332178e-17],
[ 0.00000000e+00]])
x_f_transposed.toarray()
Out[83]:
array([[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
0.00000000e+00, 1.03332178e-17, 0.00000000e+00]])
The following code works, but takes about 15 seconds to run:
x_f_diag = sparse.csc_matrix(np.diagflat(x_f.todense()))
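Does anyone have an idea how to make this more efficient, or just a better way to do it?
[Disclaimer]
This is my first question here. I hope I did it right, and I apologize for anything that is unclear.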
In [106]: x_f = sparse.random(1000,1, .1, 'csr')
In [107]: x_f
Out[107]:
<1000x1 sparse matrix of type '<class 'numpy.float64'>'
with 100 stored elements in Compressed Sparse Row format>
If I turn it into a 1-D dense array, I can use it in sparse.diags.
In [108]: M1=sparse.diags(x_f.A.ravel()).tocsr()
In [109]: M1
Out[109]:
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
with 100 stored elements in Compressed Sparse Row format>
Or I can make it a (1,1000) matrix and use a list as the offsets:
In [110]: M2=sparse.diags(x_f.T.A,[0]).tocsr()
In [111]: M2
Out[111]:
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
with 100 stored elements in Compressed Sparse Row format>
diags takes a dense diagonal, not a sparse one. The diagonal is stored as is, so I added the further .tocsr to remove the 0s etc.
In [113]: sparse.diags(x_f.T.A,[0])
Out[113]:
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
with 1000 stored elements (1 diagonals) in DIAgonal format>
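A quick way to see the storage difference, as a sketch with a made-up 6-element diagonal. The eliminate_zeros() call is my own addition as a safeguard; in the session above .tocsr alone already brought the count down to 100.

```python
import numpy as np
from scipy import sparse

# Made-up diagonal: mostly zeros, two real values.
diag_values = np.array([0.0, 3.0, 0.0, 0.0, 7.0, 0.0])

dia = sparse.diags(diag_values)   # DIA format keeps the whole diagonal
csr = dia.tocsr()
csr.eliminate_zeros()             # safeguard: drop any explicitly stored zeros

stored_in_dia = dia.data.size     # 6: every diagonal slot, zeros included
stored_in_csr = csr.nnz           # 2: only the non-zero values
```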
So either way I'm matching the shape of the diagonal with the number of offsets (a scalar, or a list of 1).
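To make the shape rule concrete, here is a small sketch. The 4-element vector is made up for illustration, and .toarray() is the spelled-out form of .A. A 2-D column passed with the default scalar offset triggers the ValueError from the question; a 1-D array, or a (1, n) array with a one-element offset list, goes through.

```python
import numpy as np
from scipy import sparse

# Hypothetical small column vector standing in for x_f.
x = sparse.csr_matrix(np.array([[0.0], [2.0], [0.0], [5.0]]))

# 2-D (n, 1) input with the default scalar offset: diags does not see a
# single 1-D diagonal to pair with the single offset, so it refuses.
try:
    sparse.diags(x.toarray())
    raised = False
except ValueError:
    raised = True

# 1-D diagonal with the default scalar offset 0.
M1 = sparse.diags(x.toarray().ravel()).tocsr()

# (1, n) input: one diagonal, matched by a one-element offset list.
M2 = sparse.diags(x.T.toarray(), [0]).tocsr()
```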
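A direct mapping to csr (or csc) is probably faster.
With this column shape, the indices attribute doesn't tell us anything: in csr it holds the column indices, which are all 0 here.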
In [125]: x_f.indices
Out[125]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0], dtype=int32)
But convert it to csc (this maps the indptr onto indices) and indices holds the row index of each stored value:
In [126]: x_f.tocsc().indices
Out[126]:
array([ 2, 15, 26, 32, 47, 56, 75, 82, 96, 99, 126, 133, 136,
141, 145, 149, ... 960, 976], dtype=int32)
In [127]: idx=x_f.tocsc().indices
In [128]: M3 = sparse.csr_matrix((x_f.data, (idx, idx)),(1000,1000))
In [129]: M3
Out[129]:
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
with 100 stored elements in Compressed Sparse Row format>
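For what it's worth, with a csr column the conversion to csc can probably be skipped: np.diff(indptr) gives the number of stored values per row, which recovers the row positions. A sketch, where col_to_diag is a name I made up and the input is assumed to be an (n, 1) matrix:

```python
import numpy as np
from scipy import sparse

def col_to_diag(x):
    """Hypothetical helper: (n, 1) sparse column -> (n, n) CSR diagonal matrix."""
    x = sparse.csr_matrix(x)              # make sure we have CSR attributes
    n = x.shape[0]
    counts = np.diff(x.indptr)            # stored values per row (0 or 1 here)
    rows = np.repeat(np.arange(n), counts)
    # Each stored value goes to (row, row), i.e. onto the main diagonal.
    return sparse.csr_matrix((x.data, (rows, rows)), shape=(n, n))

x_f = sparse.random(1000, 1, density=0.1, format='csr', random_state=0)
M4 = col_to_diag(x_f)
```

The result should match M3 above, just without the intermediate csc copy.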
You can use the following constructor:
csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
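A tiny illustration of that relationship, with made-up values:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Three made-up entries: value k lands at (row_ind[k], col_ind[k]).
data = np.array([10.0, 20.0, 30.0])
row_ind = np.array([0, 2, 2])
col_ind = np.array([1, 0, 3])

a = csr_matrix((data, (row_ind, col_ind)), shape=(3, 4))
dense = a.toarray()
# dense[0, 1] == 10.0, dense[2, 0] == 20.0, dense[2, 3] == 30.0,
# and every other entry is 0.
```

Passing the same index array as both row_ind and col_ind puts every value on the main diagonal, which is what the demos below rely on.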
Demo (COO matrix):
import numpy as np
from scipy.sparse import random, csr_matrix, coo_matrix
In [142]: M = random(10000, 1, .005, 'coo')
In [143]: M
Out[143]:
<10000x1 sparse matrix of type '<class 'numpy.float64'>'
with 50 stored elements in COOrdinate format>
In [144]: M2 = coo_matrix((M.data, np.diag_indices(len(M.data))), (len(M.data), len(M.data)))
In [145]: M2
Out[145]:
<50x50 sparse matrix of type '<class 'numpy.float64'>'
with 50 stored elements in COOrdinate format>
In [146]: M2.todense()
Out[146]:
matrix([[ 0.1559936 , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0.28984266, 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0.21381431, ..., 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 0.23100531, 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0.13789309, 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0.73827 ]])
Demo (CSR matrix):
In [112]: from scipy.sparse import random, csr_matrix
In [113]: M = random(10000, 1, .005, 'csr')
In [114]: M
Out[114]:
<10000x1 sparse matrix of type '<class 'numpy.float64'>'
with 50 stored elements in Compressed Sparse Row format>
In [137]: M2 = csr_matrix((M.data, np.diag_indices(len(M.data))), (len(M.data), len(M.data)))
In [138]: M2
Out[138]:
<50x50 sparse matrix of type '<class 'numpy.float64'>'
with 50 stored elements in Compressed Sparse Row format>
In [139]: M2.todense()
Out[139]:
matrix([[ 0.45661992, 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0.42428401, 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0.99484544, ..., 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 0.80880579, 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0.46292628, 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0.56363196]])
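Note that both demos above place the 50 stored values on a 50x50 diagonal, so the zero entries of the original column are dropped. If the result has to be 10000x10000 with each value at its original position, which is how I read the question, the row indices of the COO form can serve as both coordinates. A sketch under that assumption (M_full is a name introduced here):

```python
import numpy as np
from scipy.sparse import random, coo_matrix

M = random(10000, 1, density=.005, format='coo', random_state=0)
n = M.shape[0]

# M.row holds the row index of each stored value; using it for both
# coordinates puts value k at (M.row[k], M.row[k]).
M_full = coo_matrix((M.data, (M.row, M.row)), shape=(n, n))
```

For a CSR or CSC input, calling .tocoo() first gives the same row and data attributes.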
If you need a dense matrix:
In [147]: np.diagflat(M.data)
Out[147]:
array([[ 0.1559936 , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0.28984266, 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0.21381431, ..., 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 0.23100531, 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0.13789309, 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0.73827 ]])
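np.diagflat flattens its input first, so feeding it the full column (rather than only M.data) keeps the zeros and the original positions. A sketch with a made-up 5-element column; keep in mind that a dense float64 result needs 8*n*n bytes, which for the 35021-row vector in the question is roughly 9.8 GB.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up column vector with two non-zero entries.
M_small = csr_matrix(np.array([[0.0], [1.5], [0.0], [0.0], [4.0]]))

# Densify the column, then let diagflat flatten it onto the diagonal.
dense_diag = np.diagflat(M_small.toarray())

# Memory needed for the dense result at the size from the question.
dense_bytes = 8 * 35021 ** 2
```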