How to diagonalize a sparse CSR 1-D matrix (vector) in Python



[Short version]

Is there an equivalent of numpy.diagflat() in scipy.sparse? Or is there some way to make a sparse matrix dense?

[Long version]

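I have a sparse matrix (mathematically a vector), x_f, that I need to diagonalize (i.e. create a square matrix with the values of the x_f vector on its diagonal).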

x_f
Out[59]: 
<35021x1 sparse matrix of type '<class 'numpy.float64'>'
    with 47 stored elements in Compressed Sparse Row format>

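I tried "diags" from the scipy.sparse module. (I also tried "spdiags", but it is just a fancier version of "diags" that I don't need.) I tried every combination of [CSR or CSC format], [original or transposed vector] and [.todense() or .toarray()], but I keep getting this error: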

ValueError: Different number of diagonals and offsets.

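sparse.diags uses an offset of 0 by default, and all I want to do is put the numbers on the main diagonal (which is the default), so getting this error means it is not doing what I want.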

Here are examples of the original vector with .todense() and the transposed vector with .toarray():

x_f_original.todense()
Out[72]: 
matrix([[  0.00000000e+00],
        [  0.00000000e+00],
        [  0.00000000e+00],
        ..., 
        [  0.00000000e+00],
        [  1.03332178e-17],
        [  0.00000000e+00]])
x_f_transposed.toarray()
Out[83]: 
array([[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
          0.00000000e+00,   1.03332178e-17,   0.00000000e+00]])

The following code works, but takes about 15 seconds to run:

x_f_diag = sparse.csc_matrix(np.diagflat(x_f.todense()))

Does anyone have an idea how to make this more efficient, or simply a better way to do it?

[Disclaimer]

This is my first question here. I hope I did it right, and I apologize for anything that is unclear.

In [106]: x_f = sparse.random(1000,1, .1, 'csr')
In [107]: x_f
Out[107]: 
<1000x1 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

If I turn it into a 1-D dense array, I can use it in sparse.diags.

In [108]: M1=sparse.diags(x_f.A.ravel()).tocsr()
In [109]: M1
Out[109]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

Or I can make it a (1,1000) matrix and pass a list as the offsets:

In [110]: M2=sparse.diags(x_f.T.A,[0]).tocsr()
In [111]: M2
Out[111]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>

diags takes a dense diagonal, not a sparse one. The diagonal is stored as is, zeros included, so I follow up with .tocsr() to remove the zeros.

In [113]: sparse.diags(x_f.T.A,[0])
Out[113]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 1000 stored elements (1 diagonals) in DIAgonal format>

So either way I am matching the shape of the diagonal with the number of offsets (a scalar, or a list of length 1).
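The shape rule can be checked on a small stand-in vector. This is a minimal sketch, assuming a six-element column in place of the question's x_f; it uses .toarray() rather than the .A shorthand from the session above. An (n, 1) array is read as n diagonals of length 1, which is why it clashes with the single default offset.

```python
import numpy as np
from scipy import sparse

# Small stand-in for the question's x_f: a sparse (n, 1) column in CSR format.
x_f = sparse.csr_matrix(np.array([[0.0], [2.0], [0.0], [0.0], [5.0], [0.0]]))

col = x_f.toarray()   # shape (6, 1): diags reads this as 6 diagonals of length 1

try:
    sparse.diags(col)             # one scalar offset (0) versus 6 "diagonals"
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# One diagonal, one offset: a flat array with the default scalar offset ...
D1 = sparse.diags(col.ravel()).tocsr()
# ... or a (1, n) array paired with a one-element offsets list.
D2 = sparse.diags(x_f.T.toarray(), [0]).tocsr()

print(D1.toarray())
```

Both D1 and D2 hold the vector's values on the main diagonal of a 6x6 matrix.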

A direct mapping to csr (or csc) is probably faster.

With this column shape, the indices attribute doesn't tell us anything.

In [125]: x_f.indices
Out[125]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0], dtype=int32)

But converting it to csc does (this maps the indptr onto indices):

In [126]: x_f.tocsc().indices
Out[126]: 
array([  2,  15,  26,  32,  47,  56,  75,  82,  96,  99, 126, 133, 136,
       141, 145, 149, ... 960, 976], dtype=int32)
In [127]: idx=x_f.tocsc().indices
In [128]: M3 = sparse.csr_matrix((x_f.data, (idx, idx)),(1000,1000))
In [129]: M3
Out[129]: 
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>
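The same direct mapping can be wrapped in a small function. This is a sketch only: the name sparse_column_to_diag is hypothetical, and it reads the row indices from a COO conversion rather than from tocsc().indices, which should be equivalent when the column has no duplicate entries.

```python
import numpy as np
from scipy import sparse

def sparse_column_to_diag(x):
    """Hypothetical helper: put the stored entries of an (n, 1) sparse
    column on the main diagonal of an (n, n) CSR matrix."""
    x = sparse.coo_matrix(x)      # COO exposes explicit row indices
    n = x.shape[0]
    # Each stored value at row r goes to position (r, r).
    return sparse.csr_matrix((x.data, (x.row, x.row)), shape=(n, n))

x_f = sparse.random(1000, 1, density=0.1, format='csr')
M3 = sparse_column_to_diag(x_f)
print(repr(M3))
```

No dense intermediate is built here, so the cost grows with the number of stored elements rather than with n.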

You can use the following constructor:

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])

where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k]
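A tiny worked instance of that relationship, with made-up values chosen purely for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([1.5, 2.5, 3.5])
row_ind = np.array([0, 2, 3])
col_ind = np.array([0, 2, 3])   # same as row_ind, so every entry lands on the diagonal

a = csr_matrix((data, (row_ind, col_ind)), shape=(4, 4))

# Check the stated relationship entry by entry.
for k in range(len(data)):
    assert a[row_ind[k], col_ind[k]] == data[k]

print(a.toarray())
```

Row 1 has no entry in row_ind, so a[1, 1] stays zero and is not stored.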

Demo (COO matrix):

import numpy as np
from scipy.sparse import random, csr_matrix, coo_matrix
In [142]: M = random(10000, 1, .005, 'coo')
In [143]: M
Out[143]:
<10000x1 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in COOrdinate format>
In [144]: M2 = coo_matrix((M.data, np.diag_indices(len(M.data))), (len(M.data), len(M.data)))
In [145]: M2
Out[145]:
<50x50 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in COOrdinate format>
In [146]: M2.todense()
Out[146]:
matrix([[ 0.1559936 ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.28984266,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.21381431, ...,  0.        ,  0.        ,  0.        ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.23100531,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.13789309,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.73827   ]])
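Note that the demo above yields a 50x50 matrix, because np.diag_indices(len(M.data)) indexes only the stored values and drops the positions of the zeros. If the diagonal should keep the full length of the vector, as the question seems to require, one possible variant reuses the stored row indices instead. This is a sketch, and M_full is a name introduced here:

```python
import numpy as np
from scipy.sparse import random, coo_matrix

M = random(10000, 1, density=0.005, format='coo')
n = M.shape[0]

# M.row holds the row position of each stored value; using it for both
# coordinates places every value at (r, r) in an n x n matrix.
M_full = coo_matrix((M.data, (M.row, M.row)), shape=(n, n))
print(repr(M_full))
```

The result is 10000x10000 with the same 50 stored elements, each at its original row position.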

Demo (CSR matrix):

In [112]: from scipy.sparse import random, csr_matrix
In [113]: M = random(10000, 1, .005, 'csr')
In [114]: M
Out[114]:
<10000x1 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in Compressed Sparse Row format>
In [137]: M2 = csr_matrix((M.data, np.diag_indices(len(M.data))), (len(M.data), len(M.data)))
In [138]: M2
Out[138]:
<50x50 sparse matrix of type '<class 'numpy.float64'>'
        with 50 stored elements in Compressed Sparse Row format>
In [139]: M2.todense()
Out[139]:
matrix([[ 0.45661992,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.42428401,  0.        , ...,  0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.99484544, ...,  0.        ,  0.        ,  0.        ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.80880579,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.46292628,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.56363196]])

If you need a dense matrix:

In [147]: np.diagflat(M.data)
Out[147]:
array([[ 0.1559936 ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.28984266,  0.        , ...,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.21381431, ...,  0.        ,  0.        ,  0.        ],
       ...,
       [ 0.        ,  0.        ,  0.        , ...,  0.23100531,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.13789309,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,  0.        ,  0.73827   ]])
