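I am working with large sparse matrices. I have a large sparse matrix of values that needs to be embedded into an even larger sparse matrix. I have an array of logicals that indicates which rows and which columns are to be filled with the values of the smaller matrix. In this application, the smaller of the two matrices is a graph stored as an adjacency matrix, and the logicals mark the positions of its node ids in the larger matrix.
A small toy example to demonstrate what I am currently doing: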
zz = sparse(4,4); %create a sparse matrix of the final size desired
rr = rand(3,3); %a matrix of values
logical_inds = logical([1 0 1 1]); %a logical array to index with the dimension size of zz
zz(logical_inds,logical_inds) = rr(:,:) %'rr' values are mapped into the subset of 'zz'
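I see that the 2nd column of zz is all zeros, and so is the 2nd row. This is the desired output.
I get a warning that "sparse indexing is likely to be slow", and it is indeed slow. Sometimes, when the matrices are very large, the program terminates at this line.
How can I create this matrix (zz) with the sparse function instead? I am not sure how to build the row and column indices from the logical mask I have, or how to convert the values of rr into an array that fits this new indexing.
Note: rr is typically very sparse, although the logical mask addresses the whole matrix.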
I think this problem is mainly caused by implicit resizing during the assignment. Here is why I think so:
%# test parameters
N = 5000; %# Size of 1 dimension of the square sparse
L = rand(1,N) > 0.95; %# 5% of rows/cols will be non-zero values
M = sum(L);
rr = rand(M); %# the "data" to fill the sparse with
%# Method 1: direct logical indexing
%# (your original method)
zz1 = sparse(N,N);
tic
zz1(L,L) = rr;
toc
%# Method 2: test whether the conversion to logical col/row indices matters
zz2 = sparse(N,N);
inds = zz1~=0;
tic
zz2(inds) = rr;
toc
%# Method 3: test whether the conversion to linear indices matters
zz3 = sparse(N,N);
inds = find(inds);
tic
zz3(inds) = rr;
toc
%# Method 4: test whether implicit resizing matters
zz4 = spalloc(N,N, M*M);
tic
zz4(inds) = rr;
toc
Results:
Elapsed time is 3.988558 seconds. %# meh M1 (original)
Elapsed time is 3.916462 seconds. %# meh M2 (expanded logicals)
Elapsed time is 4.003222 seconds. %# meh M3 (converted to linear indices)
Elapsed time is 0.139986 seconds. %# WOW! M4 (pre-allocated memory)
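So apparently (and surprisingly), MATLAB does not seem to grow the existing sparse matrix once before the assignment, as you would expect. Instead it appears to loop through the row/column indices and grow the sparse matrix during the iteration. So it seems you have to "help" MATLAB a little: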
%# Create initial sparse
zz1 = sparse(N,N);
%# ...
%# Do any further operations until you can create rr:
%# ...
rr = rand(M); %# the "data" to fill the sparse with
%# Now that the size of the data is known, re-allocate space for the sparse:
tic
[i,j] = find(zz1); %# indices
[m,n] = size(zz1); %# Sparse size (you can also use N of course)
zz1 = sparse(i,j,nonzeros(zz1), m,n, M*M);
zz1(L,L) = rr; %# logical or integer indices, doesn't really matter
toc
Results (for the same N, L and rr):
Elapsed time is 0.034950 seconds. %# order of magnitude faster than even M4!
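One way to see whether storage has actually been reserved is to compare nnz (stored nonzeros) with nzmax (allocated slots) before and after the assignment. The following is only a rough sketch under the same test parameters as above; the names zz_plain and zz_prealloc are introduced here for illustration, and the exact nzmax values may differ between MATLAB versions.

```matlab
N = 5000;                        % same test size as above
L = rand(1,N) > 0.95;            % same kind of logical mask as above
M = sum(L);

zz_plain    = sparse(N,N);       % empty sparse, no extra storage requested
zz_prealloc = spalloc(N,N,M*M);  % empty sparse with room for M*M nonzeros

fprintf('plain:    nnz = %d, nzmax = %d\n', nnz(zz_plain),    nzmax(zz_plain));
fprintf('prealloc: nnz = %d, nzmax = %d\n', nnz(zz_prealloc), nzmax(zz_prealloc));

zz_prealloc(L,L) = rand(M);      % the assignment fills the reserved storage
fprintf('filled:   nnz = %d, nzmax = %d\n', nnz(zz_prealloc), nzmax(zz_prealloc));
```

If nzmax is already at least M*M before the assignment, the assignment should not need to reallocate, which is consistent with the timings above.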
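To create this matrix with the sparse function, the logical indexing has to be converted to row and column indices, so this may well end up being slower...
Here the positions of the 1s in the logical vector are found first, and then a matrix is built that holds the row and column indices of the nonzeros in the sparse matrix.
Finally, the sparse function is used to create a sparse matrix with the elements of rr at those locations (using rr(:) to convert it to a column vector).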
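ind_locs = find(logical_inds); %positions of the true entries in the mask
ind = combvec(ind_locs,ind_locs); %all (row,col) pairs; combvec needs the Deep Learning Toolbox
n = numel(logical_inds); %size of the final matrix
zz = sparse(ind(1,:),ind(2,:),rr(:),n,n) %explicit size, so a mask ending in 0 still gives an n-by-n result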
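Since rr is usually very sparse itself, a possible variant is to take only its nonzero entries with find and map their local row and column numbers through the mask positions. This is a minimal sketch rather than a benchmarked solution: the 3-by-3 rr below is made-up example data, idx and n are names introduced here, and no toolbox function is needed.

```matlab
logical_inds = logical([1 0 1 1]);    % mask from the toy example
rr = sparse([0 2 0; 0 0 5; 7 0 0]);   % hypothetical sparse block of values

idx = find(logical_inds(:));          % positions of the block's rows/cols in zz (column vector)
n   = numel(logical_inds);            % size of the final matrix

[r, c, v] = find(rr);                 % triplets for the nonzeros of rr only
zz = sparse(idx(r), idx(c), v, n, n); % map local indices to global ones and build zz in one call
```

Only the nonzeros of rr are touched, so the work scales with nnz(rr) instead of with the full M-by-M block, and zz is built in a single call without any indexed assignment into an existing sparse matrix.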