Problem of ``sprandn'': the number of nonzero elements is inconsistent with density

2 views (last 30 days)
R = sprandn(m,n,density) is a random, m-by-n, sparse matrix with approximately density*m*n normally distributed nonzero entries (0 <= density <= 1).
However, if density is large, the number of nonzero entries it not consistent with density.
nnz(sprandn(100,10,0.3))
ans =
259
nnz(sprandn(100,10,0.7))
ans =
492
For density=0.7, the number of nonzero elements should be approximately 700, but the result is only 492.
This is an example, I have tried for many times but results are similar.
I apologize in advance if it is a trivial problem.

Accepted Answer

Bruno Luong
Bruno Luong on 31 Jul 2020
Edited: Bruno Luong on 31 Jul 2020
The approximation is only good when density << 1. I think SPRAND/SPRANDN simply create random draw of (density*m*n) pairs of (i,j) WITH replacement (perhaps for the reason of efficiency and memory).
When density specified is close to 1, there are a lot of pairs that collide in the same place and the effective density is less than what user specifies.
>> nnz(sprand(100,100,1))/100^2
ans =
0.6400
I believe the "defective ratio" is about
>> 1-1/exp(1)
ans =
0.6321
for density ~ 1 and large m*n
  2 Comments
Bruno Luong
Bruno Luong on 31 Jul 2020
You can check the fact that draw with replacement is used from the code of SPRAND
nnzwanted = round(m * n * min(density,1));
i = fix( rand(nnzwanted, 1) * m ) + 1;
j = fix( rand(nnzwanted, 1) * n ) + 1;
ij = unique([i j],'rows');
if ~isempty(ij)
i = ij(:,1);
j = ij(:,2);
end

Sign in to comment.

More Answers (1)

John D'Errico
John D'Errico on 31 Jul 2020
Edited: John D'Errico on 31 Jul 2020
Bruno has already explained why sprand/sprandn fail to produce the requested density of arrays. So this answer is here only to add some information. Honestly, I would think they might have done at least a little better job in those codes for matrices that are only semi-sparse. My guess is the author did not consider that anyone would want to generate sparse matrices that are even remotely close to full.
I would add that you really don't have any rational need for high density sparse matrices. That is, a sparse matrix with a density even as large as 50% will be slower in terms of computation. It will take more memory to store.
Af = randn(5000);
Af(rand(size(A)) > 0.5) = 0;
As = sparse(Af);
whos Af As
Name Size Bytes Class Attributes
Af 5000x5000 200000000 double
As 5000x5000 200048280 double sparse
nnz(As)/numel(As)
ans =
0.50002068
This, by the way, has one virtue when computing a semi-sparse matrix, in the sense that it does produce a matrix with a very good approximation of the requested density. A problem of course, is I created a full matrix, and then discarded half of the elements. Not only that, but the way I chose the elements to discard forced me to create a temporary second full matrix of that size. So the method I used above is arguably a poor one. But a more careful sampling scheme would logically have been possible.
Regardless, the sparse version (As) takes more memory to store then the corresponding full matrix (Af). This happens because a sparse matrix needs to store not only the non-zero elements, but also the locations of those elements. And that too requires memory.
Are these quasi-sparse matrices faster to use? Often not the case.
timeit(@() lu(Af))
ans =
0.457781138132
timeit(@() lu(As))
ans =
48.439253306132
timeit(@() Af*Af)
ans =
0.864906439132
timeit(@() As*As)
ans =
32.712310654132
Sparse matrices are splendid tools, when your matrix truly is sparse. But when it is only 50% sparse or even anything in that vicinity, that sparsity becomes a cost, not a gain.

Categories

Find more on Sparse Matrices in Help Center and File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!