Problem of ``sprandn'': the number of nonzero elements is inconsistent with density

Question

Zenan Li on 31 Jul 2020

0
Link

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/573241-problem-of-sprandn-the-number-of-nonzero-elements-is-inconsistent-with-density

Edited: Bruno Luong on 31 Jul 2020

R = sprandn(m,n,density) is a random, m-by-n, sparse matrix with approximately density*m*n normally distributed nonzero entries (0 <= density <= 1).

However, if density is large, the number of nonzero entries it not consistent with density.

nnz(sprandn(100,10,0.3))
ans = 
        259
nnz(sprandn(100,10,0.7))
ans = 
        492

For density=0.7, the number of nonzero elements should be approximately 700, but the result is only 492.

This is an example, I have tried for many times but results are similar.

I apologize in advance if it is a trivial problem.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Bruno Luong on 31 Jul 2020

2
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/573241-problem-of-sprandn-the-number-of-nonzero-elements-is-inconsistent-with-density#answer_473302

Edited: Bruno Luong on 31 Jul 2020

Open in MATLAB Online

The approximation is only good when density << 1. I think SPRAND/SPRANDN simply create random draw of (density*m*n) pairs of (i,j) WITH replacement (perhaps for the reason of efficiency and memory).

When density specified is close to 1, there are a lot of pairs that collide in the same place and the effective density is less than what user specifies.

>> nnz(sprand(100,100,1))/100^2
ans =
    0.6400

I believe the "defective ratio" is about

>> 1-1/exp(1)
ans =
    0.6321

for density ~ 1 and large m*n

2 Comments
Show NoneHide None

Zenan Li on 31 Jul 2020

Thank you, I agree it.

Bruno Luong on 31 Jul 2020

Open in MATLAB Online

You can check the fact that draw with replacement is used from the code of SPRAND

   nnzwanted = round(m * n * min(density,1));
   i = fix( rand(nnzwanted, 1) * m ) + 1;
   j = fix( rand(nnzwanted, 1) * n ) + 1;
   ij = unique([i j],'rows');
   if ~isempty(ij)
      i = ij(:,1);
      j = ij(:,2);
   end

Sign in to comment.

Answer 2

John D'Errico on 31 Jul 2020

1
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/573241-problem-of-sprandn-the-number-of-nonzero-elements-is-inconsistent-with-density#answer_473365

Edited: John D'Errico on 31 Jul 2020

Open in MATLAB Online

Bruno has already explained why sprand/sprandn fail to produce the requested density of arrays. So this answer is here only to add some information. Honestly, I would think they might have done at least a little better job in those codes for matrices that are only semi-sparse. My guess is the author did not consider that anyone would want to generate sparse matrices that are even remotely close to full.

I would add that you really don't have any rational need for high density sparse matrices. That is, a sparse matrix with a density even as large as 50% will be slower in terms of computation. It will take more memory to store.

Af = randn(5000);
Af(rand(size(A)) > 0.5) = 0;
As = sparse(Af);
whos Af As
  Name         Size                  Bytes  Class     Attributes
  Af        5000x5000            200000000  double         
  As        5000x5000            200048280  double    sparse       
nnz(As)/numel(As)
ans =
                0.50002068

This, by the way, has one virtue when computing a semi-sparse matrix, in the sense that it does produce a matrix with a very good approximation of the requested density. A problem of course, is I created a full matrix, and then discarded half of the elements. Not only that, but the way I chose the elements to discard forced me to create a temporary second full matrix of that size. So the method I used above is arguably a poor one. But a more careful sampling scheme would logically have been possible.

Regardless, the sparse version (As) takes more memory to store then the corresponding full matrix (Af). This happens because a sparse matrix needs to store not only the non-zero elements, but also the locations of those elements. And that too requires memory.

Are these quasi-sparse matrices faster to use? Often not the case.

timeit(@() lu(Af))
ans =
            0.457781138132
            
timeit(@() lu(As))
ans =
           48.439253306132
           
timeit(@() Af*Af)
ans =
            0.864906439132
timeit(@() As*As)
ans =
           32.712310654132

Sparse matrices are splendid tools, when your matrix truly is sparse. But when it is only 50% sparse or even anything in that vicinity, that sparsity becomes a cost, not a gain.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Problem of ``sprandn'': the number of nonzero elements is inconsistent with density

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments
Show NoneHide None

More Answers (1)

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

Problem of ``sprandn'': the number of nonzero elements is inconsistent with density

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments Show NoneHide None

More Answers (1)

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

2 Comments
Show NoneHide None

0 Comments
Show -2 older commentsHide -2 older comments