Preallocating a sparse matrix, then entering values column by column takes too much time. Is there a more efficient way ?

Question

Cem Gormezano on 4 Aug 2020

0
Link

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/574948-preallocating-a-sparse-matrix-then-entering-values-column-by-column-takes-too-much-time-is-there-a

Answered: James Tursa on 4 Aug 2020

At the beginning of the loop, I preallocate the sparse matrix (Q) with a number of non zero values that I do not exceed within the loop. Then, I fill the columns of this sparse matrix one by one in each iteration of the loop. I cannot build this matrix Q at the end of the loop because I use a portion of this matrix within the loop in a matrix-vector product. The bottleneck in my code is the line with "Q(1:m,1:j-1) = sparse(Qrows(ptr_cols'), Qcols(ptr_cols'), Qvals(ptr_cols'), m, j-1);". Though it takes an immense amount of time, I have seen this method of assigning elements to a preallocated sparse matrix in MATLAB's page for spalloc https://www.mathworks.com/help/matlab/ref/spalloc.html . Thank you and best wishes !!!

Q = spalloc(m,n,nzPQ);
for j = 1:n % Loop over coloumns of A.
    
    if (j == 1)
        % Find the elements of q1 that match the sparsity pattern of 1st coloumn of PQ.
        ind_nzpq = find(A(:,1) ~= 0 & PQ(:,1) ~= 0); np = size(ind_nzpq,1);
        r11 = norm(A(ind_nzpq,1));
        if (r11 == 0)
            fprintf('A is rank deficient or you need more fill in. :(\n')
            return;
        end
        % Normalize these elements in q1 and store them in the COO structure of Q.
        Qvals(1:np) = A(ind_nzpq,1)/r11; Qcols(1:np) = 1; Qrows(1:np) = ind_nzpq;
        % Store the normalization constant in COO structure of R.
        Rvals(1) = r11; Rcols(1) = 1; Rrows(1) = 1;
        % Set the non-zero element count in Q and R.
        nzQ = np; nzR = 1;
        continue
    end
    
    q = A(:,j);
    ind_nzpr = find(PR(1:j-1,j));
    % Find the adress of coloumns of Q that correspond to the sparsity pattern of j-th coloumn of PR.
    ptr_cols = ismember(Qcols,ind_nzpr);
    % Construct a temporary sparse matrix that will store these coloumns qi.
    Q(1:m,1:j-1) = sparse(Qrows(ptr_cols'), Qcols(ptr_cols'), Qvals(ptr_cols'), m, j-1);
    r = Q'*q; % Project j-th coloumn of A on qi's.
    
    % ........ REST IS NOT NECESSARY.
  end

3 Comments
Show 1 older commentHide 1 older comment

Cem Gormezano on 4 Aug 2020

That is sad :(. Is there a fast MATLAB routine that takes two sparse matrices in [row, col, val] format and multiplies them without building them using sparse([row, col, val],m,n) ?

Bruno Luong on 4 Aug 2020

No. Just do the standard workflow and you'll be fine.

Your attempt of twist and turn won't make anything better, easier or faster.

Sign in to comment.

Sign in to answer this question.

Answer 1

James Tursa on 4 Aug 2020

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/574948-preallocating-a-sparse-matrix-then-entering-values-column-by-column-takes-too-much-time-is-there-a#answer_475153

The problem is that every time you change the elements of Q, even if it is only one element change, MATLAB generally has to copy all of the data elements. So there is potentially a lot of redundant data copying going on in the background for what looks like a very simply operation at the m-code level. That is likely where a lot of the time is being spent. The goal would be to reduce this data copying.

Can you build the entire Q matrix once (not piecemeal) and then go back and process those partial Q matrix vector products? If so, and you only need to access whole contiguous columns of Q, then you can avoid the data copying ordinarily associated with the sub-matrices by using this FEX submission:

https://www.mathworks.com/matlabcentral/fileexchange/65842-sharedchild-creates-a-shared-data-copy-of-contiguous-subset