38 views (last 30 days)

Show older comments

Hi, All,

I have a big sparse matrix A. I want to find out the column index of the Last non-zero element in all rows from the end. Here is my code:

reorderRow = [];

for jRow = 1 : length(A(:,1))

eee = find(A(jRow,:),1,'last');

reorderRow = [reorderRow eee];

end

For example, I have matrix A = [2 0 0;0 5 0;0 0 1;0 3 0;-1 0 -4]. My code gives the following result:

reorderRow = [1 2 3 2 3];

I am wondering if it is possible to obtain reorderRow without iterations. Thanks a lot.

Benson

John D'Errico
on 19 Mar 2020

Edited: John D'Errico
on 19 Mar 2020

It is important that if your matrix is sparse, that we use sparse matrix examples to test anything out.

Regardless, I would always do a comparison of other methods to a direct find. For example...

A = sprand(10000,10000,.0005);

So, on average, 5 elements per row. If some rows have no non-zeros, then this will return zero as the index of the last element.

tic

[ir,ic] = find(A);

maxcolind = accumarray(ir,ic,[10000,1],@max);

toc

Elapsed time is 0.026327 seconds.

This will not fail, even if some rows are empty of non-zeros.

sum(maxcolind == 0)

ans =

59

The random example I created had 59 rows that were entirely zero.

If you want to find the array reorderRow as you asked for, then just do this:

r = find(maxcolind);

reorderRow = [r,maxcolind(r)];

James Tursa
on 19 Mar 2020

Edited: James Tursa
on 19 Mar 2020

You can use a mex routine for this and avoid all temporary data copies. E.g., a direct approach:

/* File: find_last_nz.c

*

* C = find_last_nz(S)

*

* Finds column number of last non-zero element in each row of sparse matrix

* S = sparse matrix (double or logical)

* C = full column vector containing the columns numbers for each row

*

* Programmer: James Tursa

*/

/* Includes ----------------------------------------------------------- */

#include "mex.h"

/* Gateway ------------------------------------------------------------ */

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

{

double *Cpr;

mwIndex *Ir, *Jc;

size_t i, j, imax, m, n;

if( nlhs > 1 ) {

mexErrMsgTxt("Too many outputs.");

}

if( nrhs != 1 || !mxIsSparse(prhs[0]) ) {

mexErrMsgTxt("Need exactly one sparse input.");

}

Ir = mxGetIr(prhs[0]);

Jc = mxGetJc(prhs[0]);

m = mxGetM(prhs[0]);

n = mxGetN(prhs[0]);

plhs[0] = mxCreateDoubleMatrix(m,1,mxREAL);

Cpr = (double *) mxGetData(plhs[0]);

for( j=0; j<n; j++ ) {

imax = Jc[j+1] - Jc[j];

for( i=0; i<imax; i++ ) {

Cpr[*Ir++] = j+1;

}

}

}

And a sample run:

>> mex find_last_nz.c

Building with 'Microsoft Windows SDK 7.1 (C)'.

MEX completed successfully.

>> A = rand(10).*(rand(10)<.2)

A =

0.6443 0 0.3111 0 0 0 0.0377 0 0 0

0 0 0 0 0 0 0 0 0 0

0.8116 0 0 0.6028 0.8010 0 0 0.4942 0 0

0.5328 0 0 0 0 0 0 0 0 0

0 0 0.9049 0.2217 0 0 0.0987 0 0 0

0.9390 0 0 0.1174 0 0 0 0.9037 0 0.1679

0 0 0 0 0 0.6791 0 0 0 0

0 0 0 0.3188 0 0.3955 0 0 0 0

0 0.2277 0 0 0 0 0.1366 0 0 0.5005

0.5870 0.4357 0 0 0 0.9880 0 0 0.5767 0.4711

>> find_last_nz(sparse(A))

ans =

7

0

8

1

7

10

6

6

10

10

What is presented above is a one-pass approach that only looks at the internal sparse indexing arrays.

It is possible that it might be faster to transpose the matrix first so that I could pick off the column numbers directly from the internal arrays, but that would require a data copy and I haven't written and tested this approach. That is, picking off the last non-zero row number for each column would be an extremely fast operation in a mex routine, but for your case the matrix would have to be transposed first which would probably dominate the run-time so I haven't bothered to write and test it.

EDIT

A timing comparison with the m-code method posted by John:

>> A = sprand(10000,10000,.001);

>>

>> tic;[ir,ic] = find(A);maxcolind = accumarray(ir,ic,[10000,1],@max);toc

Elapsed time is 0.029285 seconds.

>> tic;[ir,ic] = find(A);maxcolind = accumarray(ir,ic,[10000,1],@max);toc

Elapsed time is 0.030831 seconds.

>> tic;[ir,ic] = find(A);maxcolind = accumarray(ir,ic,[10000,1],@max);toc

Elapsed time is 0.028965 seconds.

>>

>> tic;C=find_last_nz(A);toc

Elapsed time is 0.000260 seconds.

>> tic;C=find_last_nz(A);toc

Elapsed time is 0.000253 seconds.

>> tic;C=find_last_nz(A);toc

Elapsed time is 0.000278 seconds.

>>

>> isequal(maxcolind,C)

ans =

logical

1

So, the mex routine is about 100x faster for this particular example.

Ameer Hamza
on 19 Mar 2020

Edited: Ameer Hamza
on 19 Mar 2020

For starters, do preallocation, It can speed up the execution if the matrix A has many rows. The following code compares the execution time for your code, BobH's code, and the code with pre-allocation. I intentionally made the A matrix with 10000 rows to show the difference

A = randi([1 5], 10000, 3);

% your code

tic

reorderRow = [];

for jRow = 1 : length(A(:,1))

eee = find(A(jRow,:),1,'last');

reorderRow = [reorderRow eee];

end

toc

% arrayfun method

tic

reorderRow = arrayfun(@(X) find(A(X,:),1,'last'), 1:length(A));

toc

% pre-allocating the array

tic

reorderRow = zeros(1, size(A,1));

for jRow = 1 : length(A(:,1))

eee = find(A(jRow,:),1,'last');

reorderRow(jRow) = eee;

end

toc

Result

Elapsed time is 0.068404 seconds. % your code (slowest)

Elapsed time is 0.048778 seconds. % array fun (medium)

Elapsed time is 0.015521 seconds. % pre-allocation (fastest)

Pre-allocation is about 3.5 times faster than the original code.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!