# How to quickly find the column index of the last non-zero element in all rows in a sparse matrix?

38 views (last 30 days)
Benson Gou on 19 Mar 2020
Edited: Ameer Hamza on 22 Mar 2020
Hi, All,
I have a big sparse matrix A. I want to find out the column index of the Last non-zero element in all rows from the end. Here is my code:
reorderRow = [];
for jRow = 1 : length(A(:,1))
eee = find(A(jRow,:),1,'last');
reorderRow = [reorderRow eee];
end
For example, I have matrix A = [2 0 0;0 5 0;0 0 1;0 3 0;-1 0 -4]. My code gives the following result:
reorderRow = [1 2 3 2 3];
I am wondering if it is possible to obtain reorderRow without iterations. Thanks a lot.
Benson

BobH on 19 Mar 2020
Edited: BobH on 19 Mar 2020
reorderRow = arrayfun(@(X) find(A(X,:),1,'last'), 1:size(A,1))
reorderRow =
1 2 3 2 3
BobH on 19 Mar 2020

John D'Errico on 19 Mar 2020
Edited: John D'Errico on 19 Mar 2020
It is important that if your matrix is sparse, that we use sparse matrix examples to test anything out.
Regardless, I would always do a comparison of other methods to a direct find. For example...
A = sprand(10000,10000,.0005);
So, on average, 5 elements per row. If some rows have no non-zeros, then this will return zero as the index of the last element.
tic
[ir,ic] = find(A);
maxcolind = accumarray(ir,ic,[10000,1],@max);
toc
Elapsed time is 0.026327 seconds.
This will not fail, even if some rows are empty of non-zeros.
sum(maxcolind == 0)
ans =
59
The random example I created had 59 rows that were entirely zero.
If you want to find the array reorderRow as you asked for, then just do this:
r = find(maxcolind);
reorderRow = [r,maxcolind(r)];
Benson Gou on 19 Mar 2020
Dear John,
Thanks a lot. Your code took 0.002112 s for a sparse matrix 974 x 632. It works well.
Benson

James Tursa on 19 Mar 2020
Edited: James Tursa on 19 Mar 2020
You can use a mex routine for this and avoid all temporary data copies. E.g., a direct approach:
/* File: find_last_nz.c
*
* C = find_last_nz(S)
*
* Finds column number of last non-zero element in each row of sparse matrix
* S = sparse matrix (double or logical)
* C = full column vector containing the columns numbers for each row
*
* Programmer: James Tursa
*/
/* Includes ----------------------------------------------------------- */
#include "mex.h"
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *Cpr;
mwIndex *Ir, *Jc;
size_t i, j, imax, m, n;
if( nlhs > 1 ) {
mexErrMsgTxt("Too many outputs.");
}
if( nrhs != 1 || !mxIsSparse(prhs) ) {
mexErrMsgTxt("Need exactly one sparse input.");
}
Ir = mxGetIr(prhs);
Jc = mxGetJc(prhs);
m = mxGetM(prhs);
n = mxGetN(prhs);
plhs = mxCreateDoubleMatrix(m,1,mxREAL);
Cpr = (double *) mxGetData(plhs);
for( j=0; j<n; j++ ) {
imax = Jc[j+1] - Jc[j];
for( i=0; i<imax; i++ ) {
Cpr[*Ir++] = j+1;
}
}
}
And a sample run:
>> mex find_last_nz.c
Building with 'Microsoft Windows SDK 7.1 (C)'.
MEX completed successfully.
>> A = rand(10).*(rand(10)<.2)
A =
0.6443 0 0.3111 0 0 0 0.0377 0 0 0
0 0 0 0 0 0 0 0 0 0
0.8116 0 0 0.6028 0.8010 0 0 0.4942 0 0
0.5328 0 0 0 0 0 0 0 0 0
0 0 0.9049 0.2217 0 0 0.0987 0 0 0
0.9390 0 0 0.1174 0 0 0 0.9037 0 0.1679
0 0 0 0 0 0.6791 0 0 0 0
0 0 0 0.3188 0 0.3955 0 0 0 0
0 0.2277 0 0 0 0 0.1366 0 0 0.5005
0.5870 0.4357 0 0 0 0.9880 0 0 0.5767 0.4711
>> find_last_nz(sparse(A))
ans =
7
0
8
1
7
10
6
6
10
10
What is presented above is a one-pass approach that only looks at the internal sparse indexing arrays.
It is possible that it might be faster to transpose the matrix first so that I could pick off the column numbers directly from the internal arrays, but that would require a data copy and I haven't written and tested this approach. That is, picking off the last non-zero row number for each column would be an extremely fast operation in a mex routine, but for your case the matrix would have to be transposed first which would probably dominate the run-time so I haven't bothered to write and test it.
EDIT
A timing comparison with the m-code method posted by John:
>> A = sprand(10000,10000,.001);
>>
>> tic;[ir,ic] = find(A);maxcolind = accumarray(ir,ic,[10000,1],@max);toc
Elapsed time is 0.029285 seconds.
>> tic;[ir,ic] = find(A);maxcolind = accumarray(ir,ic,[10000,1],@max);toc
Elapsed time is 0.030831 seconds.
>> tic;[ir,ic] = find(A);maxcolind = accumarray(ir,ic,[10000,1],@max);toc
Elapsed time is 0.028965 seconds.
>>
>> tic;C=find_last_nz(A);toc
Elapsed time is 0.000260 seconds.
>> tic;C=find_last_nz(A);toc
Elapsed time is 0.000253 seconds.
>> tic;C=find_last_nz(A);toc
Elapsed time is 0.000278 seconds.
>>
>> isequal(maxcolind,C)
ans =
logical
1
So, the mex routine is about 100x faster for this particular example.
Benson Gou on 22 Mar 2020
Dear Matt,
Good morning!
Thanks a lot for your great help. You have a good weekend!
Benson

Ameer Hamza on 19 Mar 2020
Edited: Ameer Hamza on 19 Mar 2020
For starters, do preallocation, It can speed up the execution if the matrix A has many rows. The following code compares the execution time for your code, BobH's code, and the code with pre-allocation. I intentionally made the A matrix with 10000 rows to show the difference
A = randi([1 5], 10000, 3);
tic
reorderRow = [];
for jRow = 1 : length(A(:,1))
eee = find(A(jRow,:),1,'last');
reorderRow = [reorderRow eee];
end
toc
% arrayfun method
tic
reorderRow = arrayfun(@(X) find(A(X,:),1,'last'), 1:length(A));
toc
% pre-allocating the array
tic
reorderRow = zeros(1, size(A,1));
for jRow = 1 : length(A(:,1))
eee = find(A(jRow,:),1,'last');
reorderRow(jRow) = eee;
end
toc
Result
Elapsed time is 0.068404 seconds. % your code (slowest)
Elapsed time is 0.048778 seconds. % array fun (medium)
Elapsed time is 0.015521 seconds. % pre-allocation (fastest)
Pre-allocation is about 3.5 times faster than the original code.
Benson Gou on 19 Mar 2020
Dear Ameer,
Thanks a lot for your great help.
Benson

Matt J on 19 Mar 2020
Edited: Matt J on 19 Mar 2020
[~,idx]=max(fliplr(logical(A)),[],2);
result=size(A,2)+1-idx;
Benson Gou on 22 Mar 2020
Best regards,
Benson