Finding the max number of non-zero repeating elements for each row in a large matrix
5 views (last 30 days)
Show older comments
I have a (large) matrix of type 'uint8' or 'uint16', for (a small) example
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
One can make the following assumptions on A:
- size(A,2) is relatively small, e.g.
- size(A,1) is big, i.e. a few hundred thousands of rows, possibly even a few millions;
- the rows of A are sorted in an ascending manner;
- the elements of A are bounded above by a constant M that is sufficiently smaller than the maximum the type allows, e.g. for 'uint8' and for 'uint16'.
I would like to compute a vector
b = [2;
3;
0;
3];
assigning to each row of A the max number of repetitions of the non-zero elements contained in that row. Now, the way I do this currently is the following vectorized code running on my GPU:
a = permute(unique(A(A~=0)),[3 2 1]); % determine the unique non-zero values of A;
b = max(sum(uint8(bsxfun(@eq,A,a)),2,'native'),[],3); % compare A with a and count each value, then take max.
But if the matrix A is large and has a lot of variance (hence so does a), there appears to be a lot of unnecessary computation. So I was wondering if there is a faster way to do this since I have to do it for a lot of such large matrices.
0 Comments
Accepted Answer
Matt J
on 28 Jul 2022
Edited: Matt J
on 28 Jul 2022
size(A,2) is relatively small,
That being the case, I'd just use a loop:
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
[m,n]=size(A);
count=zeros(m,1);
b=-inf(m,1);
Alast=A(:,1);
for i=1:size(A,2)
Ai=A(:,i);
increm=(Ai==Alast & Alast~=0);
count(increm)=count(increm)+1;
reset=~increm;
count(reset)=(Ai(reset)~=0);
b=max(b,count);
Alast=Ai;
end
b
More Answers (2)
Bruno Luong
on 29 Jul 2022
Edited: Bruno Luong
on 29 Jul 2022
This code would not requires too much RAM beside a copy of A transposed.
It somesort of vectorized runlength encoding. A little bit cryptic code I must admit.
The for-loop posted by Matt is the best IMO
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
[m,n] = size(A);
B = [A.'; Inf(1,m)];
[c,r] = find([ones(1,m); diff(B)]);
Aij = B(c+(r-1)*(n+1));
runlength = diff(c).*(diff(r)==0);
runvalue = Aij(1:end-1);
lgtnz = runlength.*(runvalue>0); % does count the length of sequence of 0s
maxlgt = accumarray(r(1:end-1), lgtnz, [m,1], @max)
0 Comments
Matt J
on 28 Jul 2022
Edited: Matt J
on 28 Jul 2022
This should be fine for the uint8 case. For the uint16 case, it might strain RAM.
A = [0 0 2 2 3;
0 1 3 3 3;
0 0 0 0 0;
1 2 2 2 3];
N=size(A,1);
M=max(A(:));
[I,J,S]=find(A);
b=max( accumarray([I,S],1,[N,M]) ,[],2)
2 Comments
Bruno Luong
on 29 Jul 2022
Edited: Bruno Luong
on 29 Jul 2022
I think because
[I,S]
is converted to the lower class of I and S, which is uint8/uint16. Thus I is truncated to intmax of the class.
you can fix by replace with
[I,double(S)]
See Also
Categories
Find more on Logical in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!