vectorisation of a loop

the matrix is "Sea" of m*n dimension .. this is the code i would like to vectorize :
=======================
S = size(Sea);
i = 1;
while i < S(1)
indices = find ( pdist2( Sea( i , : ),Sea( i+1:S(1) , :)) <0.05 );
Sea(indices , : ) = [];
S = size(Sea);
i = i+1;
end
===========
desc : i'm trying to calculate the distance between each row and all the other rows in the matrix and delete the ones that are closer than 0.05

Answers (2)

Matt J
Matt J on 30 Nov 2021
Edited: Matt J on 1 Dec 2021
[m,n]=size(Sea);
D=pdist2(Sea,Sea);
D(1:m+1:end)=inf;
indices=any(triu(D<0.05),1);
Sea(indices,:)=[];

12 Comments

thank you for answering , but i want to delete the raw from sea matrix directly
The code I've shown does that.
it gives back an array of indices not doing anything on the sea matrix
Perhaps this is what you want.
[m,n]=size(Sea);
D=pdist2(Sea,Sea);
D(1:m+1:end)=inf;
indices=any(D<0.05,1);
Sea(indices,:)=[];
it deletes all the matrix at once , thank you for your help
You should attach some data in a .mat file, so the two versions can be compared.
its simple data (matrix of 16 million * 14 singles) ,so you can generate a random mtrix m*14 of sibgles and execute the codes on it, you will see what im getting as results
Is the idea to end up with points in Sea that are all at least 0.05 apart? If so then, as you can see below, my version succeeds at this, whereas your original version does not:
Sea=rand(1000,2);
Sea1=version1(Sea);
Sea2=version2(Sea);
checkmin(Sea1) %original version
ans = 0.0016
checkmin(Sea2) %proposed version
ans = 0.0508
function val=checkmin(Sea)
D=pdist2(Sea,Sea);
val=min(D(D>0));
end
function Sea=version1(Sea)
S = size(Sea);
i = 1;
while i < S(1)
indices = find ( pdist2( Sea( i , : ),Sea( i+1:S(1) , :)) <0.05 );
Sea(indices , : ) = [];
S = size(Sea);
i = i+1;
end
end
function Sea=version2(Sea)
[m,n]=size(Sea);
D=pdist2(Sea,Sea);
D(1:m+1:end)=inf;
indices=any(triu(D<0.05),1);
Sea(indices,:)=[];
end
Zaid B
Zaid B on 2 Dec 2021
Edited: Zaid B on 2 Dec 2021
"Is the idea to end up with points in Sea that are all at least 0.05 apart?" ...the idea is somehow as you said but your version cause lost of information , i want to minimise the size of the matrix as following;
i test the first row distance to all the other rows and delete rows that are closer than 0.05 but keeping the first row
then , after deleting the other rows i test the second row from the new SEA(after deleting the rows from the previous iteration) with all the 3:end rows (always keeping the current row and deleting the closest rows)
i hope you got the idea of my code!
thank you for your continious help ^^
Matt J
Matt J on 2 Dec 2021
Edited: Matt J on 2 Dec 2021
That's also not what your current code is doing. If that's what you want, you need,
indices = i + find ( pdist2( Sea( i , : ),Sea( i+1:S(1) , :)) <0.05 ) ;
Zaid B
Zaid B on 2 Dec 2021
Edited: Zaid B on 2 Dec 2021
thank you for answering , but why ( i - 1 ) is added on that line?!
("Moreover, I don't think there is any more optimization that can be done" ... i thought so , but wanted to check if anyone has a better idea that would help)
Zaid B
Zaid B on 2 Dec 2021
Edited: Zaid B on 2 Dec 2021
i got it why is nedded there but i think "i + find(...)" not "(i-1)+find(..)" because it should keep the current i , thank you that was a great help there ^^

Sign in to comment.

Matt J
Matt J on 2 Dec 2021
Edited: Matt J on 2 Dec 2021
I find this version to be about 30% faster:
S = size(Sea,1);
i = 1;
t=0.05^2;
while i < S
indices=true(1,S);
indices(i+1:end) = sum( abs( Sea( i , : )-Sea( i+1:S , :) ).^2,2) >= t ;
Sea=Sea(indices , : );
S = size(Sea,1);
i = i+1;
end

13 Comments

Sea = rand(20000,14);
tic
P1 = version1(Sea,1); %my proposed version
checkmin(P1)
ans = 0.2857
time_V1 = toc
time_V1 = 19.5087
tic
P2 = version2(Sea,1); %your proposed version
checkmin(P2)
ans = 0.2857
time_V2 = toc
time_V2 = 23.8950
% TRUE = (P1 == P2)
function sea = version1(m,step)
S = size(m);
% SeaCopy= [];
i = 1;
while i < S(1)
% SeaCopy = [SeaCopy ; m(i,:)];
indices = i + find(pdist2(m(i,:),m(i+1:S(1),:))<0.05);
if(length(indices) >= 1)
% SeaCopy = [SeaCopy ; mean(m(indices,:))];
m(indices,:)=[];
end
S = size(m);
i = i +step;
end
sea = m;
end
function sea = version2(m,step)
S = size(m);
i = 1;
t=0.05^2;
while i < S(1)
indices=true(1,S(1));
indices(i+1:end) = sum( abs(m( i , : )-m( i+1:S(1) , :) ).^2,2) >= t ;
m=m(indices , : );
S = size(m,1);
i = i+1;
end
sea = m;
end
function val=checkmin(Sea)
D=pdist2(Sea,Sea);
val=min(D(D>0));
end
Sea = rand(8000,14);
tic
P1 = version1(Sea,1); %your proposed version
time_V1 = toc
time_V1 = 2.2475
tic
P3= version3(Sea,1); %my proposed version
time_V3 = toc
time_V3 = 1.0464
isequal(P1,P3)
ans = logical
1
function sea = version1(m,step)
S = size(m);
% SeaCopy= [];
i = 1;
while i < S(1)
% SeaCopy = [SeaCopy ; m(i,:)];
indices = i + find(pdist2(m(i,:),m(i+1:S(1),:))<0.05);
if(length(indices) >= 1)
% SeaCopy = [SeaCopy ; mean(m(indices,:))];
m(indices,:)=[];
end
S = size(m);
i = i +step;
end
sea = m;
end
function sea = version3(m,step)
S = size(m,1);
i = 1;
t=0.05^2;
while i < S
indices=false(1,S);
indices(i+1:end) = sum( abs(m( i , : )-m( i+1:S , :) ).^2,2) < t ;
m(indices , : )=[];
S = size(m,1);
i = i+1;
end
sea = m;
end
Matt J
Matt J on 4 Dec 2021
Edited: Matt J on 4 Dec 2021
Note that some of these comparisons will be Matlab release dependent. In R2018a, I find that version 2 is faster than version 1, but still slower than version 3.
version 3 is not doing the work needed ,after checking the matrix returned from version 3 , it's not equal to the same matrix returned by pervious versions (v1,v2)
Sea = rand(10000,14);
tic
P1 = version1(Sea,1); %my version always faster
ans = 1×2
9647 14
checkmin(P1)
ans = 0.5000
V1 = toc
V1 = 7.7813
tic
P2 = version2(Sea,1);% your version slighly slower but doing the job needed
checkmin(P2)
ans = 0.5000
V2 = toc
V2 = 11.0393
tic
P3 = version3(Sea,1); %verision three you proposed
checkmin(P3)
ans = 0.3358
V3 = toc
V3 = 4.1707
TRUE = isequal(P1,P3) %cheking if its the same matrix returned from previous versions
TRUE = logical
0
function sea = version1(m,step)
S = size(m);
SeaCopy= [];
i = 1;
while i < S(1)
SeaCopy = [SeaCopy ; m(i,:)];
indices = i + find(pdist2(m(i,:),m(i+1:S(1),:))<0.5);
if(length(indices) > 1)
SeaCopy = [SeaCopy ; mean(m(indices,:))];
m(indices,:)=[];
else
m(indices,:)=[];
end
S = size(m);
i = i +step;
end
size(SeaCopy)
sea = m;
end
function sea = version2(m,step)
SeaCopy= [];
S = size(m);
i = 1;
t=0.5^2;
while i < S(1)
SeaCopy = [SeaCopy ; m(i,:)];
indices=true(1,S(1));
indices(i+1:end) = sum( abs(m( i , : )-m( i+1:S(1) , :) ).^2,2) >= t ;
SeaCopy = [SeaCopy ; mean(m(indices,:))];
m = m(indices , : );
S = size(m,1);
i = i+1;
end
sea = m;
end
function sea = version3(m,step)
S = size(m,1);
i = 1;
t=0.05^2;
while i < S
indices=false(1,S);
indices(i+1:end) = sum( abs(m( i , : )-m( i+1:S , :) ).^2,2) < t ;
m(indices , : )=[];
S = size(m,1);
i = i+1;
end
sea = m;
end
function val=checkmin(Sea)
D=pdist2(Sea,Sea);
val=min(D(D>0));
end
Something must have been miscopied, because I'm getting identical results. Below is a comparison of all 3 versions, showing they are equal.
Sea = rand(8000,14);
tic
P1 = version1(Sea,1);
time_V1 = toc
time_V1 = 2.5249
tic
P2 = version2(Sea,1);
time_V2 = toc
time_V2 = 2.5527
tic
P3 = version3(Sea,1);
time_V3 = toc
time_V3 = 1.2148
isequal(P2,P1)
ans = logical
1
isequal(P3,P1)
ans = logical
1
function sea = version1(m,step)
S = size(m);
% SeaCopy= [];
i = 1;
while i < S(1)
% SeaCopy = [SeaCopy ; m(i,:)];
indices = i + find(pdist2(m(i,:),m(i+1:S(1),:))<0.05);
if(length(indices) >= 1)
% SeaCopy = [SeaCopy ; mean(m(indices,:))];
m(indices,:)=[];
end
S = size(m);
i = i +step;
end
sea = m;
end
function sea = version2(m,step)
S = size(m);
i = 1;
t=0.05^2;
while i < S(1)
indices=true(1,S(1));
indices(i+1:end) = sum( abs(m( i , : )-m( i+1:S(1) , :) ).^2,2) >= t ;
m=m(indices , : );
S = size(m,1);
i = i+1;
end
sea = m;
end
function sea = version3(m,step)
S = size(m,1);
i = 1;
t=0.05^2;
while i < S
indices=false(1,S);
indices(i+1:end) = sum( abs(m( i , : )-m( i+1:S , :) ).^2,2) < t ;
m(indices , : )=[];
S = size(m,1);
i = i+1;
end
sea = m;
end
run the checkmin function ... since its rand generated matrix they might be no changes in matrix so they return the same input matrix .... if you run the check min version you should get in v1 & 2 ans = 0.0500 and the third one something else
to see what i mean you can print the size of each output returned by versions
There is no way the results can be different, either in size or data content, if the isequal() command says they are not. Just to confirm this for you, though, here are my results with checkmin:
Sea = rand(10000,14);
tic
P1 = version1(Sea,1); %my proposed version
time_V1 = toc
time_V1 = 3.4304
tic
P2 = version2(Sea,1); %your proposed version
time_V2 = toc
time_V2 = 4.1817
tic
P3 = version3(Sea,1); %your proposed version
time_V3 = toc
time_V3 = 2.2365
isequal(P1,P2)
ans = logical
1
isequal(P1,P3)
ans = logical
1
checkmin(P1)
ans = 0.3305
checkmin(P2)
ans = 0.3305
checkmin(P3)
ans = 0.3305
yeah that means that the genrated samples are >0.05 since the begining .. run the sizes for each version it will be more clear for you
Fine, but we can do the same comparison with less trivial input data:
Sea = rand(2e4,14)/10;
tic
P1 = version1(Sea,1);
time_V1 = toc
time_V1 = 9.2412
tic
P2 = version2(Sea,1);
time_V2 = toc
time_V2 = 13.1732
tic
P3 = version3(Sea,1);
time_V3 = toc
time_V3 = 6.8440
isequal(P1,P2)
ans = logical
1
isequal(P2,P3)
ans = logical
1
size(P1)
ans = 1×2
18604 14
size(P2)
ans = 1×2
18604 14
size(P3)
ans = 1×2
18604 14
can we parallel it if possible???
that might help speed up the calculations !
Matt J
Matt J on 6 Dec 2021
Edited: Matt J on 6 Dec 2021
If you have the Parallel Computing Toolbox, then making Sea a gpuArray should add some speed.
yeah i do , i ll try it and post the results later

Sign in to comment.

Categories

Find more on Graphics Performance in Help Center and File Exchange

Asked:

on 30 Nov 2021

Edited:

on 6 Dec 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!