# How can I vectorize my Jaro Distance code to run more efficiently?

4 views (last 30 days)
Ryan on 7 Sep 2013
Hello all!
I have written a code that will perform the Jaro distance between two strings A and B. My code however uses a lot of for loops and if statements. It looks like this:
[~,m] = size(A);
[~,n] = size(B);
sbools = ones(1,m);
tbools = ones(1,n);
sprime = 'X';
tprime = 'X';
H = floor(min(m,n)/2);
%A searching on B
for i = 1:m
for j = (i-H):(i+H)
if(j > n)
break
end
if(j>0)
if (s(1,i) == t(1, j))
if(tbools(1,j))
sprime = [sprime s(i)];
tbools(1,j) = 0;
break;
end
end
end
end
end
%B searching on A
for i = 1:n
for j = (i-H):(i+H)
if(j > m)
break
end
if(j>0)
if (t(1,i) == s(1, j))
if(sbools(1,j))
tprime = [tprime t(i)];
sbools(1,j) = 0;
break;
end
end
end
end
end
[~,mprime] = size(sprime);
[~,nprime] = size(tprime);
if((mprime > 1) && (nprime > 1))
else
out = 0;
return;
end
[~,mprime] = size(sprime);
[~,nprime] = size(tprime);
if(mprime ~= nprime)
out = 0;
return;
% error('Jaro "matching character" string length does not align');
end
runlen = min(mprime, nprime);
for i = 1:runlen
if(sprime(1,i) ~= tprime(1,i))
transp = transp + 1;
end
end
%transp = transp + abs(mprime-nprime);
transp = transp/2;
if(mprime == 0 && nprime == 0)
out = 0;
return
elseif(nprime == 0)
out = (1/3)*((mprime/m) + (nprime/n) + ((mprime - transp)/(mprime)));
else
out = (1/3)*((mprime/m) + (nprime/n) + ((nprime - transp)/(nprime)));
end
This is my complete code. The first loop searches for the letters in A that match a window of the letters in B. If there is a match it will save that letter from A and switch the boolean to false so the letter in B cannot be used twice. The same for the second part. My question: is there a better way to write this? I am only interested in a solution to the Jaro or Jaro-Winkler distance since I will be using both. A good description of the metric can be found at:
http://www.vinc.be/www-pdf/10-11_SINF2356-Mahalanobis-JaroWinckler-nDollar.pdf
and
http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Thank you for any help in advance!
Image Analyst on 8 Sep 2013
How big are these strings and how much time are you looking to shave off? (Use tic and toc to do timing.)