How can I vectorize my Jaro Distance code to run more efficiently?

3 views (last 30 days)
Hello all!
I have written a code that will perform the Jaro distance between two strings A and B. My code however uses a lot of for loops and if statements. It looks like this:
[~,m] = size(A);
[~,n] = size(B);
sbools = ones(1,m);
tbools = ones(1,n);
sprime = 'X';
tprime = 'X';
H = floor(min(m,n)/2);
%A searching on B
for i = 1:m
for j = (i-H):(i+H)
if(j > n)
break
end
if(j>0)
if (s(1,i) == t(1, j))
if(tbools(1,j))
sprime = [sprime s(i)];
tbools(1,j) = 0;
break;
end
end
end
end
end
%B searching on A
for i = 1:n
for j = (i-H):(i+H)
if(j > m)
break
end
if(j>0)
if (t(1,i) == s(1, j))
if(sbools(1,j))
tprime = [tprime t(i)];
sbools(1,j) = 0;
break;
end
end
end
end
end
[~,mprime] = size(sprime);
[~,nprime] = size(tprime);
if((mprime > 1) && (nprime > 1))
sprime = sprime(1,2:(mprime)); %Removes Pre-load
tprime = tprime(1,2:(nprime)); %Removes Pre-load
else
out = 0;
return;
end
[~,mprime] = size(sprime);
[~,nprime] = size(tprime);
if(mprime ~= nprime)
out = 0;
return;
% error('Jaro "matching character" string length does not align');
end
runlen = min(mprime, nprime);
for i = 1:runlen
if(sprime(1,i) ~= tprime(1,i))
transp = transp + 1;
end
end
%transp = transp + abs(mprime-nprime);
transp = transp/2;
if(mprime == 0 && nprime == 0)
out = 0;
return
elseif(nprime == 0)
out = (1/3)*((mprime/m) + (nprime/n) + ((mprime - transp)/(mprime)));
else
out = (1/3)*((mprime/m) + (nprime/n) + ((nprime - transp)/(nprime)));
end
This is my complete code. The first loop searches for the letters in A that match a window of the letters in B. If there is a match it will save that letter from A and switch the boolean to false so the letter in B cannot be used twice. The same for the second part. My question: is there a better way to write this? I am only interested in a solution to the Jaro or Jaro-Winkler distance since I will be using both. A good description of the metric can be found at:
http://www.vinc.be/www-pdf/10-11_SINF2356-Mahalanobis-JaroWinckler-nDollar.pdf
and
http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Thank you for any help in advance!
  3 Comments
Walter Roberson
Walter Roberson on 7 Sep 2013
Is it intended to implement the original Jaro metric, or the Jaro-Winkler metric ? If it is the original Jaro metric, we are going to need a link to a formal description of the metric, as people only seem to talk about Jaro-Winkler.
Image Analyst
Image Analyst on 8 Sep 2013
How big are these strings and how much time are you looking to shave off? (Use tic and toc to do timing.)

Sign in to comment.

Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!