
# How to vectorise loop to improve performance

Andy on 23 Oct 2013
Closed: MATLAB Answer Bot on 20 Aug 2021
I have a loop that runs about 20000 times to find which class each element of my array is closest to, and whether that closest class is the correct one. I am hoping for a large performance increase if I can replace the for loop.
for l = 1:19999
    d1 = removedf1(l) - meansf1;
    d2 = removedf2(l) - meansf2;
    [closestVal, classFound] = min( d1 .* d1 + d2 .* d2 );
    if ~strcmp(classValues(l),classes(classFound))
        errorCount = errorCount + 1;
    end
end
Currently the loop goes through each data element 1:19999 and subtracts the class means from the value at index l. This gives d1 and d2, each a 1 x (number of classes) array holding the value at index l minus the mean of each class. I then compute the squared distance to every class mean and take the smallest value together with its index in that 1 x (number of classes) array. This index is the assigned class, which I compare with the actual class at classValues(l) to see whether they match.
Could anyone help me vectorise this loop, so that for each pair of values in removedf1 and removedf2 I can check their distance from each class mean in meansf1 and meansf2?
##### 2 Comments
Andy on 23 Oct 2013
Yes, here is a short example. Hopefully it is self-explanatory. Each element of the dataset is removed in turn, the class means are recalculated, and each remaining data element is then checked to see which class it is closest to, based on the class means. If the assigned class is wrong, the error count increases.
data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};
classes = unique(data{1});
for i = 1:5
    disp(i)
    removedf1 = data{2}; removedf2 = data{3};
    removedf1(i) = []; removedf2(i) = [];
    classValues = data{1}; classValues(i) = [];
    for k = 1:numel(classes)
        me = mean(data{2}(strcmp(data{1},classes(k))));
        meansf1(k) = me;
        me2 = mean(data{3}(strcmp(data{1},classes(k))));
        meansf2(k) = me2;
    end
    for l = 1:4
        d1 = removedf1(l) - meansf1;
        d2 = removedf2(l) - meansf2;
        [closestVal, classFound] = min( d1 .* d1 + d2 .* d2 );
        if ~strcmp(classValues(l),classes(classFound))
            errorCount = errorCount + 1;
        end
    end
end

Cedric Wannaz on 23 Oct 2013
Edited: Cedric Wannaz on 23 Oct 2013
If I understand correctly, you have a table of class codes and, let's say, x and y coordinates:
Class   'A'  'B'  'C'  'A'  'B'
x        1    5    3    8    4
y        4    7    3    2    5
You want to iterate through the column IDs, and for each column ID cId you want to:
• build a copy of the table without column cId,
• compute the class averages on the original table, for x and y,
• for each column of the reduced table, compute the distance to each class average,
• increment the error counter if the column's own class doesn't match the closest class average.
The first things that you can work on are
1. the elimination of class names and strings comparison
2. the computation of class averages outside of the main loop.
For 1., define class IDs and work with them. If the class names are simply 'A', 'B', etc., it's easy:
>> classIDs = [data{1}{:}] - 'A' + 1
classIDs =
1 2 3 1 2
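When the class names are not consecutive single letters, the character arithmetic above no longer applies. A minimal sketch of a more general mapping, assuming `data` is the cell array from the question's example, uses the third output of `unique` (the name `classNames` is introduced here only for illustration):

```matlab
% Example data from the question: class labels, x values, y values.
data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};

% classNames: sorted unique labels.
% classIDs:   for each label, its position in classNames.
[classNames, ~, classIDs] = unique(data{1});
classIDs = classIDs(:).';   % force a row vector, whatever unique returns

disp(classNames)
disp(classIDs)              % 1 2 3 1 2 for this data
```

The original label of a numeric ID can be recovered with `classNames{id}`, so string comparisons are only needed once, outside any loop.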
For 2. these means don't depend on the outer loop index, so you can compute them before the loop.
All in all, you can build a solution roughly like this (you'll have to fine-tune it though):
% - Define class IDs and unique class IDs.
classIDs = [data{1}{:}] - 'A' + 1 ;
classes = unique( classIDs ) ;
% - Build num array of classID and data (easier to reduce).
%   Rows of D: 1 = class ID, 2 = x, 3 = y; one column per data point.
D = [classIDs; data{2}; data{3}] ;
% - Compute means by class for x and y (column vectors, one row per class).
x_mean = accumarray( classIDs.', data{2}.', [], @mean ) ;
y_mean = accumarray( classIDs.', data{3}.', [], @mean ) ;
% - Main loop.
errorCount = 0 ;
for cId = 1 : 5
    D_red = D ;
    D_red(:,cId) = [] ;
    % - Differences as (classes x points) matrices.
    dx = bsxfun( @minus, D_red(2,:), x_mean ) ;
    dy = bsxfun( @minus, D_red(3,:), y_mean ) ;
    % - Closest class per point (minimum over rows), compared with own class.
    [~,closestClassIDs] = min( dx.^2 + dy.^2, [], 1 ) ;
    errorCount = errorCount + sum( closestClassIDs ~= D_red(1,:) ) ;
end
Compare the resulting count with the one from your code. If they differ, either your example was not meant to produce a correct count, or there is something that I misunderstood about what you want to achieve.
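If the means really are meant to come from the whole table, the closest class of a point does not depend on which column was removed, so the outer loop can be dropped altogether. The sketch below assumes that the intended count is, for each removed column, the number of remaining points whose closest class mean is not their own; the variable names `closest` and `misclassified` are introduced here for illustration.

```matlab
data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};
[~, ~, classIDs] = unique(data{1});
classIDs = classIDs(:);          % N-by-1 numeric class IDs
x = data{2}(:);                  % N-by-1
y = data{3}(:);                  % N-by-1
N = numel(classIDs);

% Class means over the whole table (K-by-1 each).
x_mean = accumarray(classIDs, x, [], @mean);
y_mean = accumarray(classIDs, y, [], @mean);

% K-by-N squared distances: row k = class k, column n = point n.
dx = bsxfun(@minus, x.', x_mean);
dy = bsxfun(@minus, y.', y_mean);
[~, closest] = min(dx.^2 + dy.^2, [], 1);

% A point is classified the same way in every reduced table,
% and it is present in N-1 of the N reduced tables.
misclassified = closest(:) ~= classIDs;
errorCount = (N - 1) * sum(misclassified);
disp(errorCount)                 % 4 for this data
```

Here only the first point, (1,4) in class 'A', is closer to the mean of 'C' than to the mean of its own class, which gives 4 x 1 = 4. In releases from R2016b onwards, `bsxfun(@minus, a, b)` can also be written as `a - b`.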
Cedric Wannaz on 23 Oct 2013
In your code, the part for computing the means
for k = 1:numel(classes)
    me = mean(data{2}(strcmp(data{1},classes(k))));
    meansf1(k) = me;
    me2 = mean(data{3}(strcmp(data{1},classes(k))));
    meansf2(k) = me2;
end
depends on classes and data, but not on the reduced versions of these variables and not on the outer loop index i. It is therefore based on the whole data set and is iteration-independent. So either it is correct and you can compute these means before the outer loop, or you meant to use the reduced versions in that loop.
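As a quick check of that point, the sketch below runs the means loop once on the full example data and compares it with `accumarray` on numeric class IDs. It assumes single-letter class names starting at 'A'; the helper variable `inClass` is introduced here for readability.

```matlab
data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};
classes = unique(data{1});

% Means loop from the question, run once on the full data.
meansf1 = zeros(1, numel(classes));
meansf2 = zeros(1, numel(classes));
for k = 1:numel(classes)
    inClass = strcmp(data{1}, classes{k});
    meansf1(k) = mean(data{2}(inClass));
    meansf2(k) = mean(data{3}(inClass));
end

% Same means from numeric class IDs, without a loop.
classIDs = [data{1}{:}] - 'A' + 1;
x_mean = accumarray(classIDs.', data{2}.', [], @mean).';
y_mean = accumarray(classIDs.', data{3}.', [], @mean).';

disp([meansf1; meansf2])
disp(isequal(meansf1, x_mean) && isequal(meansf2, y_mean))
```

Both approaches give class means of 4.5, 4.5, 3 for x and 3, 6, 3 for y on this data.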
If you wanted means to be computed based on reduced versions, you would have to do the following:
% - Define class IDs and unique class IDs.
classIDs = [data{1}{:}] - 'A' + 1 ;
classes = unique( classIDs ) ;
% - Build num array of classID and data (easier to reduce).
%   Rows of D: 1 = class ID, 2 = x, 3 = y; one column per data point.
D = [classIDs; data{2}; data{3}] ;
% - Main loop.
errorCount = 0 ;
for cId = 1 : 5
    D_red = D ;
    D_red(:,cId) = [] ;
    % - Means by class, computed on the reduced table only.
    x_mean = accumarray( D_red(1,:).', D_red(2,:).', [], @mean ) ;
    y_mean = accumarray( D_red(1,:).', D_red(3,:).', [], @mean ) ;
    % - Differences as (classes x points) matrices.
    dx = bsxfun( @minus, D_red(2,:), x_mean ) ;
    dy = bsxfun( @minus, D_red(3,:), y_mean ) ;
    % - Closest class per point (minimum over rows), compared with own class.
    [~,closestClassIDs] = min( dx.^2 + dy.^2, [], 1 ) ;
    errorCount = errorCount + sum( closestClassIDs ~= D_red(1,:) ) ;
end
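One thing to watch with means computed on the reduced table: removing the only member of a class leaves that class empty, and `accumarray` fills empty entries with 0 by default, which would act as a bogus class mean at the origin. A possible guard, sketched here under the assumption that an empty class should simply never be selected, is to fix the output size and use NaN as the fill value (`K`, `ids` and `closest` are names introduced for this sketch):

```matlab
data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};
classIDs = [data{1}{:}] - 'A' + 1;
K = max(classIDs);                      % number of classes
D = [classIDs; data{2}; data{3}];       % rows: class ID, x, y

errorCount = 0;
for cId = 1:size(D, 2)
    D_red = D;
    D_red(:, cId) = [];
    ids = D_red(1, :).';
    % Fixed K-by-1 output; classes with no remaining member get NaN.
    x_mean = accumarray(ids, D_red(2, :).', [K 1], @mean, NaN);
    y_mean = accumarray(ids, D_red(3, :).', [K 1], @mean, NaN);
    dx = bsxfun(@minus, D_red(2, :), x_mean);
    dy = bsxfun(@minus, D_red(3, :), y_mean);
    % min skips NaN entries, so an empty class is never the closest one.
    [~, closest] = min(dx.^2 + dy.^2, [], 1);
    errorCount = errorCount + sum(closest ~= D_red(1, :));
end
disp(errorCount)                        % 2 for this data
```

With the example data, the first point is misclassified only when column 2 or column 5 is removed, so the leave-one-out count is 2.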
Now, I don't understand what errorRound is for in your code, and I don't know which count is the correct one, yours or mine. Only you can tell, because only you know what you want to achieve with this algorithm.
If you have a clear idea of what you want to achieve but you don't understand very well what happens in the code (yours and mine), the best that you can do is to use the debugger. For that, type
clear all
in the command window to get a clean workspace. Then place the cursor on the line
classIDs = [data{1}{:}] - 'A' + 1 ;
and press F12. You will see a red round mark on the left of the editor, which indicates that you set a breakpoint at this location. Then press F5 (or click on the Run button) to execute the code. It will start and then stop at the breakpoint, which is indicated by a green arrow on the left of the line. Then press F10 to move forward (or F11 if you want to enter (sub-)functions). At each step, you can display the content of variables by typing their names in the command window. This allows you to move through your code line by line and understand what happens by seeing intermediate values and computations, checking sizes, results, etc.
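The same session can also be driven from the command window with MATLAB's debugging commands. The lines below are only a sketch: they assume the code has been saved as `loo_errors.m`, a hypothetical file name, and that the line of interest is line 2 of that file.

```matlab
% Assumes the code is saved as loo_errors.m (hypothetical file name).
dbstop in loo_errors at 2   % set a breakpoint at line 2 of the file
loo_errors                  % run; execution pauses at the breakpoint
% At the K>> prompt:
%   dbstep       execute the next line (like F10)
%   dbstep in    step into a called function (like F11)
%   dbcont       continue to the next breakpoint or to the end
%   dbquit       leave debug mode
dbclear all                 % remove all breakpoints when finished
```

While execution is paused, typing a variable name or `size(dx)` at the prompt shows intermediate values and sizes, just as with the editor buttons.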