How to vectorise loop to improve performance

Question

0 votes

I have a loop that I run about 20,000 times to find which class each element of my array is closest to, and whether that closest class is the correct one. I am hoping to get a large performance increase by replacing the for loop with vectorised code.

    for l = 1:19999
        d1 = removedf1(l) - meansf1;
        d2 = removedf2(l) - meansf2;
        [closestVal, classFound] = min( d1 .* d1 + d2 .* d2 );
        if ~strcmp(classValues(l),classes(classFound))
            errorCount = errorCount + 1;
        end
    end

Currently the loop goes through each data element 1:19999 and subtracts the class means from its value. This gives d1 and d2, each of size 1 x (number of classes), holding the value at index l minus each class mean. I then compute the squared distance to each class and take the smallest value and its index in that 1 x (number of classes) array. This index is the predicted class, which I compare with the actual class at classValues(l) to see if they match.

Could anyone help me vectorise this loop, so that for each pair of values in removedf1 and removedf2 I can check the distance to each class mean in meansf1 and meansf2?

Cedric on 23 Oct 2013

Edited: Cedric on 23 Oct 2013

You should build a small numeric example to illustrate, with vectors of 5 to 10 values.

Andy on 23 Oct 2013

Yes, here is a short example; hopefully it is self-explanatory. Each element of the data set is removed in turn, the class means are recalculated, and each remaining element is checked to see which class it is closest to, based on the class means. If the assigned class is wrong, the error count increases.

    data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};
    classes = unique(data{1});
    errorCount = 0;  % initialise the counter before the loops
    for i = 1:5
        disp(i)
        % Remove element i from both features and from the class labels.
        removedf1 = data{2}; removedf2 = data{3};
        removedf1(i) = []; removedf2(i) = [];
        classValues = data{1}; classValues(i) = [];
        % Class means of each feature (computed here from the full data).
        for k = 1:numel(classes)
            me = mean(data{2}(strcmp(data{1},classes(k))));
            meansf1(k) = me;
            me2 = mean(data{3}(strcmp(data{1},classes(k))));
            meansf2(k) = me2;
        end
        % Assign each remaining element to the nearest class mean.
        for l = 1:4
            d1 = removedf1(l) - meansf1;
            d2 = removedf2(l) - meansf2;
            [closestVal, classFound] = min( d1 .* d1 + d2 .* d2 );
            if ~strcmp(classValues(l),classes(classFound))
                errorCount = errorCount + 1;
            end
        end
    end

Answer 1

Cedric on 23 Oct 2013

Edited: Cedric on 23 Oct 2013

1 vote

If I understand well, you have a table of class codes and, let's say, x and y coordinates:

 Class   'A'    'B'    'C'    'A'    'B'
     x    1      5      3      8      4
     y    4      7      3      2      5

You want to iterate through column IDs, and for each column ID cId you want to:

build a copy of the table without column cId,
compute the class averages on the original table, for x and y,
for each column of the reduced table, compute the distance to each class average,
increment the error counter if the column's own class doesn't match the closest class average.

The first things that you can work on are

1. the elimination of class names and string comparisons,
2. the computation of class averages outside of the main loop.

For 1., define class IDs and work with them. If class names are simply 'A', 'B', etc., it's easy:

 >> classIDs = [data{1}{:}] - 'A' + 1
 classIDs =
     1     2     3     1     2
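
When the class names are not consecutive single letters, the character arithmetic above no longer applies. A minimal sketch of a more general mapping, assuming the labels are stored as a cell array of strings as in the example (the variable name classNames is only illustrative), uses the third output of unique:

```matlab
data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};

% The third output of unique maps every label to the index of its
% entry in the sorted list of unique labels.
[classNames, ~, classIDs] = unique(data{1});

% Force a row vector so that it lines up with data{2} and data{3}.
classIDs = classIDs(:).';
```

With this data, classIDs is again 1 2 3 1 2, and classNames(classIDs) recovers the original labels.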

For 2. these means don't depend on the outer loop index, so you can compute them before the loop.

All in all, you can build a solution roughly like (you'll have to fine tune it though)..

 % - Define class IDs and unique class IDs.
 classIDs = [data{1}{:}] - 'A' + 1 ;
 classes  = unique( classIDs ) ;
 % - Build num array of classID and data (easier to reduce).
 D = [classIDs; data{2}; data{3}] ;
 % - Compute means by class for x and y.
 x_mean = accumarray( classIDs.', data{2}, [], @mean ) ;
 y_mean = accumarray( classIDs.', data{3}, [], @mean ) ;
 % - Main loop.
 errorCount = 0 ;
 for cId = 1 : 5
    D_red = D ;
    D_red(:,cId ) = [] ;
    dx = bsxfun( @minus, D_red(:,2), x_mean ) ;
    dy = bsxfun( @minus, D_red(:,3), y_mean ) ;
    [~,closestClassIDs] = min( dx.^2 + dy.^2 ) ;
    errorCount = errorCount + sum( closestClassIDs ~= D(1,:) ) ;
 end

I don't get the same count as with your code though. I suspect that your example was not meant to produce a correct count (?), or that I misunderstood something about what you want to achieve.

Andy on 23 Oct 2013

Edited: Andy on 23 Oct 2013

I have made some changes to your suggested loop. With the code below I get the same error count as from my example. Am I still right in thinking that you do not recalculate the means on each iteration, though?

    % - Define class IDs and unique class IDs.
    classIDs = [data{1}{:}] - 'A' + 1 ;
    classes  = unique( classIDs ) ;
    % - Build num array of classID and data (easier to reduce).
    D = [classIDs; data{2}; data{3}] ;
    % - Compute means by class for x and y.
    x_mean = accumarray( classIDs.', data{2}, [], @mean ) ;
    y_mean = accumarray( classIDs.', data{3}, [], @mean ) ;
    % - Main loop.
    errorCount = 0 ;
    for cId = 1 : 5
        errorRound = 0;
        D_red = D ;
        D_red(:,cId ) = [] ;
        dx = bsxfun( @minus, D_red(2,:), x_mean ) ;
        dy = bsxfun( @minus, D_red(3,:), y_mean ) ;
        [~,closestClassIDs] = min( dx.^2 + dy.^2 ) ;
        errorCount = errorCount + sum( closestClassIDs ~= D_red(1,:) ) ;
    end

Cedric on 23 Oct 2013

Edited: Cedric on 23 Oct 2013

In your code, the part for computing the means

    for k = 1:numel(classes) 
       me = mean(data{2}(strcmp(data{1},classes(k))));
       meansf1(k) = me;
       me2 = mean(data{3}(strcmp(data{1},classes(k))));
       meansf2(k) = me2;
    end

depends on classes and data, but not on the reduced versions of these variables, and not on the outer loop index i. It is therefore based on the whole data and is iteration-independent. So either it is correct and you can compute these means before the outer loop, or you meant to use the reduced versions in that loop.
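
If the full-data means are indeed what is intended, the outer loop can be removed as well: every element is classified against the same means in each iteration, so a misclassified element is counted once for every iteration in which it is not the removed one, that is N-1 times. The following is only a sketch under that assumption, with illustrative variable names (x, y, closest, nWrong); it uses bsxfun to build the full N x K distance table in one step.

```matlab
data = {{'A','B','C','A','B'}, [1,5,3,8,4], [4,7,3,2,5]};

% Integer class IDs as a column vector (works for any label strings).
[~, ~, classIDs] = unique(data{1});
classIDs = classIDs(:);
x = data{2}(:);
y = data{3}(:);

% Class means computed once, from the full data.
x_mean = accumarray(classIDs, x, [], @mean);
y_mean = accumarray(classIDs, y, [], @mean);

% N x K tables of differences: one row per element, one column per class.
dx = bsxfun(@minus, x, x_mean.');
dy = bsxfun(@minus, y, y_mean.');

% Index of the nearest class mean for every element (min along columns).
[~, closest] = min(dx.^2 + dy.^2, [], 2);

% Each misclassified element is counted in the N-1 iterations that keep it.
nWrong     = sum(closest ~= classIDs);
errorCount = (numel(x) - 1) * nWrong;
```

On the 5-element example only the first element is misclassified, so errorCount is 4. In MATLAB R2016b and later, implicit expansion allows x - x_mean.' in place of the bsxfun calls.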

If you wanted means to be computed based on reduced versions, you would have to do the following:

 % - Define class IDs and unique class IDs.
 classIDs = [data{1}{:}] - 'A' + 1 ;
 classes  = unique( classIDs ) ;
 % - Build num array of classID and data (easier to reduce).
 D = [classIDs; data{2}; data{3}] ;
 % - Main loop.
 errorCount = 0 ;
 for cId = 1 : 5
    D_red = D ;
    D_red(:,cId ) = [] ;
    % - Means by class, computed on the reduced data.
    x_mean = accumarray( D_red(1,:).', D_red(2,:), [], @mean ) ;
    y_mean = accumarray( D_red(1,:).', D_red(3,:), [], @mean ) ;
    % - Rows 2 and 3 of D_red hold x and y; compare with reduced class IDs.
    dx = bsxfun( @minus, D_red(2,:), x_mean ) ;
    dy = bsxfun( @minus, D_red(3,:), y_mean ) ;
    [~,closestClassIDs] = min( dx.^2 + dy.^2 ) ;
    errorCount = errorCount + sum( closestClassIDs ~= D_red(1,:) ) ;
 end

Now I don't understand what errorRound is for in your code, and I don't know which count is the correct one between your version and mine. Only you can tell, because only you know what you want to achieve with this algorithm.

If you have a clear idea of what you want to achieve but you don't understand very well what happens in the code (yours and mine), the best that you can do is to use the debugger. For that, type

clear all

in the command window to get a clean workspace. Then place the cursor on the line

classIDs = [data{1}{:}] - 'A' + 1 ;

and press F12. You will see a red round mark on the left of the editor, which indicates that you have set a breakpoint at this location. Then press F5 (or click on the Run button) to execute the code. Execution will start and then stop at the breakpoint, which is indicated by a green arrow on the left of the line. Then press F10 to step forward (or F11 if you want to step into (sub-)functions). At each step, you can display the content of variables by typing their names in the command window. This will allow you to move through your code line by line and understand what happens by seeing intermediate values/computations, checking sizes, results, etc.