2 views (last 30 days)

Show older comments

My Matlab code has multiple double/triple for loops which I believe cause a bottleneck when the index summation variables reach large values. I've tried to implement vectorization instead of double/triple for loops with some small performance gains. I'm interested in understanding best practices for changing double/triple for loops into vectorized Matlab code.

My real code includes 10-15 functions that implement variations of double/triple for loops. I'm curious on how I could improve or rewrite these for loops as vectorized code. I've included two examples below of a double for loop and a triple for loop. I'm interested in seeing if these could be improved. I plan to rewrite all of the for loops as vectorized code (if possible).

First example: This is a double for loop. Implementating the builtin vectorization in Matlab appears to increase the computational speed. I'm not sure if this could be done more efficiently.

close all;

clear all;

clc;

% Test 1: Small M,N

M = 50;

N = 50;

c = rand(N,M)+i*rand(N,M);

eps = .3;

delt = .4;

tStart = tic;

polyv = 0;

for n = 1:N

for m = 1:M

polyv = polyv + c(n,m)*eps^n*delt^m;

end

end

tEnd = toc(tStart);

fprintf('Testing against a double for loop \n');

fprintf('First test with small N,M\n');

fprintf('Total time by double for loop is %f seconds \n', tEnd);

% This may be a sloppy way of vectorizing the code. I'm not sure if there

% is a more efficient method

tStart = tic;

[nval,mval] = ndgrid(1:N,1:M);

vander = eps.^nval.*delt.^mval;

polyv1 = sum(sum(c.*vander));

tEnd = toc(tStart);

fprintf('Total time by vectorization is %f seconds \n', tEnd);

diff = norm(polyv1 - polyv,inf); % should be tiny

% Test 2: "Large" M,N

M = 1000;

N = 1000;

c = rand(N,M)+i*rand(N,M);

eps = .3;

delt = .4;

tStart = tic;

polyv = 0;

for n = 1:N

for m = 1:M

polyv = polyv + c(n,m)*eps^n*delt^m;

end

end

tEnd = toc(tStart);

fprintf('Second test with large N,M\n');

fprintf('Total time by double for loop is %f seconds \n', tEnd);

tStart = tic;

[nval,mval] = ndgrid(1:N,1:M);

vander = eps.^nval.*delt.^mval;

polyv1 = sum(sum(c.*vander));

tEnd = toc(tStart);

fprintf('Total time by vectorization is %f seconds \n', tEnd);

diff2 = norm(polyv1 - polyv,inf); % should be tiny

These two test return

Testing against a double for loop

First test with small N,M

Total time by double for loop is 0.008831 seconds

Total time by vectorization is 0.007447 seconds

Second test with large N,M

Total time by double for loop is 0.393363 seconds

Total time by vectorization is 0.163277 seconds

So there seems to be a small performance gain with the second method.

Second example: This is a triple for loop. I have created test data (my actual Matlab code code is far more complicated but I'm really only interested in improving performance).

close all;

clear all;

clc;

% Test data

N = 20;

M = 20;

N_Eps = 100;

N_delta = 100;

Nx = 32;

ru = zeros(N_Eps,N_delta);

rl = zeros(N_Eps,N_delta);

alpha = 0;

B = zeros(Nx,1);

C = zeros(Nx,1);

alpha_p = ones(Nx,1);

k_u = 2;

k_w = 1;

tStart = tic;

for ell=1:N_delta

for j=1:N_Eps

for p=1:N

% Real code is similar - this is just test data

if(alpha_p(p)^2 < k_u^2)

B(p) = 1;

end

if(alpha_p(p)^2 < k_w^2)

C(p) = 0;

end

end

ru(j,ell) = sum(B);

rl(j,ell) = sum(C);

end

end

tEnd = toc(tStart);

fprintf('Total time by triple for loop is %f seconds \n', tEnd);

% This has more meaning in my real Matlab code. I'm only inserting in test

% data to improve performance.

ee = 1.0 - ru - rl;

This returns

Total time by triple for loop is 0.013758 seconds

I'm not sure how to vectorize this code through a method similar to the first example.

Do you recommend (in general) vectorization over writing double/triple for loops? Should you always try to use vectorization and avoid writing any double/triple for loops? I'm curious what more experienced Matlab users do.

Jan
on 13 Jul 2021

Edited: Jan
on 13 Jul 2021

The power operation is very expensive, so avoid it whenever it is possible:

M = 50;

N = 50;

c = rand(N,M) + i*rand(N,M);

epsilon = 0.3; % Do not shadow the importan function "eps"

delt = 0.4;

tic;

for rep = 1:1000

polyv = 0;

for n = 1:N

for m = 1:M

polyv = polyv + c(n,m) * epsilon^n * delt^m;

end

end

end

toc

tic

for rep = 1:1000

polyv = 0;

c1 = 1;

for n = 1:N

c1 = c1 * epsilon;

c2 = 1;

for m = 1:M

c2 = c2 * delt;

polyv = polyv + c(n,m) * c1 * c2;

end

end

end

toc

3 times faster already.

tic

for rep = 1:1000

[nval,mval] = ndgrid(1:N,1:M);

vander = epsilon.^nval .* delt.^mval;

polyv1 = sum(sum(c .* vander));

end

toc

It has some severe drawbacks: It contains a lot of power operations, but they are called for the same inputs repeatedly. Avoid ndgrid, but calculate the powers of the vectors and create the matrix afterwards:

for rep = 1:1000

vander = (epsilon .^ (1:N).') .* (delt .^ (1:M));

polyv1 = sum(sum(c .* vander));

end

toc

But you can avoid the power operation by using cumprod():

tic

for rep = 1:1000

c1 = cumprod(repmat(epsilon, N, 1));

c2 = cumprod(repmat(delt, 1, M));

polyv = sum(c .* c1 .* c2, 'all');

end

toc

Please run this locally again to check the timings.

In your 2nd example, the repeated squaring of alpha_p, k_u and k_u is a waste of time. Do this once before the loops.

For real values, a^2 < b^2 is equivalent to abs(a) < abs(b).

A vectorized version of:

% Original:

for p=1:N

if(alpha_p(p)^2 < k_u^2)

B(p) = 1;

end

if(alpha_p(p)^2 < k_w^2)

C(p) = 0;

end

end

% Vectorized, alpha_p_2 = alpha_p .^ 2 etc. defined before the loops

B(alpha_p_2 < k_u_2) = 1;

C(alpha_p_2 < k_w_2) = 0;

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!