Speeding up nested for-loops when vectorization seems to fail
Show older comments
Dear All, I have the following problem.
There are relatively simple operations of barycentric interpolation in a large, multi-dimensional array, which leads to long computing times. The code is given as follows:
clear all;
clc;
% auxilliary matrices and vectors
na = 100;
nm = 100;
nL = 10;
a = 1:na;
m = 1:nm;
[ap_grid, mp_grid] = ndgrid(a,m);
[a_grid, m_grid] = ndgrid(a,m);
A = randn(100,na,nm,2,2,3);
M = randn(100,na,nm,2,2,3);
AP_int1 = zeros(100,na,nm,2,2,3,nL);
MP_int1 = zeros(100,na,nm,2,2,3,nL);
AP_int2 = zeros(100,na,nm,2,2,3,nL);
MP_int2 = zeros(100,na,nm,2,2,3,nL);
a_gridVEC = repmat( permute( a_grid, [3 1 2 4 5 6] ), 100,1,1,2,2,3);
m_gridVEC = repmat( permute( m_grid, [3 1 2 4 5 6] ), 100,1,1,2,2,3);
% looped computations
tic
for q1 = 1:100
for q2 = 1:2
for q3 = 1:2
for q4 = 1:3
for r1 = 1:nL % a time consuming loop
ap1 = ap_grid(1,1); % in the real situation points ap and mp change every iteration
ap2 = ap_grid(1,2);
ap3 = ap_grid(2,1);
mp1 = mp_grid(1,1);
mp2 = mp_grid(1,2);
mp3 = mp_grid(2,1);
a1 = A(q1,1,1,q2,q3,q4); % in the real situation the 2nd and 3rd indices of A and M change every iteration
a2 = A(q1,1,2,q2,q3,q4);
a3 = A(q1,2,1,q2,q3,q4);
m1 = M(q1,1,1,q2,q3,q4);
m2 = M(q1,1,2,q2,q3,q4);
m3 = M(q1,2,1,q2,q3,q4);
for r2 = 1:na
for r3 = 1:nm
a_int1 = a_grid(r2,r3);
m_int1 = m_grid(r2,r3);
% barycentric interpolation (weights)
w1 = ( (m2-m3)*(a_int1-a3) + (a3-a2)*(m_int1-m3) ) / ( (m2-m3)*(a1-a3) + (a3-a2)*(m1-m3) );
w2 = ( (m3-m1)*(a_int1-a3) + (a1-a3)*(m_int1-m3) ) / ( (m2-m3)*(a1-a3) + (a3-a2)*(m1-m3) );
w3 = 1 - w1 - w2;
% barycentric interpolation (results)
if w1 < -0.25 || w2 < -0.25 || w3 < -0.25 % we allow for only 'a bit' of extrapolation
AP_int2(q1,r2,r3,q2,q3,q4,r1) = NaN;
MP_int2(q1,r2,r3,q2,q3,q4,r1) = NaN;
else
AP_int2(q1,r2,r3,q2,q3,q4,r1) = w1 * ap1 + w2 * ap2 + w3 * ap3;
MP_int2(q1,r2,r3,q2,q3,q4,r1) = w1 * mp1 + w2 * mp2 + w3 * mp3;
end
end % loop with nm iterations
end % loop with na iterations
end % a time consuming loop
end
end
end
end
toc
% vectorized computations
tic
for r1 = 1:nL % a time consuming loop (in the real situation points ap,mp,a and m change every iteration of this loop)
ap1 = repmat( ap_grid(1,1), 100,na,nm,2,2,3);
ap2 = repmat( ap_grid(1,2), 100,na,nm,2,2,3);
ap3 = repmat( ap_grid(2,1), 100,na,nm,2,2,3);
mp1 = repmat( mp_grid(1,1), 100,na,nm,2,2,3);
mp2 = repmat( mp_grid(1,2), 100,na,nm,2,2,3);
mp3 = repmat( mp_grid(2,1), 100,na,nm,2,2,3);
a1 = repmat( A(:,1,1,:,:,:), 1,na,nm,1,1,1);
a2 = repmat( A(:,1,2,:,:,:), 1,na,nm,1,1,1);
a3 = repmat( A(:,2,1,:,:,:), 1,na,nm,1,1,1);
m1 = repmat( M(:,1,1,:,:,:), 1,na,nm,1,1,1);
m2 = repmat( M(:,1,2,:,:,:), 1,na,nm,1,1,1);
m3 = repmat( M(:,2,1,:,:,:), 1,na,nm,1,1,1);
% barycentric interpolations (weights)
w1 = ( (m2-m3).*(a_gridVEC-a3) + (a3-a2).*(m_gridVEC-m3) ) ./ ( (m2-m3).*(a1-a3) + (a3-a2).*(m1-m3) );
w2 = ( (m3-m1).*(a_gridVEC-a3) + (a1-a3).*(m_gridVEC-m3) ) ./ ( (m2-m3).*(a1-a3) + (a3-a2).*(m1-m3) );
w3 = 1 - w1 - w2;
w1(w1<-0.25) = NaN; % we allow for only 'a bit' of extrapolation
w2(w2<-0.25) = NaN;
w3(w3<-0.25) = NaN;
% barycentric interpolations (results)
AP_int1(:,:,:,:,:,:,r1) = w1 .* ap1 + w2 .* ap2 + w3 .* ap3;
MP_int1(:,:,:,:,:,:,r1) = w1 .* mp1 + w2 .* mp2 + w3 .* mp3;
end
toc
Unfortunately, vectorization of these looped operations makes computational times only worse, contradicting the common Matlab knowledge that vectorization always helps. Why is it so? Is there any other way of speeding up the code? The parfor loop also slows down the code. Would it help to write this piece of code in C++ and call it from Matlab?
3 Comments
darova
on 14 Feb 2020
The code is too complicated
"...contradicting the common Matlab knowledge that vectorization always helps."
It doesn't.
"Why is it so?"
Because that "common Matlab knowledge" is incorrect.
Vectorized code is often much faster than using loops, because the underlying operations can be performed very efficiently using highly optimized code (usually written in C). But not always faster, as you write. That is simply incorrect. Particularly in cases where the vectorized code generates large intermediate arrays, vectorized code can easily be much slower than looped code. The real "common Matlab knowledge" requires the user to consider memory consumption, array allocation, etc. and if it really is critical, measure using the profiler, etc.
Whoever told you that "vectorization always helps" does not understand MATLAB.
Subhadeep Koley
on 14 Feb 2020
Accepted Answer
More Answers (1)
I tend to think you should be using scatteredInterpolant rather than implementing your own interpolation routine with loops.
1 Comment
Adam Pigon
on 19 Feb 2020
Categories
Find more on Logical in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!