Speeding up (vectorizing) barycentric interpolation

Dear All, I have the following problem:
Is it possible to speed up the barycentric interpolation given below? I presume that vectorization could be particularly effective, but I do not know how to do it. As the code has to be run many times (and the matrices involved are rather large), any increase in efficiency would be helpful. Could GPU computing, after vectorization, be of any help here?
na = 100;
nh = 100;
nm = 100;
nap = na+2;
nhp = nh+2;
H = randn(nap,nhp) ;
M = randn(nap,nhp) ;
MP = randn(nap,nhp) ;
ap_trial = NaN(nh,nm) ;
hp_trial = NaN(nh,nm) ;
mp_trial = NaN(nh,nm) ;
[h_grid,m_grid] = ndgrid(1:nh,1:nm);
[ap_grid,hp_grid] = ndgrid(1:nap,1:nhp);
tic
for r0 = 0:1
for r1 = 0:(nap-2)
for r2 = 0:(nhp-2) % loops picking up triangles constructed by adjacent points in matrices H and M
h1 = H(1+r1+r0,1+r2+r0);
h2 = H(1+r1,2+r2);
h3 = H(2+r1,1+r2);
m1 = M(1+r1+r0,1+r2+r0);
m2 = M(1+r1,2+r2);
m3 = M(2+r1,1+r2);
% barycentric interpolation (weights)
w1 = ( (m2-m3)*( h_grid -h3) + (h3-h2)*( m_grid -m3) ) / ( (m2-m3)*(h1-h3) + (h3-h2)*(m1-m3) );
w2 = ( (m3-m1)*( h_grid -h3) + (h1-h3)*( m_grid -m3) ) / ( (m2-m3)*(h1-h3) + (h3-h2)*(m1-m3) );
w3 = 1 - w1 - w2;
% preventing extrapolation
w1(w1 < 0) = NaN;
w2(w2 < 0) = NaN;
w3(w3 < 0) = NaN;
% barycentric interpolation (results)
ap = w1 * ap_grid(1+r1+r0,1+r2+r0) + w2 * ap_grid(1+r1,2+r2) + w3 * ap_grid(2+r1,1+r2);
hp = w1 * hp_grid(1+r1+r0,1+r2+r0) + w2 * hp_grid(1+r1,2+r2) + w3 * hp_grid(2+r1,1+r2);
mp = w1 * MP(1+r1+r0,1+r2+r0) + w2 * MP(1+r1,2+r2) + w3 * MP(2+r1,1+r2);
ap_trial( isnan(ap) == 0 ) = ap( isnan(ap) == 0 );
hp_trial( isnan(hp) == 0 ) = hp( isnan(hp) == 0 );
mp_trial( isnan(mp) == 0 ) = mp( isnan(mp) == 0 );
end
end
end
toc
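One possible vectorization route (a sketch under assumptions, not a drop-in replacement): with implicit expansion (R2016b+), all (r1,r2) triangles for a fixed r0 can be processed at once by slicing the shifted vertex matrices and broadcasting them against h_grid/m_grid along a third dimension. Two caveats: the 3-D arrays are nh-by-nm-by-K with K = (nap-1)*(nhp-1), so the triangle dimension may need chunking for memory; and where triangles overlap, the overwrite order here is column-major over (r1,r2), which need not coincide with the nested-loop order. The helper names K, sl, inside are mine.

```matlab
% Sketch: vectorize the (r1,r2) loops for a fixed r0 via implicit
% expansion (requires R2016b+). Shown for ap only; hp and mp are analogous.
for r0 = 0:1
    K  = (nap-1)*(nhp-1);             % number of triangles for this r0
    sl = @(A) reshape(A, 1, 1, K);    % put triangle index in dim 3
    % vertex values of every triangle (vertex 1 shifts with r0)
    h1 = sl(H(1+r0:nap-1+r0, 1+r0:nhp-1+r0));
    h2 = sl(H(1:nap-1,       2:nhp));
    h3 = sl(H(2:nap,         1:nhp-1));
    m1 = sl(M(1+r0:nap-1+r0, 1+r0:nhp-1+r0));
    m2 = sl(M(1:nap-1,       2:nhp));
    m3 = sl(M(2:nap,         1:nhp-1));
    a1 = sl(ap_grid(1+r0:nap-1+r0, 1+r0:nhp-1+r0));
    a2 = sl(ap_grid(1:nap-1,       2:nhp));
    a3 = sl(ap_grid(2:nap,         1:nhp-1));
    % barycentric weights: every grid point x every triangle (nh x nm x K)
    den = (m2-m3).*(h1-h3) + (h3-h2).*(m1-m3);              % 1 x 1 x K
    w1  = ((m2-m3).*(h_grid-h3) + (h3-h2).*(m_grid-m3)) ./ den;
    w2  = ((m3-m1).*(h_grid-h3) + (h1-h3).*(m_grid-m3)) ./ den;
    w3  = 1 - w1 - w2;
    inside = w1 >= 0 & w2 >= 0 & w3 >= 0;   % also excludes NaN weights
    ap  = w1.*a1 + w2.*a2 + w3.*a3;
    % write back; find() orders indices with the triangle dim slowest, and
    % with duplicate linear indices the last assignment wins in MATLAB
    idx = find(inside);
    [i, j, ~] = ind2sub([nh, nm, K], idx);
    ap_trial(sub2ind([nh, nm], i, j)) = ap(idx);
end
```

Since these are all element-wise operations plus find/indexing, the same sketch could in principle be moved to a GPU with gpuArray inputs, though whether that pays off would have to be measured.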

14 Comments

Of course! Driven by Techniques to Improve Performance and my own experience, I am inclined to think that vectorization may speed up the code. The question is how to perform the vectorization, as I do not see any possibility to implement the other suggested methods of improving efficiency.
The tic toc made me ask.
I cannot see how to vectorize any part of your code. (But others might.)
It helps a tiny bit to replace isnan(ap) == 0 with ~isnan(ap), etc.
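Spelled out, that replacement for the three write-back lines looks like this (mask is my name for the temporary):

```matlab
mask = ~isnan(ap);            % logical mask instead of isnan(ap) == 0
ap_trial(mask) = ap(mask);
hp_trial(~isnan(hp)) = hp(~isnan(hp));
mp_trial(~isnan(mp)) = mp(~isnan(mp));
```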
These six red lines take more than 80% of the time (on my system).
I added imagesc inside the first for loop.
Only a few elements are recalculated all the time. Can you attach the original data?
The goal of the for-loops is to pick up all adjacent triangles from a two-dimensional matrix. For r0 = r1 = r2 = 0 we have the triangle (1,1), (1,2) and (2,1), which is the triangle shown in the picture. When r0 = 1 we have the triangle (2,2), (1,2) and (2,1). The loops on r1 and r2 move to the other squares, from each of which r0 takes both triangles (i.e. the upper-left one and the lower-right one).
Do I make a mistake in my reasoning? Original data will not help much I guess?
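For what it's worth, the enumeration described above can be checked with a short snippet (my own illustration) that prints the vertex indices of the two triangles in one cell:

```matlab
% Vertices (row, col) of the two triangles in the cell whose upper-left
% corner is (1+r1, 1+r2). r0 = 0 gives the upper-left triangle, r0 = 1
% the lower-right one; they share the edge (1+r1,2+r2)-(2+r1,1+r2).
r1 = 0; r2 = 0;
for r0 = 0:1
    tri = [1+r1+r0, 1+r2+r0;   % vertex 1 (moves with r0)
           1+r1,    2+r2;      % vertex 2 (fixed)
           2+r1,    1+r2];     % vertex 3 (fixed)
    disp(tri)                  % r0=0: (1,1),(1,2),(2,1); r0=1: (2,2),(1,2),(2,1)
end
```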
The matrix in the animation is hp_trial. I mean, if you know that only a few elements (positions) are recalculated all the time, you don't need these operations. Every cycle you search for NaN values (a run through the entire matrix):
ap_trial( isnan(ap) == 0 ) = ap( isnan(ap) == 0 );
hp_trial( isnan(hp) == 0 ) = hp( isnan(hp) == 0 );
mp_trial( isnan(mp) == 0 ) = mp( isnan(mp) == 0 );
Maybe they can be replaced with
ap1 = ap(1:5,1:5);
aptr1 = ap_trial(1:5,1:5);
aptr1( ~isnan(ap1) ) = ap1( ~isnan(ap1) ); % search only in the upper-left corner
ap_trial(1:5,1:5) = aptr1;
  • Original data will not help much I guess?
We will find out
I see your point now. Unfortunately, the example above using randn matrices may be a bit misleading, as randn numbers are concentrated around 0. Therefore, we are operating with just a couple of indices, like 1:5.
In reality:
Matrices ap_grid, hp_grid, h_grid and m_grid are exogenously given and they are regular, strictly monotonic, although not necessarily equidistant, grids.
Matrices H and M are endogenous - they are a function of many arguments like the ap_grid and hp_grid matrices, some parameters, time iterations etc., and it is hard to say what their limits are. They are not regular grids. If they were, interpolation would be trivial (scatteredInterpolant is not an option though).
The goal of the entire code is to do the following:
  1. Take a given triangle out of exogenous ap_grid and hp_grid matrices.
  2. Take a corresponding triangle out of endogenous matrices H and M.
  3. Ascertain which points of the exogenous matrices h_grid and m_grid are inside the corresponding H and M triangle.
  4. Write down interpolated values of ap and hp for all points on the h_grid and m_grid (that's why we need overwriting of the ap_trial matrix).
We cannot say ex ante which points of h_grid and m_grid are inside a given H and M triangle. Maybe all of them are, maybe none. In practice, there are one or two or, most often, none.
I thought it could be an ample source of optimization. The question is how to effectively find only these parts of matrices that have non-NaN values and operate just with them? Would it be even possible to achieve a speed gain then?
In short, your idea is brilliant, but instead of the indices 1:5 there should be endogenous indices x:y that change every loop iteration.
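A sketch of that generalization (the names lo_r, hi_r, lo_c, hi_c are mine, and I assume h_grid increases monotonically down the rows while m_grid increases across the columns): each iteration, compute the triangle's bounding box in grid coordinates and restrict the weight computation to that submatrix. This would replace the body of the innermost loop, after h1..m3 have been set; shown for ap only.

```matlab
% Bounding box of the current (h1,m1),(h2,m2),(h3,m3) triangle on the grid.
hv = h_grid(:,1);  mv = m_grid(1,:);         % 1-D grid coordinates
lo_r = find(hv >= min([h1 h2 h3]), 1, 'first');
hi_r = find(hv <= max([h1 h2 h3]), 1, 'last');
lo_c = find(mv >= min([m1 m2 m3]), 1, 'first');
hi_c = find(mv <= max([m1 m2 m3]), 1, 'last');
if ~isempty(lo_r) && ~isempty(hi_r) && ~isempty(lo_c) && ~isempty(hi_c) ...
        && lo_r <= hi_r && lo_c <= hi_c
    hg  = h_grid(lo_r:hi_r, lo_c:hi_c);      % submatrices only
    mg  = m_grid(lo_r:hi_r, lo_c:hi_c);
    den = (m2-m3)*(h1-h3) + (h3-h2)*(m1-m3);
    w1  = ( (m2-m3)*(hg-h3) + (h3-h2)*(mg-m3) ) / den;
    w2  = ( (m3-m1)*(hg-h3) + (h1-h3)*(mg-m3) ) / den;
    w3  = 1 - w1 - w2;
    in  = w1 >= 0 & w2 >= 0 & w3 >= 0;       % inside the triangle
    ap  = w1*ap_grid(1+r1+r0,1+r2+r0) + w2*ap_grid(1+r1,2+r2) ...
        + w3*ap_grid(2+r1,1+r2);
    sub = ap_trial(lo_r:hi_r, lo_c:hi_c);
    sub(in) = ap(in);
    ap_trial(lo_r:hi_r, lo_c:hi_c) = sub;
end
```

Whether this wins depends on the indexing overhead versus the shrunken matrices, exactly the trade-off discussed below.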
I tried to scale the randn numbers by 15.
Is this correct?
Yes, scaling the randn function is a better approximation of reality. Nevertheless, I do not understand why there are two triangles on your graph. The matrix is a regular grid composed of h_grid and m_grid, while the triangle vertices are given by H and M points. Then we take what's inside the triangle.
OK, I understand now. What about these? Are those values inside the triangle too?
w1(w1 < 0) = NaN;
w2(w2 < 0) = NaN;
w3(w3 < 0) = NaN;
If the weight of a given vertex is negative, then the point is outside the triangle. By setting negative weights to NaN we prevent extrapolation.
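As a toy illustration (my own numbers, not from the model): for the triangle with vertices (h1,m1)=(0,0), (h2,m2)=(1,0), (h3,m3)=(0,1), an interior point gets all non-negative weights, while an exterior point gets at least one negative weight.

```matlab
% Same weight formulas as in the question, on a toy triangle
h1=0; m1=0; h2=1; m2=0; h3=0; m3=1;
den = (m2-m3)*(h1-h3) + (h3-h2)*(m1-m3);
w = @(h,m) [ ((m2-m3)*(h-h3) + (h3-h2)*(m-m3)) / den, ...
             ((m3-m1)*(h-h3) + (h1-h3)*(m-m3)) / den ];
w12 = w(0.25, 0.25);   % interior point: w12 = [0.5 0.25]
w3  = 1 - sum(w12);    % w3 = 0.25, all weights >= 0
w12_out = w(2, 2);     % exterior point: w12_out = [-3 2], first weight < 0
```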
Then you can use my brilliant idea again and search for NaN in triangle region
Thank you for your idea! I have tested this improvement and my conclusions are as follows:
MATLAB is pretty slow at indexing matrices (see this thread for an interesting discussion: https://www.mathworks.com/matlabcentral/answers/54522-why-is-indexing-vectors-matrices-in-matlab-very-inefficient ), which means that my code needed some streamlining to achieve better performance. Computing and applying the additional indices, i.e. of the points that lie within the triangle, is costly. Therefore, the dimensions nh and nm have to be large enough to achieve a good trade-off between the additional cost of indexing and the efficiency gain from operating with smaller matrices that exploit only a fraction of the (nh x nm) matrices. In a nutshell, if nm and nh are large enough, the efficiency gain can be sizable. An increase in nm is actually costless, as nm does not show up in any of the loop sizes and therefore adds no extra indexing burden.
Thanks!
Thanks for the link


Answers (0)

Release: R2018a

Asked: 4 Apr 2020
Commented: 12 Apr 2020
