Getting rid of a for loop in a function with long arrays

Here is a small function to calculate the neff of SM fiber. I pass three long arrays to this function: n_core, n_clad and lamda (each can be as long as 2^17 elements - I will need that for an FFT later on). If the arrays are very long the computer gets stuck, and even when it doesn't, the processing time is long. I was wondering if someone knows how to make this more efficient, since MATLAB is more efficient at matrix calculations - basically, get rid of the for loop. Note that this loop exists to solve the equation numerically, something I was not able to do with the "solve" function.
Thanks.
Here is the code:
function [neff,betta] = neff_fiber2(n_core,n_clad,core_diameter,lamda)
%a is the radius in um;
%lamda is the wavelength in um;
a  = core_diameter/2;                         % radius
k0 = 2*pi./lamda;
V  = k0.*a.*sqrt(n_core.^2 - n_clad.^2);      % V number
betta = zeros(1,length(V));
%% now we want to solve the following
% J0(x)/(x*J1(x)) = K0(sqrt(V^2-x^2))/(sqrt(V^2-x^2)*K1(sqrt(V^2-x^2)))
% where x = K*a
for ii = 1:length(V)
    x = linspace(0,V(ii),1e6);
    Diff = abs(besselj(0,x)./(x.*besselj(1,x)) - ...
        besselk(0,sqrt(V(ii).^2-x.^2))./(sqrt(V(ii)^2- ...
        x.^2).*besselk(1,sqrt(V(ii)^2-x.^2))));
    [~, ind1] = min(Diff(1,:));   % index of the minimum value in the row
    Kappa = x(ind1)./a;
    betta(ii) = sqrt(k0(ii).^2 * n_core(ii)^2 - Kappa.^2);
end
neff = betta./k0;
end

13 Comments

Start by avoiding repeated calculations of the same value. Replace:
Diff=abs(besselj(0,x)./(x.*besselj(1,x))- ...
besselk(0,sqrt(V(ii).^2-x.^2))./(sqrt(V(ii)^2- ...
x.^2).*besselk(1,sqrt(V(ii)^2-x.^2))));
by
b = sqrt(V(ii).^2 - x.^2);   % use 'b' rather than 'a', which already holds the radius
Diff = abs(besselj(0, x) ./ (x .* besselj(1, x)) - ...
    besselk(0, b) ./ (b .* besselk(1, b)));
(Much nicer!) But I assume that neither sqrt() nor the loop is the bottleneck - the Bessel functions are. I do not see a way to reduce the number of calls to besselj and besselk. What about a parfor?
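For what it's worth, a parfor version of the loop (a sketch, assuming the Parallel Computing Toolbox is available, with the repeated sqrt() hoisted as suggested above) could look like this:

```matlab
% Sketch: parallelize the search over V with parfor. Each iteration is
% independent, so the body is unchanged apart from hoisting sqrt().
% a, k0, n_core and V come from the surrounding neff_fiber2 function.
betta = zeros(1, length(V));
parfor ii = 1:length(V)
    x = linspace(0, V(ii), 1e6);
    b = sqrt(V(ii)^2 - x.^2);                 % computed once per iteration
    Diff = abs(besselj(0, x)./(x.*besselj(1, x)) - ...
               besselk(0, b)./(b.*besselk(1, b)));
    [~, ind1] = min(Diff);
    Kappa = x(ind1)/a;
    betta(ii) = sqrt(k0(ii)^2 * n_core(ii)^2 - Kappa^2);
end
```

Since betta(ii) is a sliced output and all other loop variables are temporaries, this body satisfies parfor's variable-classification rules without changes.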
It's not a sqrt of a single variable but of a whole array expression. I can do it like Jan Simon said, but it is still inside the loop.
Also, parfor won't help that much when dealing with a number as large as 2^17.
@elis02: I'd expect the parallel loop to accelerate the code by roughly a factor of the number of cores here. 2^17 is only 131,072, so the data might fit into the processor caches.
Is there any chance that "x = linspace(0,V(ii),1e6)" returns the same values for different ii? Then you could reduce the number of Bessel function calls.
Would it help to calculate the values for
x = linspace(0,max(V),1e7);
and then use a linear interpolation of the output for the other V(ii)?
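A sketch of that interpolation idea (untested; whether linear interpolation is accurate enough near the steep parts of these curves would need checking) might tabulate both Bessel ratios once and then use interp1 inside the loop. It assumes it runs inside neff_fiber2, so V, k0, n_core and a are in scope:

```matlab
% Tabulate both Bessel ratios once on a dense grid. b = sqrt(V^2-x^2)
% is also bounded by max(V), so one grid serves both lookups.
xg = linspace(eps, max(V), 1e7);              % start at eps to avoid x = 0
FJ = besselj(0, xg)./(xg.*besselj(1, xg));    % J-ratio, tabulated once
FK = besselk(0, xg)./(xg.*besselk(1, xg));    % K-ratio, tabulated once
betta = zeros(1, length(V));
for ii = 1:length(V)
    x = linspace(0, V(ii), 1e6);
    b = sqrt(V(ii)^2 - x.^2);
    Diff = abs(interp1(xg, FJ, x) - interp1(xg, FK, b));
    [~, ind1] = min(Diff);                    % min skips the NaN endpoints
    betta(ii) = sqrt(k0(ii)^2*n_core(ii)^2 - (x(ind1)/a)^2);
end
```

The endpoints x = 0 and b = 0 fall outside the table and give NaN, which min() ignores, just as the original code's Inf/NaN endpoints are ignored.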
...
x=linspace(0,V(ii),1e6);
followed by besselj/k calls for all x is a lot of function evaluations when multiplied by the loop.
Is not the solution smooth and in the same neighborhood of the first solution as the iteration progresses?
I'd be looking at finding the neighborhood of the solution coarsely and then homing in, instead of a global search that starts over from scratch every time. (*)
What are typical input values and what does the functional form look like? Give us some help here other than just trying to read code...
What went wrong with fsolve or another builtin solver routine?
(*) Even a binary search to bracket solutions should help markedly rather than just brute-force solution for 1E6 points...
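To make that bracketing idea concrete, here is a rough sketch for one loop iteration (the 200-point coarse grid and the 5% end margins are arbitrary choices of mine, and it assumes exactly one sign change inside the scanned interval):

```matlab
% Coarse scan to bracket the root, then fzero polishes it.
f  = @(x) besselj(0,x)./(x.*besselj(1,x)) - ...
          besselk(0,sqrt(V(ii)^2-x.^2)) ./ ...
          (sqrt(V(ii)^2-x.^2).*besselk(1,sqrt(V(ii)^2-x.^2)));
xc = linspace(0.05*V(ii), 0.95*V(ii), 200);          % 200 points, not 1e6
fc = f(xc);
k  = find(sign(fc(1:end-1)) ~= sign(fc(2:end)), 1);  % first sign change
x_root = fzero(f, [xc(k) xc(k+1)]);                  % guaranteed bracket
```

Passing the two-element bracket to fzero guarantees convergence to a sign change, and the Bessel functions are evaluated a few hundred times instead of a million.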
elis02, try using tic/toc to see if the elapsed time for each iteration increases as the iterations progress.
If this is the case, then try the following and check if you see any difference:
  • Preallocate Diff before the loop,
Diff = zeros(1,1e6);
  • In the loop, when you assign the values to Diff, use
Diff(:) = abs(besselj(0,x)...
The reason why you may see a difference is that when you use "Diff = ", MATLAB creates a new variable for Diff in each loop iteration (even though it has the same name) and tries to find new memory to store it. This results in slower and slower execution as the number of iterations increases (and eventually, most probably, MATLAB or your computer will freeze). If you use "Diff(:) =", MATLAB will store the values in the preallocated memory block in every iteration, and every iteration will take the same time (no new variable creation and no hunt for free memory).
I don't think this is true and a timing test showed no difference in the two assignment statements. Whole array assignment isn't the same as elementally "growing" an array or partial writes that may require a copy.
The right hand side is always calculated separately and no assignment is done until it is finished calculating. At that point memory containing the values already exists.
If you assign to the whole array then the attributes and data pointer of the symbol table entry for the array are updated to hold the (anonymous) information from the right hand side, and the reference count for the former block on the left is decremented, potentially decrementing its share count to 0, allowing it to be released back to the pool.
If you assign to the array accessed at (:), then the results must be copied from the new object to the old object (possibly after creating a new copy of the old object, if the share count was more than 1), and then the new object has its share count decremented (typically resulting in it being released back to the pool). Either way, something the size of the object gets (likely) released back to the pool, but the assignment-to-(:) form requires a data copy that the assignment to the whole array did not.
Therefore assignment to (:) does not save any memory and cannot be any faster than assignment to the whole object.
I tried the tic/toc and there really is no difference. About the repeatability: I always have a different V number (one for each lamda), and the solution x should be between 0 and V for each.
I tried running fsolve for one lamda,
but I get many errors - this is why I didn't use it (x is a symbolic variable).
I tried it with the following code:
lamda=2.2; n_core =1.4997; n_clad=1.4880; core_diameter=6;
a = core_diameter/2; %radius
k0 = 2*pi./lamda;
V = k0.*a.*sqrt(n_core.^2 - n_clad.^2);%V number
syms x
fsolve(inline('besselj(0,x)./(x.*besselj(1,x))-besselk(0,sqrt(V.^2-x.^2))./(sqrt(V.^2-x.^2).*besselk(1,sqrt(V.^2-x.^2)))'),[0,V])
We recommend that you do not use inline. Use matlabFunction instead.
F = matlabFunction( besselj(0,x)./(x.*besselj(1,x))-besselk(0,sqrt(V.^2-x.^2))./(sqrt(V.^2-x.^2).*besselk(1,sqrt(V.^2-x.^2))));
fsolve(F, 1)
Note there are other solutions such as
fsolve(F, -1)
Or use
vpasolve(besselj(0,x)./(x.*besselj(1,x))-besselk(0,sqrt(V.^2-x.^2))./(sqrt(V.^2-x.^2).*besselk(1,sqrt(V.^2-x.^2))),x,[0 V])
Thanks, but unfortunately it takes longer than
tic
x=linspace(0,V,1e5);
b=sqrt(V.^2-x.^2);
Diff=abs(besselj(0,x)./(x.*besselj(1,x))- ...
besselk(0,b)./(b.*besselk(1,b)));
[~, ind1] = min(Diff(1,:));
Kappa = x(ind1)./a;
toc
Yet again, what are some typical input values???
Give us something that actually will run.
@dpb here are the files with the values. and using a command: neff2=neff_fiber2(n_core,n_clad,6,lamda);


 Accepted Answer

dpb
dpb on 5 Aug 2017
Edited: dpb on 6 Aug 2017
OK, once I had some data to work with it's not too bad at all... It is virtually always really, really helpful to have the actual problem to look at, particularly with nonlinear functions.
Try the following
function [neff,betta] = neff_fiber2(n_core,n_clad,core_diameter,lamda)
%a is the radius in um;
%lamda is the wavelength in um;
a  = core_diameter/2;                        % radius
k0 = 2*pi./lamda;
V  = k0.*a.*sqrt(n_core.^2 - n_clad.^2);     % V number
%% now we want to solve the following
% J0(x)/(x*J1(x)) = K0(sqrt(V^2-x^2))/(sqrt(V^2-x^2)*K1(sqrt(V^2-x^2)))
% where x = Ka
[x,y] = arrayfun(@solveFn,V);
Kappa = x/a;
betta = sqrt(k0.^2.*n_core.^2 - Kappa.^2);
neff  = betta./k0;
    function [x,y] = solveFn(V)
        x0 = 0.8*V;                          % starting guess
        [x,y] = fzero(@nFn,x0);
        % residual function; V is visible here through the nested scope
        function y = nFn(x)
            b = sqrt(V*V - x*x);             % b depends on x, so evaluate it here
            y = besselj(0,x)/(x*besselj(1,x)) - besselk(0,b)/(b*besselk(1,b));
        end
    end
end
This started with investigating the functional form of the function you're trying to solve:
Here are the two pieces that make up the Diff vector for V(1) -- note particularly the humongously large values for small values of x; there's absolutely no sense in starting from 0 or even anywhere near there, so a huge fraction of your computational overhead is just totally wasted looking in the wrong part of the world... As had been suggested, even something as simple as a binary search to bracket the root would have helped immensely.
Blowing this up into an area of interest and adding the difference, one can see what the functional form is in the area of interest:
This shows the functions are quite smooth, so fzero, given at least a reasonable starting point, should have no trouble finding the solution.
Note that the slopes are pretty sizable in the neighborhood of the solution; hence the solution to near machine precision differs from your approximate solution by as much as 20% or so in the x value. You can see this in your Diff vector being very jaggedy if plotted; the above solution is quite smooth, in contrast.
I just used 0.8*V for the starting guess as a first cut; it ran through the entire vector with no problems that way. You could perhaps make it a little quicker by using the previous result as the subsequent guess, since it changes very little between observations; some, in fact, were the same to machine precision between successive solutions, although the error changed noticeably.
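The previous-result-as-guess idea could be sketched as an explicit loop like this (untested; sqrt(V^2-x^2) is written inline so the residual is evaluated at the current x, and a, k0, n_core and V come from the enclosing function):

```matlab
% Warm-start variant: seed each fzero call with the previous root.
betta = zeros(1, numel(V));
xprev = 0.8*V(1);                 % first guess, as in the Answer
for ii = 1:numel(V)
    f = @(x) besselj(0,x)./(x.*besselj(1,x)) - ...
             besselk(0,sqrt(V(ii)^2-x.^2)) ./ ...
             (sqrt(V(ii)^2-x.^2).*besselk(1,sqrt(V(ii)^2-x.^2)));
    xprev = fzero(f, xprev);      % warm start from the previous root
    betta(ii) = sqrt(k0(ii)^2*n_core(ii)^2 - (xprev/a)^2);
end
neff = betta./k0;
```

Since successive roots differ very little, fzero should need only a handful of function evaluations per element after the first.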
Part of debugging session; just used the first two values for brevity--
>> nf2=neff2(n_core(1:2),n_clad(1:2),6,lamda(1:2));
K>> x(1)
ans =
1.4559
K>> y(1)
ans =
1.1102e-16
K>>
The solution using the min() was
K>> x=linspace(0,V(1),1E6);
K>> b=sqrt(V(1).^2-x.^2);
K>> F1=besselj(0,x)./(x.*besselj(1,x));
K>> F2=besselk(0,b)./(b.*besselk(1,b));
K>> [mn,ix]=min(abs(F1-F2))
mn =
8.7172e-07
ix =
813897
K>> x(ix)
ans =
1.4441
K>>
which is roughly 1% in magnitude but, interestingly, some
K>> x(ix)-x(ix+1)
ans =
-1.7742e-06
K>> 0.0149/ans
ans =
-8.3979e+03
K>>
steps removed from the next value computed. That's the result of the slopes being sizable, so even on such a small scale the difference isn't that precise.
The errors in the neighborhood: you see it was indeed the smallest value computed, but fzero found a considerably more accurate solution.
K>> [F1(ix-1:ix+1)-F2(ix-1:ix+1)]
ans =
1.0e-05 *
0.4511 0.0872 -0.2768
K>>

3 Comments

Thanks!! Amazing explanation. I see things I didn't know in MATLAB, like the way a nested function can use variables defined in the enclosing function, and the method you used.
I'm trying to understand how you use 'y' - I mean, why do you need to keep it in your solution (the difference)? I'm also surprised how fast it works now compared to the original!
dpb
dpb on 6 Aug 2017
Edited: dpb on 7 Aug 2017
y is saved solely in case it is ever wanted for debugging or further analysis; it isn't required or presently used. I kept it because it seemed quite possible it would be wanted in the future, and it costs nothing except a little memory, which is cheap these days...
The "trick" using nested function to pass additional parameters to the minimization routines is documented here: <Parameterizing Functions>. Scope and details on nested functions is at <Nested Functions>
While I didn't check on it, you should be able to eliminate the for loop entirely with arrayfun
ADDENDUM
Well, just couldn't leave well enough alone... :)
See the updated Answer for using arrayfun() thereby eliminating the explicit loop and need for preallocation. I didn't compare run time so don't know if it makes any real difference or not.
ADDENDUM 2
I used the nested function alternative as I think it's a little easier to follow for the inexperienced user. As an "exercise for the student", your mission, should you choose to accept it, is to rewrite the solution using the anonymous function form which can reduce the source form to only the single line (I think w/o having actually done this case). :)
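For the record, the anonymous-function form hinted at here might look something like this (a sketch I haven't run; sqrt(V^2-x^2) is written inline instead of through a shared nested-scope variable):

```matlab
% One arrayfun call: for each v, solve the dispersion relation with fzero
% using an anonymous residual; 0.8*v is the starting guess as in the Answer.
xr = arrayfun(@(v) fzero(@(x) ...
         besselj(0,x)./(x.*besselj(1,x)) - ...
         besselk(0,sqrt(v^2-x^2))./(sqrt(v^2-x^2).*besselk(1,sqrt(v^2-x^2))), ...
         0.8*v), V);
betta = sqrt(k0.^2.*n_core.^2 - (xr/a).^2);
neff  = betta./k0;
```

The inner anonymous function captures v from the outer one, which plays the same parameter-passing role as the nested-function scope.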


More Answers (1)

x=linspace(0,V(ii),1e6);
can be replaced by
x = V(ii) * linspace(0, 1, 1e6);
That in turn leads you to the vectorized
xG = bsxfun( V(:), linspace(0, 1, 1e6) );
and then
temp = sqrt( bsxfun( @minus, V(:).^2, xG.^2) );
DiffG = abs(besselj(0, xG) ./ (xG .* besselj(1, xG)) - ...
    besselk(0, temp) ./ (temp .* besselk(1, temp)));
[~, ind1] = min(DiffG, [], 2); %per-row minima
kappa_ind = sub2ind(size(xG), 1:size(xG,1), ind1.'); %each ind1 value refers to a different row
kappa = xG(kappa_ind) ./ a; %this is a vector, one entry for each V
betta = sqrt(k0.^2 .* n_core.^2 - kappa.^2);
should be the vector solution.

4 Comments

I can't seem to make this work: xG = bsxfun( V(:), linspace(0, 1, 1e6) );
I get "not enough arguments". I uploaded the files with the data and the function in a previous comment.
xG = bsxfun( @times, V(:), linspace(0, 1, 1e6) );
"Error using bsxfun Requested 131072x1000000 (976.6GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information."
Did it work when you tried it on the function?
Ah. In that case you will not be able to vectorize the calculation.
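A possible middle ground (a sketch, untested; the block size of 64 rows is an arbitrary guess of mine to be tuned against available RAM) is to apply the same vectorization to blocks of V at a time, so the intermediate arrays stay bounded:

```matlab
% Blockwise vectorization: process blk rows of V at once so the largest
% temporary is blk x 1e6 instead of 131072 x 1e6 (~976 GB).
blk = 64;                                      % rows per block (assumed)
xu  = linspace(0, 1, 1e6);                     % unit grid, reused per block
kappa = zeros(numel(V), 1);
for i1 = 1:blk:numel(V)
    i2 = min(i1+blk-1, numel(V));
    Vb = V(i1:i2).';                           % column block of V numbers
    xG = Vb * xu;                              % blk x 1e6 grid
    b  = sqrt(bsxfun(@minus, Vb.^2, xG.^2));
    D  = abs(besselj(0,xG)./(xG.*besselj(1,xG)) - ...
             besselk(0,b)./(b.*besselk(1,b)));
    [~, ind] = min(D, [], 2);                  % per-row minima
    kappa(i1:i2) = xG(sub2ind(size(xG), (1:numel(ind)).', ind)) / a;
end
betta = sqrt(k0(:).^2 .* n_core(:).^2 - kappa.^2);
```

Whether this beats the plain loop depends on how well the blockwise Bessel calls amortize; it mainly trades memory for call overhead.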


Asked on 4 Aug 2017
Last commented: Jan on 7 Aug 2017
