Different behaviour of lsqnonlin in Version 2020b

Hi Matlab community!
I've set up a Jenkins server for our simulation code. We have a set of test cases in which the output of our code is compared to reference values from the literature or from stable versions of the code. MATLAB on the Jenkins server is version R2020b.
In one of the test cases, lsqnonlin does not find a physically reasonable starting point for the further calculations, and the test therefore crashes.
Here is some more information:
  • other test cases, which also use lsqnonlin, run smoothly on the Jenkins server
  • the test simulation runs as expected with MATLAB 2019b, 2020a and even 2020b on our local machines.
  • the input variables for lsqnonlin are exactly the same as in the versions on our local machines mentioned above.
Here is the output of lsqnonlin with options.Display = 'iter-detailed' on the Jenkins server (2020b):
                                          Norm of  First-order
Iteration  Func-count          f(x)          step   optimality
        0           4       2654.41                   2.53e+08
        1           8      0.011845     0.0155154     9.14e+03
        2          12      0.011845      0.303476     9.14e+03
        3          16    0.00806019      0.075869     6.07e+04
        4          20    0.00368235      0.151738      1.7e+05
        5          24    0.00103906        0.1392     4.17e+03
        6          28   0.000473188     0.0709188     1.44e+03
        7          32   0.000473149   1.20827e-11      0.00672
        8          36   0.000473149   1.14514e-16      0.00672
Optimization stopped because the norm of the current step, 1.145139e-16,
is less than options.StepTolerance = 1.000000e-14.
and here for my local machine (2019b):
                                          Norm of  First-order
Iteration  Func-count          f(x)          step   optimality
        0           4       2654.41                   2.53e+08
        1           8     0.0118525     0.0151844     8.74e+03
        2          12     0.0118525      0.303542     8.74e+03
        3          16    0.00806673     0.0758855     6.08e+04
        4          20    0.00368596      0.151771      1.7e+05
        5          24   0.000328772      0.278554     4.66e+04
        6          28   0.000328772     0.0406314     4.66e+04
        7          32   0.000162278     0.0101578     1.11e+04
        8          36   3.50764e-05     0.0203157     3.27e+04
        9          40   6.80252e-07    0.00742853      5.4e+03
       10          44   4.57863e-11    0.00030108         15.2
       11          48    1.0185e-13   1.32724e-05       0.0151
       12          52   6.69753e-18   1.04583e-06      0.00509
       13          56   2.66823e-18   8.94317e-09       0.0108
       14          60   2.24934e-20   2.59502e-10     0.000996
       15          64   2.24918e-20   1.17172e-15     0.000996
Optimization stopped because the norm of the current step, 1.171717e-15,
is less than options.StepTolerance = 1.000000e-14.
The first steps seem to be similar, but after step 6 the Jenkins server seems to find no further change in f(x). As mentioned above, all input variables are exactly the same, and it's even running with 2020b on our local machines.
Additionally, here is a list of all toolboxes installed on the Jenkins server. Maybe there is a dependency on another toolbox that I am not aware of.
-----------------------------------------------------------------------------------------------------
MATLAB Version: 9.9.0.1467703 (R2020b)
MATLAB License Number: xxxxxx
Operating System: Linux 4.4.0-176-generic #206-Ubuntu SMP Fri Feb 28 05:02:04 UTC 2020 x86_64
Java Version: Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
-----------------------------------------------------------------------------------------------------
MATLAB Version 9.9 (R2020b)
Curve Fitting Toolbox Version 3.5.12 (R2020b)
Global Optimization Toolbox Version 4.4 (R2020b)
Optimization Toolbox Version 9.0 (R2020b)
Parallel Computing Toolbox Version 7.3 (R2020b)
Partial Differential Equation Toolbox Version 3.5 (R2020b)
Wavelet Toolbox Version 5.5 (R2020b)
With kind regards,
Dominik

8 Comments

I edited the question to remove the license number.
Thank you very much! I've just realized it myself!
What are the processors of your server and local machine?
We need to know more about the objective function. Certain objectives can be numerically sensitive enough to produce different results on different platforms and Matlab versions.
Hi all,
the problem has been "resolved". The test case works for every other set of input parameters we tested. As you suggest, I'd guess that we hit exactly the input parameter set for which the case is numerically sensitive enough to produce different results on different machines/MATLAB versions. It still baffles me. We'll keep that in mind for future test cases.
Best regards,
Dominik
Dominik, we ran into a similar enough event that you might benefit from knowing about it.
We got randomly different optimization-results on the same machine from the same starting-point due to a bug that was caused by multithreading of the sum-function. I had written a code-section like this:
S = sum(V(:));            % reference sum of all elements
if D                      % not true in these runs
    Do_stuff_w_V
end
if sum(V(:)) ~= S         % exact comparison of two floating-point sums
    Normalise_V
end
Due to the multithreading of sum, the sub-sums could be added in a different order, making the not-equal comparison true even though V had not changed. That led the optimization to fall into two different regions of convergence...
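For illustration, the effect of summation order can be reproduced with three numbers. This is only a minimal sketch: the variable names and the tolerance value `relTol` are assumptions chosen for the example, not taken from the code discussed above.

```matlab
% Floating-point addition is not associative: the same three numbers
% summed in a different order can differ in the last bits.
a = (0.1 + 0.2) + 0.3;
b = 0.1 + (0.2 + 0.3);

exactlyEqual = (a == b);   % false in IEEE double precision

% A tolerance-based comparison is insensitive to such rounding differences.
relTol = 1e-12;            % assumed tolerance, pick one that suits the data
closeEnough = abs(a - b) <= relTol * max(abs(a), abs(b));

fprintf('a - b = %g, exactly equal: %d, close enough: %d\n', ...
        a - b, exactlyEqual, closeEnough);
```

Replacing an exact `~=` test on two sums with a comparison like `closeEnough` would be one way to keep a branch from flipping on rounding noise.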
IIRC the sum bug (due to multithreading, among other things) appeared 5 or 6 years back and has been fixed since, no?
@Bruno, I think both you and I are getting old? I think it was closer to 8-10 years ago (judging from where I worked at the time...). But you're right that that feature has been corrected. I just thought that it might be a feature that could pop up if parts of functions use parallel processing or the like.


 Accepted Answer

Hi Dominik,
I understand that you are running the same code on the same version of MATLAB, 2020b, on two architectures (the Jenkins server and a local machine) and getting different answers. Unfortunately, the Math Kernel Library (MKL) that ships with MATLAB is architecture-dependent so that it can make the best use of each architecture's resources (multi-threading, etc.). This is good for performance but can result in the phenomenon you describe above. Therefore, two different machines running the same version of MATLAB can call their respective MKL code paths and get answers with minor differences.
As most optimization algorithms are inherently iterative, these minor differences can propagate and become major differences over many iterations. This is why your results match for the first few iterations before they diverge in later iterations. If an optimization problem has multiple minima, these differences can sometimes cause optimization algorithms to converge to distinct minimizers, resulting in substantially different but equally valid answers.
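To compare two machines, a short script along these lines could be run on both and the output compared. It is only a sketch: the struct name `envInfo` and its field names are made up for illustration, and the `exist` guard is an assumption so that the script still runs where `maxNumCompThreads` is unavailable.

```matlab
% Collect basic information about the numerical environment.
envInfo = struct();
envInfo.release = version;             % MATLAB version string
envInfo.arch    = computer;            % platform identifier
envInfo.blas    = version('-blas');    % BLAS library in use
envInfo.lapack  = version('-lapack');  % LAPACK library in use

% Number of computational threads, if the function is available.
if exist('maxNumCompThreads') > 0
    envInfo.threads = maxNumCompThreads;
else
    envInfo.threads = NaN;
end

disp(envInfo)
```

Differences in the reported BLAS/LAPACK strings or thread counts would be consistent with the explanation above, although identical output does not rule out differences in the underlying processors.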
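A toy example can make this concrete. The sketch below is not lsqnonlin and not your model: it runs plain gradient descent on the assumed function f(x) = (x^2 - 1)^2, which has minima at x = -1 and x = +1, from two starting points that differ only by 2e-12. The step size and iteration count are arbitrary choices for the illustration.

```matlab
% Gradient of f(x) = (x^2 - 1)^2, which has minima at x = -1 and x = +1.
gradF = @(x) 4 .* x .* (x.^2 - 1);

stepSize = 0.1;    % assumed fixed step size
nIter    = 200;    % assumed iteration count

xa =  1e-12;       % two starting points on either side of the
xb = -1e-12;       % boundary between the two basins (x = 0)

for k = 1:nIter
    xa = xa - stepSize * gradF(xa);
    xb = xb - stepSize * gradF(xb);
end

fprintf('start +1e-12 -> %.6f, start -1e-12 -> %.6f\n', xa, xb);
```

Both runs follow nearly identical iterates at first and then end at different minimizers, which is the same qualitative behaviour as iterates that agree early and diverge later.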

1 Comment

Hi Caleb,
thanks for your response! We have come to the same conclusion. We altered the problematic test case (increased one parameter slightly) and the test case is running smoothly on all machines.
Best regards and stay healthy,
Dominik
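For future test cases, one option might be to check the solver result explicitly before continuing, so that a poor starting point produces a clear message instead of a crash further downstream. The following is a minimal sketch, assuming the Optimization Toolbox is installed; `residualFun`, `p0`, the threshold `resnormTol` and the error identifier are hypothetical stand-ins, not the code from this thread.

```matlab
% Hypothetical residual function standing in for the real model
% (Rosenbrock-type residuals with a zero-residual solution at [1; 1]).
residualFun = @(p) [10 * (p(2) - p(1)^2); 1 - p(1)];
p0 = [-1.2; 1];                       % assumed initial guess

opts = optimoptions('lsqnonlin', 'Display', 'off');
[pFit, resnorm, ~, exitflag] = lsqnonlin(residualFun, p0, [], [], opts);

% Accept the result only if the solver reports convergence and the
% residual norm is below an assumed, problem-specific threshold.
resnormTol = 1e-6;
solverOk = (exitflag > 0) && (resnorm < resnormTol);

if ~solverOk
    error('startpoint:notConverged', ...
          'lsqnonlin result rejected: resnorm = %g, exitflag = %d', ...
          resnorm, exitflag);
end
```

A check on `resnorm` is less sensitive to platform-dependent rounding than comparing individual iterates, although a suitable threshold depends on the model.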

Release

R2020b
