fmincon optimization: is the first-order optimality very sensitive to changes in the step tolerance?
SA-W
on 27 Nov 2023
I use fmincon interior-point algorithm to fit parameters to a pde.
Here are my basic settings:
opts = optimoptions('fmincon', ...
'StepTolerance', 1e-12, ...
'FunctionTolerance', 1e-12, ...
'OptimalityTolerance', 1e-12, ...
'MaxIterations', 250,...
'SpecifyObjectiveGradient', true, ...
'CheckGradients', false);
lb = zeros(9,1);
ub = 4 + zeros(9,1);
Aineq = ... ; % entries are on the order of 1e2
bineq = zeros(9,1);
problem = createOptimProblem( ...
params.solverName, ...
'objective', myFun, ...
'x0', startVec, ...
'lb',lb, ...
'ub',ub, ...
'Aineq', Aineq, ...
'bineq', bineq, ...
'options', opts);
%create multistart object
ms = MultiStart('Display', 'iter', ...
'UseParallel', true, ...
'StartPointsToRun', 'all', ...
'FunctionTolerance', 0);
% run
run(ms, problem, myStartPoints)
There are nine parameters, and I have lower and upper bounds as well as linear inequality constraints.
I scaled the matrix Aineq by 1e2 manually so that fmincon pays more attention to feasibility. I am aware that this comes with poor convergence and other drawbacks, but it has worked quite well so far. The reason for choosing such tight tolerances (1e-12) is to work around flat regions of the objective function, if there are any.
Using these options, I get the following output from multistart:
The solution of all 10 runs is
x = [0 0.00838947 0.0167789 0.0251684 0.0335579 0.0419473 0.0503368 0.0587263 0.0673571]
All solutions have exitflag=2 (probably because of the brutal scaling) and the same value of the objective function. Also, the first-order optimality is small.
However, run index = 4, for instance, converged to the same solution, but its first-order optimality is considerably larger than that of the other runs.
This becomes even more visible if I relax all my tolerances (step, function, and first-order optimality tolerance) to the default value of 1e-6:
The solution is nearly the same as before
x = [0 0.00838945 0.0167789 0.0251684 0.0335578 0.0419473 0.0503367 0.0587262 0.067357]
However, the first-order optimality is higher by several orders of magnitude, while the solution is nearly unchanged, which is also reflected in the sum of squares.
Those high optimalities make the solution less trustworthy.
How is it possible that the optimalities are so different if the sum of squares as well as the solution are practically identical?
7 Comments
Harald
on 27 Nov 2023
Hi,
I suppose you are solving the PDE numerically? In that case, please consider
This explains that you may also want to set FiniteDifferenceStepSize, and the reasons behind doing so. Perhaps this will also help with your original question.
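For illustration, a minimal sketch of how such an option could be set (the value 1e-4 is only a placeholder; a suitable step size depends on the accuracy of your PDE solver):
% Sketch: widen the finite-difference step so it exceeds the noise level
% of the numerical PDE solution (1e-4 is a placeholder value).
opts = optimoptions(opts, ...
'FiniteDifferenceStepSize', 1e-4, ...
'FiniteDifferenceType', 'central'); % central differences are less noise-sensitive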
Best wishes,
Harald
SA-W
on 27 Nov 2023
Hello,
Yes, I solve the PDE numerically, but I provide an analytical gradient to fmincon (see my options). I am not sure what happens with the Hessian, but for the gradient there is no finite differencing.
SA-W
on 27 Nov 2023
I know. I am just looking for a qualitative assessment.
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or is it something else?
Torsten
on 27 Nov 2023
Edited: Torsten
on 27 Nov 2023
Are you sure the results from the solution of the PDE are exact enough to build an optimization upon them? Most often, if parameters are to be determined from the solution of ODEs or PDEs, the sensitivities cannot be computed with sufficient reliability.
Torsten
on 27 Nov 2023
I suggest you compute the objective function near the point that MATLAB computes as optimal by changing each parameter separately while holding the other parameters constant and see what kind of curves you get.
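A minimal sketch of such a check (xOpt stands for the reported optimum and myFun for the objective from the question; the perturbation range is a placeholder and should be adapted to your parameter scales):
% Sketch: 1-D slices of the objective around the reported optimum xOpt,
% varying one parameter at a time while holding the others fixed.
delta = linspace(-1e-3, 1e-3, 41); % placeholder perturbation range
for i = 1:numel(xOpt)
fslice = zeros(size(delta));
for k = 1:numel(delta)
xp = xOpt;
xp(i) = xp(i) + delta(k);
fslice(k) = myFun(xp);
end
subplot(3,3,i); % nine parameters in this problem
plot(xOpt(i) + delta, fslice, '.-');
title(sprintf('parameter %d', i));
end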
Accepted Answer
Matt J
on 27 Nov 2023
Edited: Matt J
on 27 Nov 2023
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or is it something else?
Well, it basically means that a small change in x (near the stopping point) produces a large change in the gradient. The function would seem to have very high curvatures there, or possibly has a discontinuous first derivative.
25 Comments
SA-W
on 27 Nov 2023
All linear constraints are active at the solution, and they are continuously differentiable. Also, I think the objective function is continuously differentiable near the solution.
The function would seem to have very high curvatures there
How would you proceed to figure this out? The parameter space is nine-dimensional and visualizing anything is probably hard.
Matt J
on 27 Nov 2023
Edited: Matt J
on 27 Nov 2023
Here's a qualitatively similar example. You can see the first order optimalities are again very different without changing the solution very much.
for tol=[1e-12,1e-6]
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
f=@(z)sum( z.^2.*(z<0)+z.*(z>=0));
[x,fval,eflag,stats]=fmincon(@(x) f(x-[1,2,3])+3,rand(1,3)-0.5,[],[],[],[],[],[],[],opts)
end
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 38
funcCount: 244
constrviolation: 0
stepsize: 1.6412e-12
algorithm: 'interior-point'
firstorderopt: 0.7603
cgiterations: 47
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-12, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 23
funcCount: 147
constrviolation: 0
stepsize: 1.0967e-06
algorithm: 'interior-point'
firstorderopt: 8.0766e-05
cgiterations: 33
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
Matt J
on 27 Nov 2023
How would you proceed to figure this out?
You would compute the Hessian and see if it has any large entries or large eigenvalues.
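For example, a sketch (hess stands for the Hessian at the solution, e.g. the hessian output of fmincon, which for interior-point is only an approximation, or one you compute yourself by finite-differencing your analytical gradient):
% Sketch: inspect the curvature at the solution.
Hs = (hess + hess')/2; % symmetrize before taking eigenvalues
ev = eig(Hs);
largestEntry = max(abs(hess(:)))
extremeEigs = [min(ev), max(ev)]
condNumber = max(ev)/min(ev) % meaningful if all eigenvalues are positive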
SA-W
on 27 Nov 2023
You can see the first order optimalities are again very different without changing the solution very much.
Why is this the case here? The function is very smooth and the first derivative is continuous, right?
SA-W
on 27 Nov 2023
The discontinuity did not cause any problems in the optimization (same result). So maybe it is not a problem in practice? I mean, as long as the converged solution is the same, I do not have to worry about the discontinuity.
Matt J
on 27 Nov 2023
The interior-point algorithm theoretically assumes a twice continuously differentiable objective and constraints, but much of the time it still works when the differentiability only holds piecewise.
SA-W
on 27 Nov 2023
You would compute the Hessian and see if it has any large entries or large eigenvalues.
hessian =
1.0e+07 *
0.0546 0.0014 -0.0286 -0.0145 -0.0104 -0.0072 -0.0181 0.0244 -0.0016
0.0014 0.0042 0.0019 0.0014 0.0009 0.0018 0.0041 -0.0150 -0.0007
-0.0286 0.0019 0.0166 0.0086 0.0063 0.0046 0.0138 -0.0253 0.0021
-0.0145 0.0014 0.0086 0.0052 0.0016 0.0106 -0.0200 0.0355 -0.0284
-0.0104 0.0009 0.0063 0.0016 0.0064 -0.0186 0.0739 -0.1343 0.0742
-0.0072 0.0018 0.0046 0.0106 -0.0186 0.1034 -0.3381 0.6104 -0.3670
-0.0181 0.0041 0.0138 -0.0200 0.0739 -0.3381 1.1674 -2.1213 1.2382
0.0244 -0.0150 -0.0253 0.0355 -0.1343 0.6104 -2.1213 3.8685 -2.2429
-0.0016 -0.0007 0.0021 -0.0284 0.0742 -0.3670 1.2382 -2.2429 1.3263
This is the hessian for the solution I mentioned in my question. The entries of the hessian differ only by one order of magnitude.
The smallest eigenvalue is 0.0939 and the largest 6.5e+7. This makes the condition number of the Hessian large, but that probably has to do with the units of my parameters.
The function would seem to have very high curvatures there,
Do you think this could be the case here? Admittedly, I have never looked at the Hessian so far but, compared to the parameters (between 0 and 1), the second derivatives are very large.
Matt J
on 27 Nov 2023
Edited: Matt J
on 27 Nov 2023
The curvatures don't seem large enough to account for the difference we see. However, looking at the hessian won't detect a discontinuous first derivative. As illustrated below, the Hessian explodes only if you take the Hessian right at the point of discontinuity.
f=@(z)z.^2.*(z<0)+z.*(z>=0);
x=linspace(-2,2,1e6);
dx=x(2)-x(1);
Hess=gradient(gradient(f(x),dx),dx);
plot(x,Hess,'.-'); xlabel x; ylabel('\nabla^2 f(x)')
SA-W
on 27 Nov 2023
The curvatures don't seem large enough to account for the difference we see.
They are on the order of 1e7. Is that not big? What was your criterion for that statement?
Matt J
on 27 Nov 2023
Edited: Matt J
on 27 Nov 2023
I take it back. It probably is big enough. The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing, assuming the first order optimality numbers are comparable to the gradient magnitudes at the solution.
SA-W
on 27 Nov 2023
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Maybe I am missing something. The change in the first-order optimality is at least a factor of ~1e5 (it went from roughly 1e-5 to 1e0). So what did you mean by
~1e-1, which is largely what you are seeing
?
Matt J
on 27 Nov 2023
Edited: Matt J
on 27 Nov 2023
it went from roughly 1e-5 to 1e0
To me, it looks like the average first-order optimality measure across the different multistarts is initially about 1e-6 and then, after relaxing the tolerances, the average is about 1e-1. Therefore, the change is
change = 1e-1 - 1e-6
change = 0.1000
which is still ~1e-1.
SA-W
on 27 Nov 2023
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Did you mean ~1e-6*1e7 = ~1e+1? Of course, exact numbers are nonsense here, but just to make sure you did not refer to something else with ~1e-1.
More important from a practical viewpoint: if I know my objective has large curvatures, is it reasonable to choose a smaller step size? The only benefit I see is that the first-order optimality is closer to zero, which makes the results more trustworthy, but the effect on the solution is negligible.
Matt J
on 28 Nov 2023
Edited: Matt J
on 28 Nov 2023
Did you mean ~1e-6*1e7 = ~1e+1?
Yes, that's what I really meant.
More important from a practical viewpoint: If I know my objective has large curvatures, is it reasonable to choose a smaller step size?
You mean a smaller StepTolerance? It can definitely make a difference. Here's an example where we make the curvatures both small (p=2) and large (p=7), and you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
x0=[787.5362 617.2179 596.9010 784.4406 5.9046];
for p=[2,7]
p,
for tol=[1e-6,1e-12];
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
[x,fval,eflag,stats]=fmincon(@(x) norm(x+5,p)^p,x0,...
[],[],[],[],[0,0,0,0,0],[],[],opts);
distance_to_true_solution = norm(x)
end
end
p = 2
distance_to_true_solution = 4.4737e-07
distance_to_true_solution = 4.4723e-09
p = 7
distance_to_true_solution = 0.5628
distance_to_true_solution = 8.1778e-14
SA-W
on 28 Nov 2023
Edited: SA-W
on 28 Nov 2023
Here's an example, where we make the curvatures both small p=2 and large p=7, and you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
What we see here is what I expected for small curvatures (flat objectives), too. There, the gradients are small, and the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help to get closer to the minimum.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives? A small step tolerance leads to more iterations, but let's forget about that.
Matt J
on 28 Nov 2023
Edited: Matt J
on 28 Nov 2023
There, the gradients are small, and the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help to get closer to the minimum.
No, I don't think so. In that situation, you would want to tighten the optimality tolerance and maybe also the function tolerance.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives?
Yes, but it may not be enough. If you have no idea about the second derivatives and they happen to be very low, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
SA-W
on 28 Nov 2023
Yes, but it may not be enough. If you have very low curvatures, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
That said, would you say setting all tolerances to 1e-12 is reasonable? My parameters are all greater than zero and less than 5, and I expect the sum of squares to be between 1 and 100.
Matt J
on 28 Nov 2023
Edited: Matt J
on 28 Nov 2023
1e-12 is what I usually use. But you really can't look to the stopping tolerances and exit statistics alone to gain confidence in an optimization's performance. You should be testing on several simulated versions of the problem where you know the desired answer, and seeing how close the optimizer gets.
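A minimal sketch of such a synthetic test, assuming pTrue is a known parameter vector, simulateData forward-solves the PDE for it, and buildObjective constructs the sum-of-squares misfit from that data (all three names are hypothetical placeholders; Aineq, bineq, lb, ub, opts are as in the question):
% Sketch: recover known parameters from simulated data and measure the error.
pTrue = 2*rand(9,1); % placeholder "ground truth" within the bounds
data = simulateData(pTrue); % hypothetical forward PDE solve
objFun = buildObjective(data); % hypothetical sum-of-squares misfit
p0 = pTrue + 0.1*randn(9,1); % perturbed start point
pHat = fmincon(objFun, p0, Aineq, bineq, [], [], lb, ub, [], opts);
recoveryError = norm(pHat - pTrue)/norm(pTrue) % how close does the optimizer get?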
SA-W
on 28 Nov 2023
I always perform numerical experiments first, where I know the exact solution, and run a multistart simulation with several start points. If this works, I move on to the real experiment and again perform a multistart run.
My sanity check is to verify that all local solver runs converged to the same solution. Since I do not know the exact solution, I cannot do much more. Or can you recommend some good checks here? I mean, if you get a solution from an optimization where you do not know the exact solution, how do you convince yourself that the solution is correct if the exit flags are not trustworthy enough?
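As a sketch, that check can be done directly from the solutions output of run (ms, problem, and myStartPoints as in the question; sols is the vector of GlobalOptimSolution objects):
% Sketch: compare the distinct local solutions found by MultiStart.
[xBest, fBest, flag, out, sols] = run(ms, problem, myStartPoints);
nSol = numel(sols);
Xall = zeros(nSol, numel(xBest));
for k = 1:nSol
Xall(k,:) = sols(k).X(:).'; % one row per distinct local solution
end
spreadPerParameter = max(Xall,[],1) - min(Xall,[],1) % how far apart the solutions are
objectiveValues = [sols.Fval]
exitFlags = [sols.Exitflag]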
Matt J
on 28 Nov 2023
I don't do anything. I rely on experience with the numerical tests for my confidence.
SA-W
on 28 Nov 2023
Edited: SA-W
on 28 Nov 2023
Do you know of tools (File Exchange, toolbox functions, ...) to post-process optimization results?
Often, people calculate confidence intervals, R-coefficients, correlation matrices, ... Of course, one could implement this oneself, but if there are already packaged solutions, why not use them?
Matt J
on 28 Nov 2023
Edited: Matt J
on 28 Nov 2023
There is this thread,
As you will see in the comments though, it will probably require you to do your own Hessian calculation. It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
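As a rough sketch of what such a post-processing step could look like (H stands for the Hessian of the sum-of-squares objective at the solution, ssq for the final sum of squares, nData and nParams for the number of data points and parameters; this is a Gauss-Newton-style approximation and ignores active constraints):
% Sketch: approximate covariance and correlation of the parameters from the Hessian H.
sigma2 = ssq/(nData - nParams); % residual variance estimate
C = 2*sigma2*inv(H); % H ~ 2*(J'*J) for a sum of squares, so C ~ sigma2*inv(J'*J)
s = sqrt(diag(C)); % approximate standard errors
R = C./(s*s.'); % parameter correlation matrix
ci95 = 1.96*s; % half-width of a rough 95% interval (asymptotic)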
SA-W
on 28 Nov 2023
It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
Yes, I think so too.
Do you think it makes sense to calculate correlations,etc,... with a hessian that has a condition number ~1e7?
More Answers (0)