fmincon check gradients fail

10 views (last 30 days)
mahmoud tarek
mahmoud tarek on 14 Mar 2020
Edited: Matt J on 26 Mar 2020
I'm trying to provide vectorized gradients for my constraints of my 'nonlincon' function to reduce the run time.
My optimization problem has 11 variables from x1 to x11 and my 'nonlincon' calculates 6 constraints so my gradient matrix is 11 by 6
i tried to mimic how fmincon calculates delta for finite differences.
but when i use check gradients with gradient constraint options the check gradients fails no matter what value i use for finite difference step size.
is there something wrong with how i am calculating the gradients or my delta?
this is the options i used with fmincon
options_fmincon = optimoptions('fmincon','ConstraintTolerance', 0.001,'Diagnostics','on','Display','iter-detailed','PlotFcn',{@optimplotx,@optimplotfunccount,@optimplotfval,@optimplotconstrviolation,@optimplotstepsize,@optimplotfirstorderopt},'OptimalityTolerance',0.001,'SpecifyConstraintGradient',true,'CheckGradients',true,'FiniteDifferenceStepSize',0.01,'TypicalX',ones(1,11));
The code is like this :
% fmincon Nonlinear constraints function
function [C, CEQ, Jacobian, DCEQ] = NonLinCon(X, LUT, IN_SPEC)
DOF.M(1).A = X(1);
DOF.M(2).A = X(2);
DOF.M(3).A = X(3);
DOF.M(4).A = X(4);
DOF.M(5).A = X(5);
DOF.M(1).B = X(6);
DOF.M(2).B = X(7);
DOF.M(3).B = X(8);
DOF.M(4).B = X(9);
DOF.M(5).B = X(10);
DOF.IB = X(11)*1e-3;
DOF.C = IN_SPEC.C;
DOF.D = IN_SPEC.D;
DOF.E = IN_SPEC.E;
DOF.F = IN_SPEC.F;
DOF.G = IN_SPEC.G;
DOF.H = IN_SPEC.H;
DOF.M(6).A = DOF.M(5).A;
DOF.M(6).B = DOF.M(5).B;
[~, OUT_SPEC] = calculatespecs(LUT,DOF);
if nargout > 2
T_X = ones(1,11);
v = ones(1,11);
delta = v.*sign(X).*max(abs(X),T_X);
Xp = X;
DOF_p = DOF;
DOF_p.M(1).A = [Xp(1) + delta(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1)];
DOF_p.M(2).A = [Xp(2), Xp(2) + delta(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2)];
DOF_p.M(3).A = [Xp(3), Xp(3), Xp(3) + delta(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3)];
DOF_p.M(4).A = [Xp(4), Xp(4), Xp(4), Xp(4) + delta(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4)];
DOF_p.M(5).A = [Xp(5), Xp(5), Xp(5), Xp(5), Xp(5) + delta(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5)];
DOF_p.M(1).B = [Xp(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6) + delta(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6)];
DOF_p.M(2).B = [Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7) + delta(7), Xp(7), Xp(7), Xp(7), Xp(7)];
DOF_p.M(3).B = [Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8) + delta(8), Xp(8), Xp(8), Xp(8)];
DOF_p.M(4).B = [Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9) + delta(9), Xp(9), Xp(9)];
DOF_p.M(5).B = [Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10) + delta(10), Xp(10)];
DOF_p.IB = [Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3 + delta(11)];
DOF_p.M(6).A = DOF_p.M(5).A;
DOF_p.M(6).B = DOF_p.M(5).B;
[~, OUT_SPEC_p] = calculatespecs(LUT,DOF_p);
Jacobian(:,1) = abs((double(OUT_SPEC_p.A) - double(OUT_SPEC.A)) ./ delta);
Jacobian(:,2) = abs((double(OUT_SPEC_p.B) - double(OUT_SPEC.B)) ./ delta);
Jacobian(:,3) = abs((log10(double(OUT_SPEC_p.C)) - log10(double(OUT_SPEC.C))) ./ delta);
Jacobian(:,4) = abs((double(OUT_SPEC_p.D) - double(OUT_SPEC.D)) ./ delta);
Jacobian(:,5) = abs((double(OUT_SPEC_p.E) - double(OUT_SPEC.E)) ./ delta);
Jacobian(:,6) = abs((double(OUT_SPEC_p.F) - double(OUT_SPEC.F)) ./ delta);
DCEQ = [];
end
C(1) = IN_SPEC.A ./ double(OUT_SPEC.A) - 1;
C(2) = IN_SPEC.B ./ double(OUT_SPEC.B) - 1;
C(3) = log10(IN_SPEC.C) ./ log10(double(OUT_SPEC.C)) - 1;
C(4) = IN_SPEC.D ./ double(OUT_SPEC.D) - 1;
C(5) = IN_SPEC.E ./ double(OUT_SPEC.E) - 1;
C(6) = IN_SPEC.F ./ double(OUT_SPEC.F) - 1;
CEQ = [];
end
  4 Comments
mahmoud tarek
mahmoud tarek on 26 Mar 2020
I tried to use tic toc at the start and end of my NonLinCon and Obj functions to calculate the time of one call and this is what i got:
First for NonLinCon with only 2 output arguments:
The function is called multiple times per iteration. The call time varies between 0.010 to 0.2 seconds.
Second for NonLinCon with 4 output arguments:
The call time varies between 0.01 to 0.06 seconds.
My Obj function has no gradient function.
The run time of my Obj function with one output argument varies between 0.000002 to 0.000759 seconds
Matt J
Matt J on 26 Mar 2020
Edited: Matt J on 26 Mar 2020
The call time varies between 0.010 to 0.2 seconds.
Don't you mean 0.01 to 0.02?
The function is called multiple times per iteration
You need to count how many times in an iteration it calls wtih 2 arguments (when it doesn't need the gradient) and how many times it calls with 4 (I assume once per iteration). If NonLinCon gets called 20 times per iteration with just 2 arguments, then accelerating the gradient calculation won't have much of an impact, because the gradient calculation represents a small fraction of the computation per iteration.
It may be helpful to move your finite difference operations to a separate function, called from within NonLinCon. You can then use the profiler
to see how much times is spent in gradient calculation versus other parts of NonLinCon.

Sign in to comment.

Accepted Answer

Matt J
Matt J on 14 Mar 2020
Edited: Matt J on 14 Mar 2020
CheckGradient uses central differences, whereas you appear to be using right handed differences. Are you certain that your constraints are differentiable?
  3 Comments
mahmoud tarek
mahmoud tarek on 17 Mar 2020
i tried to calculate the gradients using central differences to match check gradients but still fails
T_X = ones(1,11);
v = 0.01*ones(1,11); % FiniteDifferenceStepSize
delta = v.*max(abs(X),T_X); % Delta for central finite differences
Xp = X;
DOF_p = DOF;
DOF_p.M(1).A = [Xp(1) + delta(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1), Xp(1)];
DOF_p.M(2).A = [Xp(2), Xp(2) + delta(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2), Xp(2)];
DOF_p.M(3).A = [Xp(3), Xp(3), Xp(3) + delta(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3), Xp(3)];
DOF_p.M(4).A = [Xp(4), Xp(4), Xp(4), Xp(4) + delta(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4), Xp(4)];
DOF_p.M(5).A = [Xp(5), Xp(5), Xp(5), Xp(5), Xp(5) + delta(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5), Xp(5)];
DOF_p.M(1).B = [Xp(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6) + delta(6), Xp(6), Xp(6), Xp(6), Xp(6), Xp(6)];
DOF_p.M(2).B = [Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7), Xp(7) + delta(7), Xp(7), Xp(7), Xp(7), Xp(7)];
DOF_p.M(3).B = [Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8), Xp(8) + delta(8), Xp(8), Xp(8), Xp(8)];
DOF_p.M(4).B = [Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9), Xp(9) + delta(9), Xp(9), Xp(9)];
DOF_p.M(5).B = [Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10), Xp(10) + delta(10), Xp(10)];
DOF_p.IB = [Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3, Xp(11)*1e-3 + delta(11)];
DOF_p.M(6).A = DOF_p.M(5).L;
DOF_p.M(6).B = DOF_p.M(5).RHO;
[~, OUT_SPEC_p] = calculatespecs(LUT,DOF_p);
Xn = X;
DOF_n = DOF;
DOF_n.M(1).A = [Xn(1) - delta(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1), Xn(1)];
DOF_n.M(2).A = [Xn(2), Xn(2) - delta(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2), Xn(2)];
DOF_n.M(3).A = [Xn(3), Xn(3), Xn(3) - delta(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3), Xn(3)];
DOF_n.M(4).A = [Xn(4), Xn(4), Xn(4), Xn(4) - delta(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4), Xn(4)];
DOF_n.M(5).A = [Xn(5), Xn(5), Xn(5), Xn(5), Xn(5) - delta(5), Xn(5), Xn(5), Xn(5), Xn(5), Xn(5), Xn(5)];
DOF_n.M(1).B = [Xn(6), Xn(6), Xn(6), Xn(6), Xn(6), Xn(6) - delta(6), Xn(6), Xn(6), Xn(6), Xn(6), Xn(6)];
DOF_n.M(2).B = [Xn(7), Xn(7), Xn(7), Xn(7), Xn(7), Xn(7), Xn(7) - delta(7), Xn(7), Xn(7), Xn(7), Xn(7)];
DOF_n.M(3).B = [Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8), Xn(8) - delta(8), Xn(8), Xn(8), Xn(8)];
DOF_n.M(4).B = [Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9), Xn(9) - delta(9), Xn(9), Xn(9)];
DOF_n.M(5).B = [Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10), Xn(10) - delta(10), Xn(10)];
DOF_n.IB = [Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3, Xn(11)*1e-3 - delta(11)];
DOF_n.M(6).A = DOF_n.M(5).A;
DOF_n.M(6).B = DOF_n.M(5).B;
[~, OUT_SPEC_n] = calculatespecs(LUT,DOF_n);
%%%%%%%%%%% Jacobian using central finite differences %%%%%%%%%%%%%
Jacobian(:,1) = (double(OUT_SPEC_p.A) - double(OUT_SPEC_n.A)) ./ (2.*delta);
Jacobian(:,2) = (double(OUT_SPEC_p.B) - double(OUT_SPEC_n.B)) ./ (2.*delta);
Jacobian(:,3) = (log10(double(OUT_SPEC_p.C)) - log10(double(OUT_SPEC_n.C))) ./ (2.*delta);
Jacobian(:,4) = (double(OUT_SPEC_p.D) - double(OUT_SPEC_n.D)) ./ (2.*delta);
Jacobian(:,5) = (double(OUT_SPEC_p.E) - double(OUT_SPEC_n.E)) ./ (2.*delta);
Jacobian(:,6) = (double(OUT_SPEC_p.F) - double(OUT_SPEC_n.F)) ./ (2.*delta);
Matt J
Matt J on 17 Mar 2020
Edited: Matt J on 17 Mar 2020
I recommend plotting your Jacobian elements as a function of delta. They should converge to something as delta gets smaller. The first question is if they do converge. The second question is, if they do converge, how does the limit compare to Matlab's Jacobian calculation result and/or the result of jacobianest (from the link I gave earlier). Do this analysis for all the elements.

Sign in to comment.

More Answers (0)

Categories

Find more on Systems of Nonlinear Equations in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!