isqnonlin: compute part of objective function outside of matlab

SA-W on 11 Aug 2022

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/1778125-isqnonlin-compute-part-of-objective-function-outside-of-matlab

Commented: SA-W on 17 Oct 2022

I am solving a partial differential equation that depends on some design variables (material parameters). I want to fit the material parameters to the vector of experimental data y using MATLAB's function isqnonlin.

I solve the PDE for given material parameters with external software; the resulting solution vector is denoted s. I want to minimize the squared l2-norm of the difference between s and y using the algorithm implemented in isqnonlin.

Given that I do not compute the PDE solution s in MATLAB itself, is it possible to use isqnonlin?

2 Comments

Torsten on 12 Aug 2022

And have you already tested the data transfer between your PDE solver and MATLAB?

SA-W on 12 Aug 2022

I can transfer data between my PDE solver and MATLAB, that is no problem.

Accepted Answer

Torsten on 12 Aug 2022

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/1778125-isqnonlin-compute-part-of-objective-function-outside-of-matlab#answer_1025445

Edited: Torsten on 12 Aug 2022

Then call "lsqnonlin" as

p0 = ...;   % initial guess for the vector of material parameters
sol = lsqnonlin(@(p)fun(p,y),p0)

and write a function "fun" as

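function res = fun(p,y)
% Residual vector for lsqnonlin: simulated data minus experimental data.
% The call below is a placeholder for your own PDE solver interface.
s = result_from_your_PDE_solver_for_the_vector_of_material_parameters_p_corresponding_to_y(p);
res = s - y;
end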

By the way:

The name of the MATLAB function is lsqnonlin (least-squares solver for nonlinear problems), not isqnonlin.
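
A minimal sketch of the pattern above, assuming a stand-in for the external solver: the anonymous function pde_solver is hypothetical and only evaluates a two-parameter exponential so that the script runs on its own. In practice it would write p to a file, launch the external program and read s back. The data y are synthetic, generated from known parameters p_true.

```matlab
% Stand-in for the external PDE solver (hypothetical; replace with your own call).
t = linspace(0, 1, 20).';
pde_solver = @(p) p(1) * exp(-p(2) * t);

% Synthetic "experimental" data generated from known parameters.
p_true = [2; 3];
y = pde_solver(p_true);

% Residual vector; lsqnonlin squares and sums its entries internally.
fun = @(p) pde_solver(p) - y;

p0 = [1; 1];                                   % initial guess
opts = optimoptions(@lsqnonlin, 'Display', 'off');
[p_fit, resnorm] = lsqnonlin(fun, p0, [], [], opts);
```

Note that fun returns the residual vector s - y itself, not its squared norm.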

134 Comments

SA-W on 17 Aug 2022

"And the main problem is that with such a big number of free parameters, there will be many combinations with approximately the same value for the objective function, but unphysical."

"Did you already test whether you can recover your p0 starting thus far away from the solution ?"

I tried it, and exactly what you said happened.

I set p0(1)=250000, p0(2)=187500, p0(3)=125000, p0(4)=62500, p0(5)=0, p0(6)=62500, p0(7)=125000, p0(8)=187500, p0(9)=250000. With these start values, lsqnonlin returns the solution p(1)=244537, p(2)=186631, p(3)=66641, p(4)=47163, p(5)=0, p(6)=46287, p(7)=64148, p(8)=194005, p(9)=243482.

However, the p0 fulfilling y(p0)=y_experiment is p0(1)=374486, p0(2)=178626, p0(3)=71016, p0(4)=16342, p0(5)=0, p0(6)=14522, p0(7)=55572, p0(8)=120409, p0(9)=207186.

Clearly, the returned solution is not the correct one. The solver also returned "local minimum possible". "lsqnonlin stopped because the final change in the sum of squares relative to its initial values is less than the value of function tolerance".

Based on the recommendation in the help menu, I restarted lsqnonlin with the returned solution, however, the result is the same.

Do you think it makes sense to scale p somehow?

SA-W on 18 Aug 2022

"Scaling is always a good idea, especially if the parameters to be determined have different orders of magnitude. But in your case, they all seem to be of the same order, appr. 1e5 ?"

As for the synthetic data I am currently working on, yes. But for the "real" data, the order of magnitude could also be 1e4 or 1e6.

"What was the norm of the residual for the solution (i.e. norm(y-y_exp)) lsqnonlin got with p(1)=244537, p(2)=186631, p(3)=66641, p(4)=47163, p(5)=0, p(6)=46287, p(7)=64148, p(8)=194005, p(9)=243482 ?"

The squared norm of the residual as returned by lsqnonlin is 2.2432e-6. The first order optimality measure is 1.8982e-6.

Should these values be even smaller in the global minimum?

"I guess for p0(1)=374486, p0(2)=178626, p0(3)=71016, p0(4)=16342, p0(5)=0, p0(6)=14522, p0(7)=55572, p0(8)=120409, p0(9)=207186 the residual was 0 ?"

Here the norm of the residual is 6.8716e-13 and the first order optimality measure is 1.42e-9.

That said, should I decrease the residual tolerance? Obviously, a change from 1e-6 to 1e-13 in the residual comes with large changes for p.

SA-W on 25 Aug 2022

I double-checked the Jacobian I supply to MATLAB, and it should be correct. However, the gradient check still fails, even using central differences with several different step sizes. I also implemented a forward-difference scheme myself, which also reveals differences.

I skipped the gradient check and invoked lsqnonlin with my Jacobian and the start values we already discussed in this thread: in the first iteration norm(res) = 0.41, and after nine iterations norm(res) = 0.006. After that, norm(res) no longer changes and lsqnonlin stops because the step size falls below StepTolerance (which I had already set to 1e-10).

For comparison, I did the same calculation without passing the Jacobian to lsqnonlin, i.e. using finite differences, but with the same tolerances (FunctionTolerance, OptimalityTolerance, StepTolerance). After 13 iterations, norm(res) = 1e-6 with a final step size of about 1e-4.

The two solutions are far apart (10-20% difference); we already talked about reasons for that. But is it a well-known problem that, if the Jacobian is wrong in some entries, norm(res) stops changing (close to a possible solution)? Using finite differences, norm(res) was pushed several orders of magnitude lower. I am just asking myself whether a wrong Jacobian can lead the solver to a point with norm(res) = 0.006 at all.

SA-W on 8 Sep 2022

function res = fun(p,y)
s = result_from_your_PDE_solver_for_the_vector_of_material_parameters_p_corresponding_to_y(p)
res = s - y;
end

This is the code from the answer to my question. Now I am working with 3 experimental vectors 'y' (time-dependent problem).

The gradient vector of the objective function res with respect to the parameters p is calculated like this:

d(res)/d(p) = 2*(s1-y1)*J1 + 2*(s2-y2)*J2 + 2*(s3-y3)*J3

Does MATLAB apply the pre-multiplications ( 2*(s1-y1), ... ) to the Jacobians J automatically, so that it is enough to modify the code to this?

function res = fun(p,y)
    s1 = result_from_pde_solver
    s2 = result_from_pde_solver
    s3 = result_from_pde_solver
    res = (s1-y1 + s2-y2 +s3-y3);
end

In case I provide my own Jacobians to Matlab the code roughly looks like:

function [res, J] = fun(p,y)
    s1 = result_from_pde_solver
    J1 = result_from_pde_solver
    s2 = result_from_pde_solver
    J2 = result_from_pde_solver
    s3 = result_from_pde_solver
    J3 = result_from_pde_solver
    
    res = (s1-y1 + s2-y2 +s3-y3);
end

What is the output J in that case?

I mean it makes no sense to return the sum J=J1+J2+J3. It also makes no sense to return all of them because Matlab does not know that J1 belongs to (s1-y1).

SA-W on 8 Sep 2022

"You shouldn't do that. Suppose s1-y1 = -20000, s2-y2 = 10000 and s3-y3 = 10000 after fitting.

Would you accept this fitting result ?"

No, not really.

Writing

min f(p) = ||s1-y1||^2 + ||s2-y2||^2 + ||s3-y3||^2

is the standard in the (solid mechanics) literature. But, of course, what is meant by that is to minimize each of these summands. The correct way to do this is as I described in my last comment, isn't it?

(here a copy of the comment)

"If I get the gist of what you said, then the way to minimize

f(p) = ||s1-y1||^2 + ||s2-y2||^2 + ||s3-y3||^2

is to write the three vectors s and y each in ONE single (column) vector:

s = (s1;s2;s3) y = (y1;y2;y3)

The same with the Jacobians (say the size of s and p is two):

J1 = [d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2]

J2 = [d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2]

J3 = [d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

maps into

J = [J1; J2; J3] =

[d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2 ;
 d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2 ;
 d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

Is that correct?"
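
The stacking described in this comment can be sketched as follows. The three blocks are hypothetical linear models s_k = A_k*p, chosen only so that the Jacobians J_k = A_k are known exactly; all names and numbers are illustrative. The comparison shows that the gradient of the summed squares equals 2*J'*res for the stacked quantities, so the solver only needs res and J and not the pre-multiplied terms.

```matlab
rng(0);                                        % reproducible toy data
p  = [1.5; -0.5];                              % current parameter vector (2 parameters)
A1 = randn(2, 2); A2 = randn(2, 2); A3 = randn(2, 2);   % toy Jacobians J1..J3
y1 = randn(2, 1); y2 = randn(2, 1); y3 = randn(2, 1);   % toy data per load step

% Per-block residuals s_k - y_k.
r1 = A1*p - y1;  r2 = A2*p - y2;  r3 = A3*p - y3;

% Stack everything into one residual vector and one Jacobian.
res = [r1; r2; r3];                            % 6-by-1
J   = [A1; A2; A3];                            % 6-by-2, row i holds d(res(i))/dp

% Gradient of f(p) = ||r1||^2 + ||r2||^2 + ||r3||^2, computed two ways.
g_blocks  = 2*A1.'*r1 + 2*A2.'*r2 + 2*A3.'*r3;
g_stacked = 2*J.'*res;
```

Because row i of J belongs to entry i of res, the pairing of J1 with (s1-y1) is carried by the row order alone.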

SA-W on 9 Sep 2022

I reduced the number of parameters to one to debug my Jacobian. Both the supplied Jacobian and the finite-difference approach give me the correct solution for the parameter with nearly the same convergence behavior.

However, I want to validate my Jacobian with the GradientCheck -- which still fails.

options = optimoptions(@lsqnonlin, ...
    'SpecifyObjectiveGradient', true, ...
    'CheckGradients', true, 'FiniteDifferenceStepSize', 1, 'FiniteDifferenceType', 'central');
sol = lsqnonlin(@(p)fun(p,y_exp),1000,0,1e12, options);

I supplied a step size of one with central finite differences. The start value of my parameter is p0 = 1000. I wrote all values of p to a file whenever lsqnonlin calls 'fun'. Given a step size of one, I expected the file content to be the three values

1000; 999; 1001.

However, the file contains four values

1000; 999,081...; 0; 1998,162...

These values also change when I restart my program. A second run gave me, for instance,

1000; 1000,701...; 0; 2001,402...

How can that happen?

SA-W on 12 Sep 2022

I will do that but, anyway, I would like to share the output with you

%experimental displacement
y_exp = importdata('./displacement_experiment/loadstep_10.txt');
%start value for kappa and bounds
kappa0 = 1e3;
lb = 0;
ub = 1e15;
options = optimoptions(@lsqnonlin, 'StepTolerance', 1e-10, 'FunctionTolerance', 1e-10, 'OptimalityTolerance', 1e-10, ...
    'SpecifyObjectiveGradient', true, 'PlotFcn', 'optimplotresnorm', ...
    'CheckGradients', true, 'FiniteDifferenceStepSize', 1e-1, 'FiniteDifferenceType', 'central');
[sol,resnorm,residual,exitflag,output,jacobian] = lsqnonlin(@(p)funJ(p,y_exp),kappa0,lb,ub, options);

function [f, J] = funJ(p,y_exp)
%write CURRENT p into a file for pde solver to read in
file_kappa = fopen('./kappa.txt', 'w');
fprintf(file_kappa, '%16.12f\n', p);
fclose(file_kappa);
%run pde solver and load solution and Jacobian
system('cd ..; make run')
y_sim = importdata('./displacement_simulation/loadstep_10.txt');
J = importdata('./jacobian/loadstep_10.txt');
%compute least square objective
f = y_sim - y_exp;
%write ALL p's at all iterations into a file for debugging purposes
file_debug = fopen('./kappa_debug.txt', 'a');
fprintf(file_debug, '%16.12f\n', p);
fclose(file_debug);
end

After running the above program the file 'kappa_debug.txt' stores the values

1000.000000000000
1000.804919190001
1000.724427271001
1000.885411109001

Clearly, this sequence of values does not correspond at all to the step size of 1e-1 which I provided in the options.

Objective function derivatives:
Maximum relative difference between supplied 
and finite-difference derivatives = 1.09343e-06.
Supplied derivative element (1,1):     -9.41149e-06
Finite-difference derivative element (1,1): -1.05049e-05

I also double-checked that the "Supplied derivative element (1,1): -9.41149e-6" corresponds to p=1000.804919190001, i.e., not the start value but the second value from the above file; I thought the gradient check would compare the Jacobians at the start value.

Does the output make sense to you?
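
To compare a supplied Jacobian with finite differences at a point you choose yourself, independently of the solver's built-in check, a manual central-difference loop is one option. The following is a sketch only: the model and its analytical Jacobian are hypothetical stand-ins for the PDE solver output, and the relative step h is an assumed choice that would need tuning for a real solver.

```matlab
% Hypothetical model with a known analytical Jacobian.
t     = linspace(0, 1, 5).';
model = @(p) p(1) * exp(-p(2) * t);
jac   = @(p) [exp(-p(2) * t), -p(1) * t .* exp(-p(2) * t)];

p0 = [2; 3];                                   % point at which to check
h  = 1e-6 * max(abs(p0), 1);                   % per-parameter step (assumed choice)

J_fd = zeros(numel(t), numel(p0));
for k = 1:numel(p0)
    e = zeros(size(p0));
    e(k) = h(k);
    J_fd(:, k) = (model(p0 + e) - model(p0 - e)) / (2 * h(k));   % central difference
end

J_an = jac(p0);
rel_err = max(abs(J_fd(:) - J_an(:))) / max(abs(J_an(:)));
```

With an iterative external solver, the difference quotient can only be as accurate as the solver's convergence tolerance divided by h, so a loosely converged solve can make a correct Jacobian look wrong.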

Bruno Luong on 13 Sep 2022

Edited: Bruno Luong on 13 Sep 2022

In the presence of noise, the norm of the residual is > 0; the residual does not converge to zero.

For Newton's method, under some strict hypotheses, the convergence is quadratic, meaning

| p_k - p_true | <= (K * | p_0 - p_true |) ^(2^k)

But lsqnonlin is not Newton's method. It is a quasi-Newton method, a hybrid between Newton and gradient methods. The conjugate gradient method converges linearly:

| p_k - p_true | <= C^k * | p_0 - p_true |

with C ~ (sqrt(kappa) - 1)/(sqrt(kappa) + 1), kappa = cond(J'*J), i.e. roughly 1 - 2/sqrt(kappa) for large kappa.

In practice, due to inexact line search, the descent direction is not optimally estimated, and especially for nonlinear problems most of the iterations are carried out in the regime BEFORE the hypotheses hold under which those nice theoretical convergence rates apply, unless one starts very close to the true solution.

It seems to me that similar bounds apply to the first-order optimality (gradient norm) as well, meaning

|g_k| <= C^k * | g_0 |

etc. Observing g_k is the most convenient, since one doesn't need to know the true solution: just watch whether the exponent of |g_k| decreases at a constant rate toward -infinity at the very end (only in the last few iterations). If that is observed, the algorithm is converging well.
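
What "the exponent of g_k decreases at a constant rate" looks like can be sketched on a toy problem. This is not lsqnonlin's algorithm: it is plain gradient descent with a fixed step on a hypothetical two-variable quadratic, chosen only because its linear convergence factor is known exactly.

```matlab
H = diag([1, 10]);                 % Hessian of the toy quadratic f(x) = 0.5*x'*H*x
x = [1; 1];
alpha = 1 / max(diag(H));          % fixed step size 1/L
nIter = 30;
gnorm = zeros(nIter, 1);
for k = 1:nIter
    g = H * x;                     % gradient at the current iterate
    gnorm(k) = norm(g);
    x = x - alpha * g;
end
ratios = gnorm(2:end) ./ gnorm(1:end-1);   % settles at a constant C < 1
slope  = diff(log10(gnorm));               % constant negative slope per iteration
```

In a real run, the analogous quantity is the first-order optimality column of the solver's iterative display; a steady drop in its exponent over the last iterations is the pattern described above.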

SA-W on 14 Sep 2022

Edited: SA-W on 14 Sep 2022

May I ask again for your opinion:

I am pretty sure that the calculation of my supplied Jacobian is now correct: the gradient check passes, and comparing norm(res) with the finite-difference (FD) run over the iterations gives nearly the same values. However, I ran an example where, after some iterations, only the finite-difference run keeps making progress:

norm(res) analytical Jacobian:

95663775628694
84086338912438
79471586926105
71994927130471
62150098491207
56915151393085
49998727669244
44724501123508
38919775862164
30224620636652
29155253514181
29463664342493
29662902334702
29613746460782
29601240801725
29146830530547
29197155676708
29157674620784
29149432809217
29147474333004
29146991056537
29146870635504
29146840555128
29146833036589
29146831157051
29146830687173
29146830569704
29146830540336
29146830532994
29146830531159
29146830530700
29146830530585
29146830530557
29146830530550

norm(res) finite differences:

956637756945
840863392139
794715873055
719949277385
621501138062
569152884902
499989625208
447247009461
389199788297
302248150577
292361086745
289960172256
219842464284
219554406578
011194862511
001210235899
000039441337
000007324984
000007324502

As you can see, the first 10 iterations are nearly equal; after that, only FD makes further progress. Clearly, the Jacobian at, say, the 10th iteration has to be different.

My only explanation is that the least squares objective reaches a valley or some kind of discontinuity and finite differences can "jump" over that region by evaluating the pde outside of that region.

Can you imagine other problems to check for?

SA-W on 14 Sep 2022

"Any discrete dependency is (numerically) non-differentiable. You should keep the number of iterations fixed, even if it is overkill."

I am pretty sure you are right, but I do not understand the reasoning behind it. With finite differences, MATLAB calls my PDE solver, say, twice with p and (p+h); my PDE solver returns u(p) and u(p+h). MATLAB then computes (u(p+h)-u(p))./h. Why does it matter if u(p+h) required 5 Newton iterations and u(p) only 3? MATLAB only sees the converged u's.

"Why do you think it is not possible that rank(J) is 5? Besides, rank is estimated by thresholding relative to the largest singular value. Computing rank(J'*J) can give a numerical rank lower than rank(full(J)). The condition number of J'*J is the square of that of J."

What I know from the literature is that rank(J) = rank(J'*J). That is why I am confused about

rank(full(J'*J)) = 5 ; rank(full(J)) = 10

I also computed

cond(full(J)) = 1.470330010022814e+12 ; cond(full(J'*J)) = 1.016539996652010e+20

These numbers indicate that my Jacobian is ill-conditioned, right?
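
The gap between the exact identity rank(J) = rank(J'*J) and the numerical result can be reproduced on a small synthetic matrix. In this sketch the matrices J and K are hypothetical, built from random orthogonal factors and prescribed singular values; they are not the Jacobian from this thread.

```matlab
rng(1);
n = 10;
[U, ~] = qr(randn(n));             % random orthogonal factors
[V, ~] = qr(randn(n));

% Ill-conditioned case: singular values from 1 down to 1e-10.
J   = U * diag(logspace(0, -10, n)) * V.';
rJ  = rank(J);                     % default tolerance is relative to the largest singular value
rJJ = rank(J.' * J);               % squared singular values drop below that tolerance

% Moderately conditioned case, where the squaring is visible without rounding issues.
K = U * diag(logspace(0, -3, n)) * V.';
ratio = cond(K.' * K) / cond(K)^2;   % close to 1
```

Forming J'*J squares every singular value, so the smallest ones fall under the rank tolerance and are lost to rounding. For the same reason, a computed cond(J'*J) around 1e20 is no longer a reliable number in double precision.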

SA-W on 14 Sep 2022

Thank you.

I try to understand the first order optimality measure as defined in the doc:

For least-squares solvers and trust-region-reflective algorithms, in problems with bounds alone, the first-order optimality measure is the maximum over i of |vi*gi|. Here gi is the ith component of the gradient, x is the current point, and

vi = ∣xi−bi∣ if the negative gradient points toward bound bi, ONE otherwise.

I calculated the gradient g at the solution as g=2*jacobian'*residual

g =
1.0e-09 *
0.000000767068843
-0.002926767975382
0.010020383786136   fixed
-0.000000523224432
-0.002928045248151
0.647615092338367   fixed
-0.799689984542153  fixed
0.191622430726592   fixed
-0.047905605200400  fixed
0.007984268370923   fixed

I fixed six of the ten parameters, the remaining four have lower bound zero and upper bound infinity. The first order optimality measure is 5.690559991760299e-11. In case the negative gradient points toward bound bi, then vi would simply be the solution according to the definition (lower bound zero) and the first order optimality is the maximum of |gi*sol_i |.

sol =
0e+05 *
483715587103306
451988809262640
0
397818885189575
990392446191780
0
385839000000000
140300000000000
633770000000000
236680000000000

However, the pointwise product |gi sol_i | also does not include 5.690559991760299e-11. So how can I calculate the first order optimality in the above example?
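
A literal implementation of the quoted definition might look like the sketch below. The vectors are toy values, not the data from this comment, and two points are assumptions rather than documented behavior: an infinite bound is treated as the "one otherwise" case, and the gradient g is taken as given, without settling whether the solver's own gradient carries the factor 2.

```matlab
% Toy values (hypothetical).
x  = [2; 0.5; 3];
lb = [0; 0; 0];
ub = [Inf; Inf; Inf];
g  = [1e-3; -5e-3; 4e-4];          % gradient at x

v = ones(size(x));                 % "one otherwise"
towardLower = (g > 0) & isfinite(lb);   % -g points toward the lower bound
towardUpper = (g < 0) & isfinite(ub);   % -g points toward the upper bound
v(towardLower) = abs(x(towardLower) - lb(towardLower));
v(towardUpper) = abs(x(towardUpper) - ub(towardUpper));

firstOrderOpt = max(abs(v .* g));
```

Under these assumptions, a component whose negative gradient points toward the infinite upper bound contributes |g_i| itself and not |g_i * x_i|.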

SA-W on 16 Sep 2022

"IMO, sorry to say it so directly, but it is not serious to work with a J with a condition number of 1e12, exitflag = 3 and a failing gradient test. You must make those numerical obstacles go away before concluding anything trustworthy."

The gradient check does not fail anymore. Also, for example, a start vector

p0 = [100000;50000;0;50000;100000]

converges to the true exact solution

p = [21844;5183;0;4844;18939]

The associated exitflag is 3, although it is the true solution (synthetic data).

You are absolutely right with the bad condition number of J. I am wondering anyway how such a Jacobian results in the correct solution?

I am not sure if I can scale my parameters, because these are material parameters and my pde solver does not converge at all if I change the order of magnitude of them somehow. Is there a way to scale the COMPUTED Jacobian to reduce the condition number or do I have to compute J with scaled parameters?
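
One way to scale without changing what the PDE solver sees is to let the optimizer work on q = p ./ pscale and convert back to physical p before each solver call; by the chain rule, the already computed Jacobian is then rescaled column by column. A sketch under stated assumptions: pscale and the toy Jacobian J_p are hypothetical, with columns of deliberately different magnitude.

```matlab
rng(2);
pscale = [1e4; 1e5; 1e6];                  % assumed typical magnitudes of p
J_p = randn(8, 3) ./ pscale.';             % toy Jacobian ds/dp in physical units

% The optimizer works on q = p ./ pscale; the solver still receives physical p.
to_physical = @(q) q .* pscale;
J_q = J_p .* pscale.';                     % chain rule: ds/dq = (ds/dp) * diag(pscale)

cond_p = cond(J_p);
cond_q = cond(J_q);
```

The objective handed to lsqnonlin would then take q, call the solver with to_physical(q), and return J_p .* pscale.' as its Jacobian; the bounds and the start vector have to be divided by pscale as well.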

SA-W on 17 Oct 2022

Sorry that I come back on this again. But I would appreciate your feedback in this regard:

If the parameters at some point during optimization do not change anymore, what might be useful to check for underlying reasons?
