
Selecting units for (1) scaling of variables and (2) condition number minimization

In a gradient-based optimization problem, selecting units can influence the condition number of the gradient. A smaller condition number is generally better for optimization.
At the same time, selecting units can also make the variables unbalanced. For example, one variable may be on the scale of 10^10, while another may be on the scale of 0.0001.
My experience is that if I bring the variables to similar scales, the optimization generally converges well.
Sometimes these two objectives contradict each other.
How should I balance these two competing objectives? Thank you very much!
  1 Comment
Matt J on 4 Aug 2023
Edited: Matt J on 4 Aug 2023
"selecting units can influence the condition number of the gradient"
The condition number of the Hessian, I think you mean.


Answers (1)

Matt J on 4 Aug 2023
Edited: Matt J on 4 Aug 2023
You are free to translate as well as scale your optimization variables (or make any other nonlinear 1-1 transformation that might be useful).
For example, this quadratic objective is well-conditioned, with condition number = 1, and doesn't require a change of units,
but has solutions at very large x and very small y. I'm not sure why you consider this a problem, but you could remedy it by making the change of variables , and rewriting the problem as,
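The equations of this answer are not reproduced here, so the following is only a hypothetical instance of the idea, not the objective from the answer. It assumes f(x, y) = (x - A)^2 + (y - B)^2 with A = 1e10 and B = 1e-4 (both values are assumptions, borrowed from the scales mentioned in the question). The sketch compares the Hessian condition number under a pure translation and under a pure rescaling of the variables.

```python
# Hypothetical objective (an assumption, not taken from the answer):
#   f(x, y) = (x - A)**2 + (y - B)**2, minimized at (A, B).
A, B = 1e10, 1e-4


def diag_cond(d):
    """2-norm condition number of a diagonal matrix with diagonal entries d."""
    mags = [abs(v) for v in d]
    return max(mags) / min(mags)


# Hessian of f in the original variables is 2*I.
hess_original = [2.0, 2.0]

# Translation x = A + u, y = B + v gives f = u**2 + v**2.
# The solution moves to (0, 0) and the Hessian is still 2*I.
hess_translated = [2.0, 2.0]

# Pure rescaling x = A*u, y = B*v gives f = A**2*(u - 1)**2 + B**2*(v - 1)**2.
# The solution moves to (1, 1), but the Hessian becomes diag(2*A**2, 2*B**2).
hess_rescaled = [2.0 * A**2, 2.0 * B**2]

cond_original = diag_cond(hess_original)
cond_translated = diag_cond(hess_translated)
cond_rescaled = diag_cond(hess_rescaled)

print("original  :", cond_original)
print("translated:", cond_translated)
print("rescaled  :", cond_rescaled)
```

In this toy case a translation balances the solution magnitudes without touching the conditioning, while rescaling each variable by its own magnitude balances the solution but makes the Hessian badly conditioned. That is one way the two goals in the question can pull against each other.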
  9 Comments
Bruno Luong on 5 Aug 2023
Edited: Bruno Luong on 5 Aug 2023
"I am not talking about the conditioning of the Hessian. I am talking about the conditioning of the gradient."
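AFAIK, the condition number applies to a matrix. It's defined as $\kappa(A) = \|A\| \, \|A^{-1}\|$, which for the 2-norm equals $\sigma_{\max}(A)/\sigma_{\min}(A)$.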
There is no such thing as conditioning of the gradient which is a vector and NOT a matrix.
To describe your problem, you must start using math terminology correctly.
"I do SVD to compute the gradient of the objective function, whose largest singular value to smallest singular value is defined as condition number"
I don't know what SVD you are talking about (what is the matrix, is it the Jacobian of the model (?)), but if that is the case, please explain this process.
What you call conditioning might be something entirely different from what WE think (the matrix is the Hessian), and maybe that explains why you get confusing results.
Note that for the non-linear case the Hessian is NOT J'*J, where J is the Jacobian of the model (at the considered point).
And the Jacobian changes with the point. Do you take the Jacobian at the first guess? At the solution of the preceding optimisation? Something else?
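To make the last two points concrete, here is a minimal sketch under assumed data. It uses a hypothetical one-parameter model with residuals r_i(x) = exp(x*t_i) - d_i and the least-squares objective f(x) = 0.5*sum(r_i^2). The sample points `T` and data `D` are invented for illustration. For this objective the exact second derivative is J'*J plus the sum of r_i times the second derivative of r_i, so it differs from J'*J whenever the residuals are nonzero.

```python
import math

T = [0.0, 1.0, 2.0]   # hypothetical sample points
D = [1.5, 2.0, 9.0]   # hypothetical data, not fit exactly by the model


def residuals(x):
    return [math.exp(x * t) - d for t, d in zip(T, D)]


def objective(x):
    return 0.5 * sum(r * r for r in residuals(x))


def jacobian(x):
    # d r_i / d x for the single parameter x
    return [t * math.exp(x * t) for t in T]


def gauss_newton_term(x):
    # J'*J, a scalar here because there is only one parameter
    return sum(j * j for j in jacobian(x))


def exact_hessian(x):
    # J'*J + sum_i r_i * d^2 r_i / dx^2
    second = [t * t * math.exp(x * t) for t in T]
    return gauss_newton_term(x) + sum(r * s for r, s in zip(residuals(x), second))


for x in (0.5, 1.0):
    print(x, gauss_newton_term(x), exact_hessian(x))
```

The printed values change with x, and the two columns disagree at each point, so a condition number computed from J at one point says little about the Hessian at another.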
"This condition number is dependant on selection of units."
Of course we all know this, but you did not explain:
  • when you normalize the units and it converges faster, does the conditioning improve or degrade?
Also, the conditioning is just a partial view of the whole picture. Maybe your model has some sort of null space (a space of the decision variables that is NOT observable by your data), or you have a constrained problem with active constraints and you need to evaluate the conditioning of the Hessian projected on the tangent space (*); in this case the condition number of the full Hessian does NOT reflect the convergence rate.
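A deliberately simple sketch of the projected-Hessian point, under assumed numbers: a hypothetical diagonal Hessian for three decision variables, with a bound on the third variable assumed active at the solution. The tangent space is then spanned by the first two coordinate directions, and the projected Hessian Z'*H*Z is just the leading 2-by-2 block.

```python
# Hypothetical diagonal Hessian for three decision variables (an assumption).
H_DIAG = [1.0, 4.0, 1.0e8]


def diag_cond(d):
    """2-norm condition number of a diagonal matrix with diagonal entries d."""
    mags = [abs(v) for v in d]
    return max(mags) / min(mags)


# Assume the bound on the third variable (index 2) is active at the solution,
# so feasible moves only change the first two variables. With Z equal to the
# first two columns of the identity, Z'*H*Z keeps the first two diagonal entries.
ACTIVE = {2}
reduced = [h for i, h in enumerate(H_DIAG) if i not in ACTIVE]

cond_full = diag_cond(H_DIAG)
cond_reduced = diag_cond(reduced)

print("full Hessian     :", cond_full)
print("projected Hessian:", cond_reduced)
```

Here the full Hessian looks badly conditioned, yet the directions the optimizer can actually move in are well conditioned. The sketch ignores constraint curvature, which the footnote (*) points out also matters.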
Many things can lead you to a wrong conclusion. If you are not able to show it with a MWE, the discussion is in vain.
At least show us the figure of the normalization process, the problem dimension, the condition number you estimate - at the initial point and at the convergence point -, the number of iterations for convergence, the number of active constraints at the solution, etc.
I'll stop here, without more details the discussion is a waste of time.
(*) Actually, the curvature of the constraints also matters.
Frank on 5 Aug 2023
Thanks for your thought-provoking reply. Sorry I wasn't clear enough and made some wrong statements.
I am doing a tomography problem, whose main physical quantity is traveltime. When I said gradient, I meant the gradient of each ray's traveltime with respect to the velocity. Since there are m traveltimes and n velocity parameters, we have a gradient matrix. I did a singular value decomposition of this matrix and obtained its condition number.
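As a rough illustration of how units enter this matrix, the sketch below assumes a tiny hypothetical sensitivity matrix J (3 traveltimes, 2 velocity parameters; the numbers are invented). Re-expressing a parameter as v = s*w multiplies the corresponding column of J by s, which changes the ratio of largest to smallest singular value. The helper `cond_two_columns` is written for this example and only handles two-column matrices, using the eigenvalues of the 2-by-2 matrix J'*J.

```python
import math


def cond_two_columns(J):
    """sigma_max / sigma_min of an m-by-2 matrix via the eigenvalues of J'*J."""
    a = sum(row[0] * row[0] for row in J)
    b = sum(row[0] * row[1] for row in J)
    c = sum(row[1] * row[1] for row in J)
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam_max = mean + radius
    lam_min = (a * c - b * b) / lam_max  # det / lam_max, avoids cancellation
    return math.sqrt(lam_max / lam_min)


def scale_columns(J, scales):
    """Multiply column k of J by scales[k] (a change of units per parameter)."""
    return [[v * s for v, s in zip(row, scales)] for row in J]


# Hypothetical sensitivities d(traveltime_i) / d(velocity_k), invented numbers.
J = [[1.0, 0.0],
     [0.0, 1.0e-3],
     [1.0, 1.0e-3]]

cond_raw = cond_two_columns(J)

# Measure the second parameter in units 1000 times larger: column 2 is
# multiplied by 1000 and the columns end up with comparable norms.
cond_scaled = cond_two_columns(scale_columns(J, [1.0, 1.0e3]))

print("raw units     :", cond_raw)
print("rescaled units:", cond_scaled)
```

In this toy case balancing the column norms lowers the condition number of J. As noted earlier in the thread, this is the conditioning of the Jacobian at one point, which is not the same thing as the conditioning of the Hessian in a nonlinear problem.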
What you said is NOT in vain to me, they are very helpful!!! Thank you very much!!!
