Documentation |
On this page… |
---|
Given a set of n nonlinear functions F_{i}(x), where n is the number of components of the vector x, the goal of equation solving is to find a vector x that makes all F_{i}(x) = 0.
fsolve attempts to solve systems of equations by minimizing the sum of squares of the components. If the sum of squares is zero, the system of equation is solved. fsolve has three algorithms:
Trust-region-reflective
Trust-region dogleg
Levenberg-Marquardt
All algorithms are large-scale; see Large-Scale vs. Medium-Scale Algorithms.
The fzero function solves a single one-dimensional equation.
The \ function solves systems of linear equations.
Many of the methods used in Optimization Toolbox™ solvers are based on trust regions, a simple yet powerful concept in optimization.
To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize f(x), where the function takes vector arguments and returns scalars. Suppose you are at a point x in n-space and you want to improve, i.e., move to a point with a lower function value. The basic idea is to approximate f with a simpler function q, which reasonably reflects the behavior of function f in a neighborhood N around the point x. This neighborhood is the trust region. A trial step s is computed by minimizing (or approximately minimizing) over N. This is the trust-region subproblem,
$$\underset{s}{\mathrm{min}}\left\{q(s),\text{}s\in N\right\}.$$ | (11-1) |
The current point is updated to be x + s if f(x + s) < f(x); otherwise, the current point remains unchanged and N, the region of trust, is shrunk and the trial step computation is repeated.
The key questions in defining a specific trust-region approach to minimizing f(x) are how to choose and compute the approximation q (defined at the current point x), how to choose and modify the trust region N, and how accurately to solve the trust-region subproblem. This section focuses on the unconstrained problem. Later sections discuss additional complications due to the presence of constraints on the variables.
In the standard trust-region method ([48]), the quadratic approximation q is defined by the first two terms of the Taylor approximation to F at x; the neighborhood N is usually spherical or ellipsoidal in shape. Mathematically the trust-region subproblem is typically stated
$$\mathrm{min}\left\{\frac{1}{2}{s}^{T}Hs+{s}^{T}g\text{suchthat}\Vert Ds\Vert \le \Delta \right\},$$ | (11-2) |
where g is the gradient of f at the current point x, H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, Δ is a positive scalar, and ∥ . ∥ is the 2-norm. Good algorithms exist for solving Equation 11-2 (see [48]); such algorithms typically involve the computation of a full eigensystem and a Newton process applied to the secular equation
$$\frac{1}{\Delta}-\frac{1}{\Vert s\Vert}=0.$$
Such algorithms provide an accurate solution to Equation 11-2. However, they require time proportional to several factorizations of H. Therefore, for trust-region problems a different approach is needed. Several approximation and heuristic strategies, based on Equation 11-2, have been proposed in the literature ([42] and [50]). The approximation approach followed in Optimization Toolbox solvers is to restrict the trust-region subproblem to a two-dimensional subspace S ([39] and [42]). Once the subspace S has been computed, the work to solve Equation 11-2 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace, the problem is only two-dimensional). The dominant work has now shifted to the determination of the subspace.
The two-dimensional subspace S is determined with the aid of a preconditioned conjugate gradient process described below. The solver defines S as the linear space spanned by s_{1} and s_{2}, where s_{1} is in the direction of the gradient g, and s_{2} is either an approximate Newton direction, i.e., a solution to
$$H\cdot {s}_{2}=-g,$$ | (11-3) |
or a direction of negative curvature,
$${s}_{2}^{T}\cdot H\cdot {s}_{2}<0.$$ | (11-4) |
The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).
A sketch of unconstrained minimization using trust-region ideas is now easy to give:
Formulate the two-dimensional trust-region subproblem.
Solve Equation 11-2 to determine the trial step s.
If f(x + s) < f(x), then x = x + s.
Adjust Δ.
These four steps are repeated until convergence. The trust-region dimension Δ is adjusted according to standard rules. In particular, it is decreased if the trial step is not accepted, i.e., f(x + s) ≥ f(x). See [46] and [49] for a discussion of this aspect.
Optimization Toolbox solvers treat a few important special cases of f with specialized functions: nonlinear least-squares, quadratic functions, and linear least-squares. However, the underlying algorithmic ideas are the same as for the general case. These special cases are discussed in later sections.
A popular way to solve large symmetric positive definite systems of linear equations Hp = –g is the method of Preconditioned Conjugate Gradients (PCG). This iterative approach requires the ability to calculate matrix-vector products of the form H·v where v is an arbitrary vector. The symmetric positive definite matrix M is a preconditioner for H. That is, M = C^{2}, where C^{–1}HC^{–1} is a well-conditioned matrix or a matrix with clustered eigenvalues.
In a minimization context, you can assume that the Hessian matrix H is symmetric. However, H is guaranteed to be positive definite only in the neighborhood of a strong minimizer. Algorithm PCG exits when a direction of negative (or zero) curvature is encountered, i.e., d^{T}Hd ≤ 0. The PCG output direction, p, is either a direction of negative curvature or an approximate (tol controls how approximate) solution to the Newton system Hp = –g. In either case p is used to help define the two-dimensional subspace used in the trust-region approach discussed in Trust-Region Methods for Nonlinear Minimization.
Another approach is to solve a linear system of equations to find the search direction, namely, Newton's method says to solve for the search direction d_{k} such that
J(x_{k})d_{k} =
–F(x_{k})
x_{k + 1} = x_{k} + d_{k},
where J(x_{k}) is the n-by-n Jacobian
$$J\left({x}_{k}\right)=\left[\begin{array}{c}\nabla {F}_{1}{\left({x}_{k}\right)}^{T}\\ \nabla {F}_{2}{\left({x}_{k}\right)}^{T}\\ \vdots \\ \nabla {F}_{n}{\left({x}_{k}\right)}^{T}\end{array}\right].$$
Newton's method can run into difficulties. J(x_{k}) may be singular, and so the Newton step d_{k} is not even defined. Also, the exact Newton step d_{k} may be expensive to compute. In addition, Newton's method may not converge if the starting point is far from the solution.
Using trust-region techniques (introduced in Trust-Region Methods for Nonlinear Minimization) improves robustness when starting far from the solution and handles the case when J(x_{k}) is singular. To use a trust-region strategy, a merit function is needed to decide if x_{k + 1} is better or worse than x_{k}. A possible choice is
$$\underset{d}{\mathrm{min}}f(d)=\frac{1}{2}F{\left({x}_{k}+d\right)}^{T}F\left({x}_{k}+d\right).$$
But a minimum of f(d) is not necessarily a root of F(x).
The Newton step d_{k} is a root of
M(x_{k} + d) = F(x_{k}) + J(x_{k})d,
and so it is also a minimum of m(d), where
$$\begin{array}{c}\underset{d}{\mathrm{min}}m(d)=\frac{1}{2}{\Vert M\left({x}_{k}+d\right)\Vert}_{2}^{2}=\frac{1}{2}{\Vert F\left({x}_{k}\right)+J\left({x}_{k}\right)d\Vert}_{2}^{2}\\ =\frac{1}{2}F{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+{d}^{T}J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+\frac{1}{2}{d}^{T}J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)d.\end{array}$$ | (11-5) |
Then m(d) is a better choice of merit function than f(d), and so the trust-region subproblem is
$$\underset{d}{\mathrm{min}}\left[\frac{1}{2}F{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+{d}^{T}J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right)+\frac{1}{2}{d}^{T}J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)d\right],$$ | (11-6) |
such that ∥D·d∥ ≤ Δ. This subproblem can be efficiently solved using a dogleg strategy.
For an overview of trust-region methods, see Conn [4], and Nocedal [31].
The key feature of this algorithm is the use of the Powell dogleg procedure for computing the step d, which minimizes Equation 11-6. For a detailed description, see Powell [34].
The step d is constructed from a convex combination of a Cauchy step (a step along the steepest descent direction) and a Gauss-Newton step for f(x). The Cauchy step is calculated as
d_{C} = –αJ(x_{k})^{T}F(x_{k}),
where α is chosen to minimize Equation 11-5.
The Gauss-Newton step is calculated by solving
J(x_{k})·d_{GN} = –F(x_{k}),
using the MATLAB^{®} \ (matrix left division) operator.
The step d is chosen so that
d = d_{C} + λ(d_{GN} – d_{C}),
where λ is the largest value in the interval [0,1] such that ∥d∥ ≤ Δ. If J_{k} is (nearly) singular, d is just the Cauchy direction.
The dogleg algorithm is efficient since it requires only one linear solve per iteration (for the computation of the Gauss-Newton step). Additionally, it can be more robust than using the Gauss-Newton method with a line search.
The Levenberg-Marquardt [25], and [27] method uses a search direction that is a solution of the linear set of equations
$$\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)+{\lambda}_{k}I\right){d}_{k}=-J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right),$$ | (11-7) |
or, optionally, of the equations
$$\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)+{\lambda}_{k}diag\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)\right)\right){d}_{k}=-J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right),$$ | (11-8) |
where the scalar λ_{k} controls both the magnitude and direction of d_{k}. Set option ScaleProblem to 'none' to choose Equation 11-7, and set ScaleProblem to 'Jacobian' to choose Equation 11-8.
When λ_{k} is zero, the direction d_{k} is the Gauss-Newton method. As λ_{k} tends to infinity, d_{k} tends towards the steepest descent direction, with magnitude tending to zero. This implies that for some sufficiently large λ_{k}, the term F(x_{k} + d_{k}) < F(x_{k}) holds true. The term λ_{k} can therefore be controlled to ensure descent even when second-order terms, which restrict the efficiency of the Gauss-Newton method, are encountered. The Levenberg-Marquardt method therefore uses a search direction that is a cross between the Gauss-Newton direction and the steepest descent direction.
fzero attempts to find the root of a scalar function f of a scalar variable x.
fzero looks for an interval around an initial point such that f(x) changes sign. If you give an initial interval instead of an initial point, fzero checks to make sure f(x) has different signs at the endpoints of the interval. The initial interval must be finite; it cannot contain ±Inf.
fzero uses a combination of interval bisection, linear interpolation, and inverse quadratic interpolation in order to locate a root of f(x). See fzero for more information.