## Quadratic Programming Algorithms

### Quadratic Programming Definition

Quadratic programming is the problem of finding a vector *x* that
minimizes a quadratic function, possibly subject to linear constraints:

$$\underset{x}{\mathrm{min}}\frac{1}{2}{x}^{T}Hx+{c}^{T}x$$ | (1) |

such that *A·x* ≤ *b*, *Aeq·x* = *beq*, *l* ≤ *x* ≤ *u*.

`interior-point-convex`

`quadprog`

Algorithm

The `interior-point-convex`

algorithm performs the
following steps:

**Note**

The algorithm has two code paths. It takes one when the Hessian
matrix *H* is an ordinary (full) matrix of doubles,
and it takes the other when *H* is a sparse matrix.
For details of the sparse data type, see Sparse Matrices.
Generally, the algorithm is faster for large problems that have relatively
few nonzero terms when you specify *H* as `sparse`

. Similarly, the algorithm is faster
for small or relatively dense problems when you specify *H* as `full`

.

#### Presolve/Postsolve

The algorithm first tries to simplify the problem by removing redundancies and simplifying constraints. The tasks performed during the presolve step can include the following:

Check if any variables have equal upper and lower bounds. If so, check for feasibility, and then fix and remove the variables.

Check if any linear inequality constraint involves only one variable. If so, check for feasibility, and then change the linear constraint to a bound.

Check if any linear equality constraint involves only one variable. If so, check for feasibility, and then fix and remove the variable.

Check if any linear constraint matrix has zero rows. If so, check for feasibility, and then delete the rows.

Determine if the bounds and linear constraints are consistent.

Check if any variables appear only as linear terms in the objective function and do not appear in any linear constraint. If so, check for feasibility and boundedness, and then fix the variables at their appropriate bounds.

Change any linear inequality constraints to linear equality constraints by adding slack variables.

If the algorithm detects an infeasible or unbounded problem, it halts and issues an appropriate exit message.

The algorithm might arrive at a single feasible point, which represents the solution.

If the algorithm does not detect an infeasible or unbounded problem in the presolve step, and if the presolve has not produced the solution, the algorithm continues to its next steps. After reaching a stopping criterion, the algorithm reconstructs the original problem, undoing any presolve transformations. This final step is the postsolve step.

For details, see Gould and Toint [63].

#### Generate Initial Point

The initial point `x0`

for the algorithm is:

Initialize

`x0`

to`ones(n,1)`

, where`n`

is the number of rows in*H*.For components that have both an upper bound

`ub`

and a lower bound`lb`

, if a component of`x0`

is not strictly inside the bounds, the component is set to`(ub + lb)/2`

.For components that have only one bound, modify the component if necessary to lie strictly inside the bound.

Take a predictor step (see Predictor-Corrector), with minor corrections for feasibility, not a full predictor-corrector step. This places the initial point closer to the

*central path*without entailing the overhead of a full predictor-corrector step. For details of the central path, see Nocedal and Wright [7], page 397.

#### Predictor-Corrector

The sparse and full interior-point-convex algorithms differ mainly in the predictor-corrector phase. The algorithms are similar, but differ in some details. For the basic algorithm description, see Mehrotra [47].

The algorithms begin by turning the linear inequalities Ax <= b into inequalities of the form Ax >= b by multiplying A and b by -1. This has no bearing on the solution, but makes the problem of the same form found in some literature.

**Sparse Predictor-Corrector. **Similar to the `fmincon`

interior-point algorithm,
the sparse `interior-point-convex`

algorithm tries to find
a point where the Karush-Kuhn-Tucker (KKT) conditions
hold. For the quadratic programming problem described in Quadratic Programming Definition, these conditions are:

$$\begin{array}{c}Hx+c-{A}_{eq}^{T}y-{\overline{A}}^{T}z=0\\ \overline{A}x-\overline{b}-s=0\\ {A}_{eq}x-{b}_{eq}=0\\ {s}_{i}{z}_{i}=0,\text{}i=1,2,\mathrm{...},m\\ s\ge 0\text{\hspace{0.17em}}\\ z\ge 0.\end{array}$$

Here

$$\overline{A}$$ is the extended linear inequality matrix that includes bounds written as linear inequalities. $$\overline{b}$$ is the corresponding linear inequality vector, including bounds.

*s*is the vector of slacks that convert inequality constraints to equalities.*s*has length*m*, the number of linear inequalities and bounds.*z*is the vector of Lagrange multipliers corresponding to*s*.*y*is the vector of Lagrange multipliers associated with the equality constraints.

The algorithm first predicts a step from the Newton-Raphson
formula, then computes a corrector step. The corrector attempts to
better enforce the nonlinear constraint *s _{i}z_{i}* = 0.

Definitions for the predictor step:

*r*, the dual residual:_{d}$${r}_{d}=Hx+c-{A}_{eq}^{T}y-{\overline{A}}^{T}z.$$

*r*, the primal equality constraint residual:_{eq}$${r}_{eq}={A}_{eq}x-{b}_{eq}.$$

*r*, the primal inequality constraint residual, which includes bounds and slacks:_{ineq}$${r}_{ineq}=\overline{A}x-\overline{b}-s.$$

*r*, the complementarity residual:_{sz}*r*=_{sz}*Sz*.*S*is the diagonal matrix of slack terms,*z*is the column matrix of Lagrange multipliers.*r*, the average complementarity:_{c}$${r}_{c}=\frac{{s}^{T}z}{m}.$$

In a Newton step, the changes in *x*, *s*, *y*,
and *z*, are given by:

$$\left(\begin{array}{cccc}H& 0& -{A}_{eq}^{T}& -{\overline{A}}^{T}\\ {A}_{eq}& 0& 0& 0\\ \overline{A}& -I& 0& 0\\ 0& Z& 0& S\end{array}\right)\left(\begin{array}{c}\Delta x\\ \Delta s\\ \Delta y\\ \Delta z\end{array}\right)=-\left(\begin{array}{c}{r}_{d}\\ {r}_{eq}\\ {r}_{ineq}\\ {r}_{sz}\end{array}\right).$$ | (2) |

However, a full Newton step might be infeasible, because of
the positivity constraints on *s* and *z*.
Therefore, `quadprog`

shortens the step, if necessary,
to maintain positivity.

Additionally, to maintain a “centered” position
in the interior, instead of trying to solve *s _{i}z_{i}* = 0, the algorithm
takes a positive parameter

*σ*, and tries to solve

*s _{i}z_{i}* =

*σr*.

_{c}`quadprog`

replaces *r _{sz}* in
the Newton step equation with

*r*+ Δ

_{sz}*s*Δ

*z*–

*σr*

_{c}**1**, where

**1**is the vector of ones. Also,

`quadprog`

reorders the
Newton equations to obtain a symmetric, more numerically stable system
for the predictor step calculation.After calculating the corrected Newton step, the algorithm performs more calculations to get both a longer current step, and to prepare for better subsequent steps. These multiple correction calculations can improve both performance and robustness. For details, see Gondzio [4].

**Full Predictor-Corrector. **The full predictor-corrector algorithm does not combine bounds
into linear constraints, so it has another set of slack variables
corresponding to the bounds. The algorithm shifts lower bounds to
zero. And, if there is only one bound on a variable, the algorithm
turns it into a lower bound of zero, by negating the inequality of
an upper bound.

$$\overline{A}$$ is the extended linear matrix
that includes both linear inequalities and linear equalities. $$\overline{b}$$ is the corresponding linear
equality vector. $$\overline{A}$$ also includes
terms for extending the vector *x* with slack variables *s* that
turn inequality constraints to equality constraints:

$$\overline{A}x=\left(\begin{array}{cc}{A}_{eq}& 0\\ A& I\end{array}\right)\left(\begin{array}{c}{x}_{0}\\ s\end{array}\right),$$

where *x*_{0} means the
original *x* vector.

The KKT conditions are

$$\begin{array}{c}Hx+c-{\overline{A}}^{T}y-v+w=0\\ \overline{A}x=\overline{b}\\ x+t=u\\ {v}_{i}{x}_{i}=0,\text{}i=1,2,\mathrm{...},m\\ {w}_{i}{t}_{i}=0,\text{}i=1,2,\mathrm{...},n\\ x,v,w,t\ge 0.\end{array}$$ | (3) |

To find the solution *x*, slack variables and
dual variables to Equation 3,
the algorithm basically considers a Newton-Raphson step:

$$\left(\begin{array}{ccccc}H& -{\overline{A}}^{T}& 0& -I& I\\ \overline{A}& 0& 0& 0& 0\\ -I& 0& -I& 0& 0\\ V& 0& 0& X& 0\\ 0& 0& W& 0& T\end{array}\right)\left(\begin{array}{c}\Delta x\\ \Delta y\\ \Delta t\\ \begin{array}{l}\Delta v\\ \Delta w\end{array}\end{array}\right)=-\left(\begin{array}{c}Hx+c-{\overline{A}}^{T}y-v+w\\ \overline{A}x-\overline{b}\\ u-x-t\\ \begin{array}{l}VX\\ WT\end{array}\end{array}\right)=-\left(\begin{array}{c}{r}_{d}\\ {r}_{p}\\ {r}_{ub}\\ \begin{array}{l}{r}_{vx}\\ {r}_{wt}\end{array}\end{array}\right),$$ | (4) |

where *X*, *V*, *W*,
and *T* are diagonal matrices corresponding to the
vectors *x*, *v*, *w*,
and *t* respectively. The residual vectors on the
far right side of the equation are:

*r*, the dual residual_{d}*r*, the primal residual_{p}*r*, the upper bound residual_{ub}*r*, the lower bound complementarity residual_{vx}*r*, the upper bound complementarity residual_{wt}

The algorithm solves Equation 4 by first converting it to the symmetric matrix form

$$\left(\begin{array}{cc}-D& {\overline{A}}^{T}\\ \overline{A}& 0\end{array}\right)\left(\begin{array}{c}\Delta x\\ \Delta y\end{array}\right)=-\left(\begin{array}{c}R\\ {r}_{p}\end{array}\right),$$ | (5) |

where

$$\begin{array}{l}D=H+{X}^{-1}V+{T}^{-1}W\\ R=-{r}_{d}-{X}^{-1}{r}_{vx}+{T}^{-1}{r}_{wt}+{T}^{-1}W{r}_{ub}.\end{array}$$

All the matrix inverses in the definitions of *D* and *R* are
simple to compute because the matrices are diagonal.

To derive Equation 5 from Equation 4, notice that the
second row of Equation 5 is
the same as the second matrix row of Equation 4. The first row of Equation 5 comes from solving
the last two rows of Equation 4 for Δ*v* and
Δ*w*, and then solving for Δ*t*.

To solve Equation 5, the algorithm follows the essential elements of Altman and Gondzio [1]. The algorithm solves the symmetric system by an LDL decomposition. As pointed out by authors such as Vanderbei and Carpenter [2], this decomposition is numerically stable without any pivoting, so can be fast.

After calculating the corrected Newton step, the algorithm performs more calculations to get both a longer current step, and to prepare for better subsequent steps. These multiple correction calculations can improve both performance and robustness. For details, see Gondzio [4].

The full `quadprog`

predictor-corrector algorithm
is largely the same as that in the `linprog`

`'interior-point'`

algorithm,
but includes quadratic terms as well. See Predictor-Corrector.

## References

[1] Altman, Anna and J. Gondzio. *Regularized
symmetric indefinite systems in interior point methods for linear
and quadratic optimization*. Optimization Methods and Software,
1999. Available
for download here.

[2] Vanderbei, R. J. and T. J. Carpenter. *Symmetric indefinite
systems for interior point methods*. Mathematical
Programming 58, 1993. pp. 1–32. Available for download here.

#### Stopping Conditions

The predictor-corrector algorithm iterates until it reaches a point that is feasible (satisfies the constraints to within tolerances) and where the relative step sizes are small. Specifically, define

$$\rho =\mathrm{max}\left(1,\Vert H\Vert ,\Vert \overline{A}\Vert ,\Vert {A}_{eq}\Vert ,\Vert c\Vert ,\Vert \overline{b}\Vert ,\Vert {b}_{eq}\Vert \right)$$

The algorithm stops when all of these conditions are satisfied:

$$\begin{array}{c}{\Vert {r}_{p}\Vert}_{1}+{\Vert {r}_{ub}\Vert}_{1}\le \rho \text{TolCon}\\ {\Vert {r}_{d}\Vert}_{\infty}\le \rho \text{TolFun}\\ {r}_{c}\le \text{TolFun,}\end{array}$$

where

$${r}_{c}=\underset{i}{\mathrm{max}}\left(\mathrm{min}\left(\left|{x}_{i}{v}_{i}\right|,\left|{x}_{i}\right|,\left|{v}_{i}\right|\right),\mathrm{min}\left(\left|{t}_{i}{w}_{i}\right|,\left|{t}_{i}\right|,\left|{w}_{i}\right|\right)\right).$$

*r _{c}* essentially measures
the size of the complementarity residuals

*xv*and

*tw*, which are each vectors of zeros at a solution.

#### Infeasibility Detection

`quadprog`

calculates a *merit function* *φ* at
every iteration. The merit function is a measure of feasibility. `quadprog`

stops
if the merit function grows too large. In this case, `quadprog`

declares
the problem to be infeasible.

The merit function is related to the KKT conditions for the problem—see Predictor-Corrector. Use the following definitions:

$$\begin{array}{c}\rho =\mathrm{max}\left(1,\Vert H\Vert ,\Vert \overline{A}\Vert ,\Vert {A}_{eq}\Vert ,\Vert c\Vert ,\Vert \overline{b}\Vert ,\Vert {b}_{eq}\Vert \right)\\ {r}_{\text{eq}}={A}_{\text{eq}}x-{b}_{\text{eq}}\\ {r}_{\text{ineq}}=\overline{A}x-\overline{b}-s\\ {r}_{\text{d}}=Hx+c+{A}_{\text{eq}}^{T}{\lambda}_{\text{eq}}+{\overline{A}}^{T}{\overline{\lambda}}_{\text{ineq}}\\ g=\frac{1}{2}{x}^{T}Hx+{c}^{T}x-{\overline{b}}^{T}{\overline{\lambda}}_{\text{ineq}}-{b}_{\text{eq}}^{T}{\lambda}_{\text{eq}}.\end{array}$$

The notation $$\overline{A}$$ and $$\overline{b}$$ means the linear inequality
coefficients, augmented with terms to represent bounds for the sparse
algorithm. The notation $${\overline{\lambda}}_{\text{ineq}}$$ similarly represents
Lagrange multipliers for the linear inequality constraints, including
bound constraints. This was called *z* in Predictor-Corrector, and $${\lambda}_{\text{eq}}$$ was called *y*.

The merit function *φ* is

$$\frac{1}{\rho}\left(\mathrm{max}\left({\Vert {r}_{\text{eq}}\Vert}_{\infty},{\Vert {r}_{\text{ineq}}\Vert}_{\infty},{\Vert {r}_{\text{d}}\Vert}_{\infty}\right)+g\right).$$

If this merit function becomes too large, `quadprog`

declares
the problem to be infeasible and halts with exit flag `-2`

.

`trust-region-reflective`

`quadprog`

Algorithm

Many of the methods used in Optimization Toolbox™ solvers
are based on *trust regions,* a simple yet powerful
concept in optimization.

To understand the trust-region approach to optimization, consider
the unconstrained minimization problem, minimize *f*(*x*),
where the function takes vector arguments and returns scalars. Suppose
you are at a point *x* in *n*-space
and you want to improve, i.e., move to a point with a lower function
value. The basic idea is to approximate *f* with
a simpler function *q*, which reasonably reflects
the behavior of function *f* in a neighborhood *N* around
the point *x*. This neighborhood is the trust region.
A trial step *s* is computed by minimizing (or approximately
minimizing) over *N*. This is the trust-region subproblem,

$$\underset{s}{\mathrm{min}}\left\{q(s),\text{}s\in N\right\}.$$ | (6) |

The current point is updated to be *x* + *s* if *f*(*x* + *s*) < *f*(*x*);
otherwise, the current point remains unchanged and *N*,
the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach
to minimizing *f*(*x*) are how to
choose and compute the approximation *q* (defined
at the current point *x*), how to choose and modify
the trust region *N*, and how accurately to solve
the trust-region subproblem. This section focuses on the unconstrained
problem. Later sections discuss additional complications due to the
presence of constraints on the variables.

In the standard trust-region method ([48]), the quadratic approximation *q* is
defined by the first two terms of the Taylor approximation to *F* at *x*;
the neighborhood *N* is usually spherical or ellipsoidal
in shape. Mathematically the trust-region subproblem is typically
stated

$$\mathrm{min}\left\{\frac{1}{2}{s}^{T}Hs+{s}^{T}g\text{suchthat}\Vert Ds\Vert \le \Delta \right\},$$ | (7) |

where *g* is the gradient of *f* at the current point
*x*, *H* is the Hessian matrix (the symmetric
matrix of second derivatives), *D* is a diagonal scaling matrix, Δ
is a positive scalar, and ‖ . ‖ is the 2-norm. Good algorithms exist for solving
Equation 7 (see [48]); such algorithms typically involve the computation of all eigenvalues of
*H* and a Newton process applied to the secular equation

$$\frac{1}{\Delta}-\frac{1}{\Vert s\Vert}=0.$$

Such algorithms provide an accurate solution to Equation 7. However, they
require time proportional to several factorizations of *H*.
Therefore, for large-scale problems a different approach is needed.
Several approximation and heuristic strategies, based on Equation 7, have been proposed
in the literature ([42] and [50]). The approximation approach followed
in Optimization Toolbox solvers is to restrict the trust-region
subproblem to a two-dimensional subspace *S* ([39] and [42]).
Once the subspace *S* has been computed, the work
to solve Equation 7 is
trivial even if full eigenvalue/eigenvector information is needed
(since in the subspace, the problem is only two-dimensional). The
dominant work has now shifted to the determination of the subspace.

The two-dimensional subspace *S* is
determined with the aid of a preconditioned
conjugate gradient process described below. The solver defines *S* as
the linear space spanned by *s*_{1} and *s*_{2},
where *s*_{1} is in the direction
of the gradient *g*, and *s*_{2} is
either an approximate Newton direction, i.e.,
a solution to

$$H\cdot {s}_{2}=-g,$$ | (8) |

or a direction of negative curvature,

$${s}_{2}^{T}\cdot H\cdot {s}_{2}<0.$$ | (9) |

The philosophy behind this choice of *S* is
to force global convergence (via the steepest descent direction or
negative curvature direction) and achieve fast local convergence (via
the Newton step, when it exists).

A sketch of unconstrained minimization using trust-region ideas is now easy to give:

Formulate the two-dimensional trust-region subproblem.

Solve Equation 7 to determine the trial step

*s*.If

*f*(*x*+*s*) <*f*(*x*), then*x*=*x*+*s*.Adjust Δ.

These four steps are repeated until convergence. The trust-region
dimension Δ is adjusted according to standard rules. In particular,
it is decreased if the trial step is not accepted, i.e., *f*(*x* + *s*)
≥ *f*(*x*).
See [46] and [49] for a discussion of this aspect.

Optimization Toolbox solvers treat a few important special
cases of *f* with specialized functions: nonlinear
least-squares, quadratic functions, and linear least-squares. However,
the underlying algorithmic ideas are the same as for the general case.
These special cases are discussed in later sections.

The subspace trust-region method is used to determine a search direction. However, instead of restricting the step to (possibly) one reflection step, as in the nonlinear minimization case, a piecewise reflective line search is conducted at each iteration. See [45] for details of the line search.

#### Preconditioned Conjugate Gradient Method

A popular way to solve large, symmetric, positive definite systems of linear equations *Hp* = –*g* is the method of Preconditioned Conjugate Gradients (PCG). This iterative
approach requires the ability to calculate matrix-vector products of the form
*H·v* where *v* is an arbitrary vector. The
symmetric positive definite matrix *M *is a*
preconditioner* for *H*. That is, *M* = *C*^{2}, where *C*^{–1}*HC*^{–1} is a well-conditioned matrix or a matrix with clustered eigenvalues.

In a minimization context, you can assume that the Hessian matrix *H* is
symmetric. However, *H* is guaranteed to be positive definite only in the
neighborhood of a strong minimizer. Algorithm PCG exits when it encounters a direction of
negative (or zero) curvature, that is, *d ^{T}Hd* ≤ 0. The PCG output direction

*p*is either a direction of negative curvature or an approximate solution to the Newton system

*Hp*= –

*g*. In either case,

*p*helps to define the two-dimensional subspace used in the trust-region approach discussed in Trust-Region Methods for Nonlinear Minimization.

#### Linear Equality Constraints

Linear constraints complicate the situation described for unconstrained minimization. However, the underlying ideas described previously can be carried through in a clean and efficient way. The trust-region methods in Optimization Toolbox solvers generate strictly feasible iterates.

The general linear equality constrained minimization problem can be written

$$\mathrm{min}\left\{f(x)\text{suchthat}Ax=b\right\},$$ | (10) |

where *A* is an *m*-by-*n* matrix
(*m* ≤ *n*). Some Optimization Toolbox solvers
preprocess *A* to remove strict linear dependencies
using a technique based on the LU factorization of *A ^{T}* [46]. Here

*A*is assumed to be of rank

*m*.

The method used to solve Equation 10 differs from the unconstrained approach
in two significant ways. First, an initial feasible point *x*_{0} is
computed, using a sparse least-squares step, so that *Ax*_{0} = *b*.
Second, Algorithm PCG is replaced with Reduced Preconditioned Conjugate
Gradients (RPCG), see [46], in order
to compute an approximate reduced Newton step (or a direction of negative
curvature in the null space of *A*). The key linear
algebra step involves solving systems of the form

$$\left[\begin{array}{cc}C& {\tilde{A}}^{T}\\ \tilde{A}& 0\end{array}\right]\left[\begin{array}{c}s\\ t\end{array}\right]=\left[\begin{array}{c}r\\ 0\end{array}\right],$$ | (11) |

where $$\tilde{A}$$ approximates *A* (small
nonzeros of *A* are set to zero provided rank is
not lost) and *C* is a sparse symmetric positive-definite
approximation to *H*, i.e., *C* = *H*.
See [46] for more details.

#### Box Constraints

The box constrained problem is of the form

$$\mathrm{min}\left\{f(x)\text{suchthat}l\le x\le u\right\},$$ | (12) |

where *l* is a vector of lower bounds, and *u* is
a vector of upper bounds. Some (or all) of the components of *l* can
be equal to –∞ and some (or all) of the components of *u* can
be equal to ∞. The method generates a sequence of strictly
feasible points. Two techniques are used to maintain feasibility while
achieving robust convergence behavior. First, a scaled modified Newton
step replaces the unconstrained Newton step (to define the two-dimensional
subspace *S*). Second, reflections
are used to increase the step size.

The scaled modified Newton step arises from examining the Kuhn-Tucker necessary conditions for Equation 12,

$${\left(D(x)\right)}^{-2}g=0,$$ | (13) |

where

$$D(x)=\text{diag}\left({\left|{v}_{k}\right|}^{-1/2}\right),$$

and the vector *v*(*x*) is
defined below, for each 1 ≤ *i* ≤ *n*:

If

*g*< 0 and_{i}*u*< ∞ then_{i}*v*=_{i}*x*–_{i}*u*_{i}If

*g*≥ 0 and_{i}*l*> –∞ then_{i}*v*=_{i}*x*–_{i}*l*_{i}If

*g*< 0 and_{i}*u*= ∞ then_{i}*v*= –1_{i}If

*g*≥ 0 and_{i}*l*= –∞ then_{i}*v*= 1_{i}

The nonlinear system Equation 13 is not differentiable everywhere.
Nondifferentiability occurs when *v _{i}* = 0. You can avoid
such points by maintaining strict feasibility, i.e., restricting

*l*<

*x*<

*u*.

The scaled modified Newton step *s _{k}* for
the nonlinear system of equations given by Equation 13 is defined as the solution to the
linear system

$$\widehat{M}D{s}^{N}=-\widehat{g}$$ | (14) |

at the *k*th iteration, where

$$\widehat{g}={D}^{-1}g=\text{diag}\left({\left|v\right|}^{1/2}\right)g,$$ | (15) |

and

$$\widehat{M}={D}^{-1}H{D}^{-1}+\text{diag}(g){J}^{v}.$$ | (16) |

Here *J ^{v}* plays
the role of the Jacobian of |

*v*|. Each diagonal component of the diagonal matrix

*J*equals 0, –1, or 1. If all the components of

^{v}*l*and

*u*are finite,

*J*= diag(sign(

^{v}*g*)). At a point where

*g*= 0,

_{i}*v*might not be differentiable. $${J}_{ii}^{v}=0$$ is defined at such a point. Nondifferentiability of this type is not a cause for concern because, for such a component, it is not significant which value

_{i}*v*takes. Further, |

_{i}*v*| will still be discontinuous at this point, but the function |

_{i}*v*|·

_{i}*g*is continuous.

_{i}Second, reflections are used to increase
the step size. A (single) reflection step is defined as follows. Given
a step *p* that intersects a bound constraint, consider
the first bound constraint crossed by *p*; assume
it is the *i*th bound constraint (either the *i*th
upper or *i*th lower bound). Then the reflection
step *p ^{R}* =

*p*except in the

*i*th component, where

*p*= –

^{R}_{i}*p*.

_{i}`active-set`

`quadprog`

Algorithm

After completing a presolve step, the `active-set`

algorithm
proceeds in two phases.

Phase 1 — Obtain a feasible point with respect to all constraints.

Phase 2 — Iteratively lower the objective function while maintaining a list of the active constraints and maintaining feasibility in each iteration.

The `active-set`

strategy (also known as a projection method) is
similar to the strategy of Gill et al., described in [18] and [17].

#### Presolve Step

The algorithm first tries to simplify the problem by removing redundancies and simplifying constraints. The tasks performed during the presolve step can include the following:

Check if any variables have equal upper and lower bounds. If so, check for feasibility, and then fix and remove the variables.

Check if any linear inequality constraint involves only one variable. If so, check for feasibility, and then change the linear constraint to a bound.

Check if any linear equality constraint involves only one variable. If so, check for feasibility, and then fix and remove the variable.

Check if any linear constraint matrix has zero rows. If so, check for feasibility, and then delete the rows.

Determine if the bounds and linear constraints are consistent.

Check if any variables appear only as linear terms in the objective function and do not appear in any linear constraint. If so, check for feasibility and boundedness, and then fix the variables at their appropriate bounds.

Change any linear inequality constraints to linear equality constraints by adding slack variables.

If the algorithm detects an infeasible or unbounded problem, it halts and issues an appropriate exit message.

The algorithm might arrive at a single feasible point, which represents the solution.

If the algorithm does not detect an infeasible or unbounded problem in the presolve step, and if the presolve has not produced the solution, the algorithm continues to its next steps. After reaching a stopping criterion, the algorithm reconstructs the original problem, undoing any presolve transformations. This final step is the postsolve step.

For details, see Gould and Toint [63].

#### Phase 1 Algorithm

In Phase 1, the algorithm attempts to find a point `x`

that
satisfies all constraints, with no consideration of the objective function.
`quadprog`

runs the Phase 1 algorithm only if the
supplied initial point `x0`

is infeasible.

To begin, the algorithm tries to find a point that is feasible with respect to
all equality constraints, such as `X = Aeq\beq`

. If there is no
solution `x`

to the equations `Aeq*x = beq`

,
then the algorithm halts. If there is a solution `X`

, the next
step is to satisfy the bounds and linear inequalities. In the case of no
equality constraints set `X = x0`

, the initial point.

Starting from `X`

, the algorithm calculates ```
M =
max(A*X – b, X - ub, lb – X)
```

. If ```
M <=
options.ConstraintTolerance
```

, then the point `X`

is feasible and the Phase 1 algorithm halts.

If `M > options.ConstraintTolerance`

, the algorithm
introduces a nonnegative slack variable *γ* for the auxiliary
linear programming problem

$$\underset{x,\gamma}{\mathrm{min}}\gamma $$

such that

$$\begin{array}{c}Ax-\gamma \le b\\ \text{Aeq}\text{\hspace{0.17em}}x=\text{beq}\\ \text{lb}-\gamma \le x\le \text{ub}+\gamma \\ \gamma \ge -\rho .\end{array}$$

Here, *ρ* is the `ConstraintTolerance`

option multiplied by the absolute value of the largest element in
`A`

and `Aeq`

. If the algorithm finds
*γ* = 0 and a point *x* that satisfies the
equations and inequalities, then *x* is a feasible Phase 1
point. If there is no solution to the auxiliary linear programming problem
*x* with *γ* = 0, then the Phase 1 problem
is infeasible.

To solve the auxiliary linear programming problem, the algorithm sets
*γ*_{0} = M + 1, sets
*x*_{0} = `X`

, and
initializes the active set as the fixed variables (if any) and all the equality
constraints. The algorithm reformulates the linear programming variables
*p* to be the offset of *x* from the
current point *x*_{0}, namely *x* =
*x*_{0} +
*p*. The algorithm solves the linear programming problem by the
same iterations as it takes in Phase 2 to solve the quadratic programming
problem, with an appropriately modified Hessian.

#### Phase 2 Algorithm

In terms of a variable *d*, the problem is

$$\begin{array}{c}\underset{d\in {\Re}^{n}}{\mathrm{min}}q(d)=\frac{1}{2}{d}^{T}Hd+{c}^{T}d,\\ {A}_{i}d={b}_{i},\text{}i=1,\mathrm{...},{m}_{e}\\ {A}_{i}d\le {b}_{i},\text{}i={m}_{e}+1,\mathrm{...},m.\end{array}$$ | (17) |

Here, *A _{i}* refers to the

`i`

th row of the
*m*-by-

*n*matrix

*A*.

During Phase 2, an active set $${\overline{A}}_{k}$$, which is an estimate of the active constraints (those on the constraint boundaries) at the solution point.

The algorithm updates $${\overline{A}}_{k}$$ at each iteration *k*, forming the basis for
a search direction *d _{k}*. Equality
constraints always remain in the active set $${\overline{A}}_{k}$$. The search direction

*d*is calculated and minimizes the objective function while remaining on any active constraint boundaries. The algorithm forms the feasible subspace for

_{k}*d*from a basis

_{k}*Z*whose columns are orthogonal to the estimate of the active set $${\overline{A}}_{k}$$ (that is, $${\overline{A}}_{k}{Z}_{k}=0$$). Therefore, a search direction, which is formed from a linear summation of any combination of the columns of

_{k}*Z*, is guaranteed to remain on the boundaries of the active constraints.

_{k}The algorithm forms the matrix *Z _{k}*
from the last

*n*–

*l*columns of the QR decomposition of the matrix $${\overline{A}}_{k}^{T}$$, where

*l*is the number of active constraints and

*l < n*. That is,

*Z*is given by

_{k}$${Z}_{k}=Q\left[:,l+1:n\right],$$ | (18) |

where

$${Q}^{T}{\overline{A}}_{k}^{T}=\left[\begin{array}{c}R\\ 0\end{array}\right].$$

After finding *Z _{k}*, the algorithm
looks for a new search direction

*d*that minimizes

_{k}*q*(

*d*), where

*d*is in the null space of the active constraints. That is,

_{k}*d*is a linear combination of the columns of

_{k}*Z*: $${\widehat{d}}_{k}={Z}_{k}p$$ for some vector

_{k}*p*.

Viewing the quadratic as a function of *p* by substituting
for *d _{k}*, gives

$$q(p)=\frac{1}{2}{p}^{T}{Z}_{k}^{T}H{Z}_{k}p+{c}^{T}{Z}_{k}p.$$ | (19) |

Differentiating this equation with respect to *p*
yields

$$\nabla q(p)={Z}_{k}^{T}H{Z}_{k}p+{Z}_{k}^{T}c.$$ | (20) |

∇*q*(*p*) is referred to as the projected gradient of the quadratic
function because it is the gradient projected in the subspace defined by
*Z _{k}*. The term $${Z}_{k}^{T}H{Z}_{k}$$ is called the projected Hessian. Assuming the projected
Hessian matrix $${Z}_{k}^{T}H{Z}_{k}$$ is positive semidefinite, the minimum of the function

*q*(

*p*) in the subspace defined by

*Z*occurs when ∇

_{k}*q*(

*p*) = 0, which is the solution of the system of linear equations

$${Z}_{k}^{T}H{Z}_{k}p=-{Z}_{k}^{T}c.$$ | (21) |

The algorithm then takes a step of the form

$${x}_{k+1}={x}_{k}+\alpha {d}_{k},$$

where

$${d}_{k}={Z}_{k}p.$$

Due to the quadratic nature of the objective function, only two choices of
step length *α* exist at each iteration. A step of unity along
*d _{k}* is the exact step to the
minimum of the function restricted to the null space of $${\overline{A}}_{k}$$. If the algorithm can take such a step without violating the
constraints, then this step is the solution to the quadratic program (Equation 18). Otherwise, the step along

*d*to the nearest constraint is less than unity, and the algorithm includes a new constraint in the active set at the next iteration. The distance to the constraint boundaries in any direction

_{k}*d*is given by

_{k}$$\alpha =\underset{i\in \left\{1,\mathrm{...},m\right\}}{\mathrm{min}}\left\{\frac{-\left({A}_{i}{x}_{k}-{b}_{i}\right)}{{A}_{i}{d}_{k}}\right\},$$

which is defined for constraints not in the active set, and where the
direction *d _{k}* is towards the constraint
boundary, that is, $${A}_{i}{d}_{k}>0,\text{}i=1,\mathrm{...},m$$.

When the active set includes *n* independent constraints,
without location of the minimum, the algorithm calculates the Lagrange
multipliers *λ _{k}*, which satisfy the
nonsingular set of linear equations

$${\overline{A}}_{k}^{T}{\lambda}_{k}=c+H{x}_{k}.$$ | (22) |

If all elements of *λ _{k}* are
nonnegative,

*x*is the optimal solution of the quadratic programming problem Equation 1. However, if any component of

_{k}*λ*is negative, and the component does not correspond to an equality constraint, then the minimization is not complete. The algorithm deletes the element corresponding to the most negative multiplier and starts a new iteration.

_{k}Sometimes, when the solver finishes with all nonnegative Lagrange multipliers, the first-order optimality measure is above the tolerance, or the constraint tolerance is not met. In these cases, the solver attempts to reach a better solution by following the restart procedure described in [1]. In this procedure, the solver discards the current set of active constraints, relaxes the constraints a bit, and constructs a new set of active constraints while attempting to solve the problem in a manner that avoids cycling (repeatedly returning to the same state). If necessary, the solver can perform the restart procedure several times.

**Note**

Do not try to stop the algorithm early by setting large values of the
`ConstraintTolerance`

and
`OptimalityTolerance`

options. Generally, the solver
iterates without checking these values until it reaches a potential stopping
point, and only then checks to see whether the tolerances are
satisfied.

Occasionally, the `active-set`

algorithm can have difficulty
detecting when a problem is unbounded. This issue can occur if the direction of
unboundedness *v* is a direction where the quadratic term
*v'Hv* = 0. For numerical stability and to enable a
Cholesky factorization, the `active-set`

algorithm adds a
small, strictly convex term to the quadratic objective. This small term causes
the objective function to be bounded away from –`Inf`

. In this
case, the `active-set`

algorithm reaches an iteration limit
instead of reporting that the solution is unbounded. In other words, the
algorithm halts with exit flag `0`

instead of
–`3`

.

## References

[1] Gill, P. E., W.
Murray, M. A. Saunders, and M. H. Wright. *A practical
anti-cycling procedure for linearly constrained
optimization.* Math. Programming 45 (1), August 1989, pp.
437–474.

#### Warm Start

When you run the `quadprog`

or `lsqlin`

`'active-set'`

algorithm with a warm start object as the start
point, the solver attempts to skip many of the Phase 1 and Phase 2 steps. The
warm start object contains the active set of constraints, and this set can be
correct or close to correct for the new problem. Therefore, the solver can avoid
iterations to add constraints to the active set. Also, the initial point might
be close to the solution for the new problem. For more information, see
`optimwarmstart`

.