# Minimization with Dense Structured Hessian, Linear Equalities

### Hessian Multiply Function for Lower Memory

The `fmincon`

`interior-point`

and `trust-region-reflective`

algorithms, and the `fminunc`

`trust-region`

algorithm, can solve problems where the Hessian is dense but structured. For these problems, `fmincon`

and `fminunc`

do not compute `H*Y`

with the Hessian `H`

directly, because forming `H`

would be memory-intensive. Instead, you must provide `fmincon`

or `fminunc`

with a function that, given a matrix `Y`

and information about `H`

, computes `W = H*Y`

.

In this example, the objective function is nonlinear and linear equalities exist so `fmincon`

is used. The description applies to the `trust-region reflective`

algorithm; the `fminunc`

`trust-region`

algorithm is similar. For the `interior-point`

algorithm, see the `HessianMultiplyFcn`

option in Hessian Multiply Function. The objective function has the structure

$$f\left(x\right)=\underset{}{\overset{\u02c6}{f}}\left(x\right)-\frac{1}{2}{x}^{T}V{V}^{T}x,$$

where $$V$$ is a 1000-by-2 matrix. The Hessian of $$f$$ is dense, but the Hessian of $$\underset{}{\overset{\u02c6}{f}}$$ is sparse. If the Hessian of $$\underset{}{\overset{\u02c6}{f}}$$ is $$\underset{}{\overset{\u02c6}{H}}$$, then $$H$$, the Hessian of $$f$$, is

$$H=\underset{}{\overset{\u02c6}{H}}-V{V}^{T}.$$

To avoid excessive memory usage that could happen by working with $$H$$ directly, the example provides a Hessian multiply function, `hmfleq1`

. This function, when passed a matrix `Y`

, uses sparse matrices `Hinfo`

, which corresponds to $$\underset{}{\overset{\u02c6}{H}}$$, and `V`

to compute the Hessian matrix product

`W = H*Y = (Hinfo - V*V')*Y.`

In this example, the Hessian multiply function needs $$\underset{}{\overset{\u02c6}{H}}$$, and `V`

to compute the Hessian matrix product. `V`

is a constant, so you can capture `V`

in a function handle to an anonymous function.

However, $$\underset{}{\overset{\u02c6}{H}}$$ is not a constant and must be computed at the current `x`

. You can do this by computing $$\underset{}{\overset{\u02c6}{H}}$$ in the objective function and returning $$\underset{}{\overset{\u02c6}{H}}$$ as `Hinfo`

in the third output argument. By using `optimoptions`

to set the `HessianMultiplyFcn`

option to a handle to the `hmfleq1`

Hessian multiply function listed here, `fmincon`

knows to get the `Hinfo`

value from the objective function and pass it to the Hessian multiply function `hmfleq1`

.

#### Step 1: Write a file brownvv.m that computes the objective function, the gradient, and the sparse part of the Hessian.

The example passes `brownvv`

to `fmincon`

as the objective function. The `brownvv.m`

file is long and is not listed here. You can view the code with the command

```
type brownvv
```

Because `brownvv`

computes the gradient as well as the objective function, the example (Step 3) uses `optimoptions`

to set the `SpecifyObjectiveGradient`

option to `true`

.

#### Step 2: Write a function to compute Hessian-matrix products for H given a matrix Y.

Now, define a function `hmfleq1`

that uses `Hinfo`

, which is computed in `brownvv`

, and `V`

, which you can capture in a function handle to an anonymous function, to compute the Hessian matrix product `W`

where `W = H*Y = (Hinfo - V*V')*Y`

. This function must have the form

`W = hmfleq1(Hinfo,Y)`

The first argument must be the same as the third argument returned by the objective function `brownvv`

. The second argument to the Hessian multiply function is the matrix `Y`

(of `W = H*Y`

).

Because `fmincon`

expects the second argument `Y`

to be used to form the Hessian matrix product, `Y`

is always a matrix with `n`

rows, where `n`

is the number of dimensions in the problem. The number of columns in `Y`

can vary. Finally, you can use a function handle to an anonymous function to capture `V`

, so `V`

can be the third argument to `hmfleq1`

. This technique is described in more detail in Passing Extra Parameters. Here is a listing of the file `hmfleq1.m`

.

`type hmfleq1.m`

function W = hmfleq1(Hinfo,Y,V) %HMFLEQ1 Hessian-matrix product function for BROWNVV objective. % Documentation example. % W = hmfbx4(Hinfo,Y,V) computes W = (Hinfo-V*V')*Y % where Hinfo is a sparse matrix computed by BROWNVV % and V is a 2 column matrix. % Copyright 1984-2008 The MathWorks, Inc. W = Hinfo*Y - V*(V'*Y);

#### Step 3: Call a nonlinear minimization routine with a starting point and linear equality constraints.

Load the problem parameter, `V`

, and the sparse equality constraint matrices, `Aeq`

and `beq`

, from `fleq1.mat`

, which is available when you run this example. Use `optimoptions`

to set the `SpecifyObjectiveGradient`

option to `true`

and to set the `HessianMultiplyFcn`

option to a function handle that points to `hmfleq1`

. Call `fmincon`

with objective function `brownvv`

and with `V`

as an additional parameter. View the code that runs this procedure:

`type runfleq1.m`

function [fval,exitflag,output,x] = runfleq1 % RUNFLEQ1 demonstrates 'HessMult' option for FMINCON with linear % equalities. problem = load('fleq1'); % Get V, Aeq, beq V = problem.V; Aeq = problem.Aeq; beq = problem.beq; n = 1000; % problem dimension xstart = -ones(n,1); xstart(2:2:n,1) = ones(length(2:2:n),1); % starting point options = optimoptions(@fmincon,... 'Algorithm','trust-region-reflective',... 'SpecifyObjectiveGradient',true, ... 'HessianMultiplyFcn',@(Hinfo,Y)hmfleq1(Hinfo,Y,V),... 'Display','iter',... 'OptimalityTolerance',1e-9,... 'FunctionTolerance',1e-9); [x,fval,exitflag,output] = fmincon(@(x)brownvv(x,V),xstart,[],[],Aeq,beq,[],[], ... [],options);

To run this code, enter

[fval,exitflag,output,x] = runfleq1;

Norm of First-order Iteration f(x) step optimality CG-iterations 0 2297.63 1.41e+03 1 1084.59 6.3903 578 1 2 1084.59 100 578 3 3 1084.59 25 578 0 4 1084.59 6.25 578 0 5 1047.61 1.5625 240 0 6 761.592 3.125 62.4 2 7 761.592 6.25 62.4 4 8 746.478 1.5625 163 0 9 546.578 3.125 84.1 2 10 274.311 6.25 26.9 2 11 55.6193 11.6597 40 2 12 55.6193 25 40 3 13 22.2964 6.25 26.3 0 14 -49.516 6.25 78 1 15 -93.2772 1.5625 68 1 16 -207.204 3.125 86.5 1 17 -434.162 6.25 70.7 1 18 -681.359 6.25 43.7 2 19 -681.359 6.25 43.7 4 20 -698.041 1.5625 191 0 21 -723.959 3.125 256 7 22 -751.33 0.78125 154 3 23 -793.974 1.5625 24.4 3 24 -820.831 2.51937 6.11 3 25 -823.069 0.562132 2.87 3 26 -823.237 0.196753 0.486 3 27 -823.245 0.0621202 0.386 3 28 -823.246 0.0199951 0.11 6 29 -823.246 0.00731333 0.0404 7 30 -823.246 0.00505883 0.0185 8 31 -823.246 0.00126471 0.00268 9 32 -823.246 0.00149326 0.00521 9 33 -823.246 0.000373314 0.00091 9 Local minimum possible. fmincon stopped because the final change in function value relative to its initial value is less than the value of the function tolerance.

Convergence is rapid for a problem of this size with the PCG iteration cost increasing modestly as the optimization progresses. Feasibility of the equality constraints is maintained at the solution.

problem = load('fleq1'); % Get V, Aeq, beq V = problem.V; Aeq = problem.Aeq; beq = problem.beq; norm(Aeq*x-beq,inf)

ans = 1.9540e-14

### Preconditioning

In this example, `fmincon`

cannot use `H`

to compute a preconditioner because `H`

only exists implicitly. Instead of `H`

, `fmincon`

uses `Hinfo`

, the third argument returned by `brownvv`

, to compute a preconditioner. `Hinfo`

is a good choice because it is the same size as `H`

and approximates `H`

to some degree. If `Hinfo`

were not the same size as `H`

, `fmincon`

would compute a preconditioner based on some diagonal scaling matrices determined from the algorithm. Typically, this would not perform as well.