lbfgsupdate
Syntax
[netUpdated,solverStateUpdated] = lbfgsupdate(net,lossFcn,solverState)
[parametersUpdated,solverStateUpdated] = lbfgsupdate(parameters,lossFcn,solverState)
___ = lbfgsupdate(___,Name=Value)
Description
Update the network learnable parameters in a custom training loop using the limited-memory BFGS (L-BFGS) algorithm.
The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Use the L-BFGS algorithm for small networks and data sets that you can process in a single batch.
Note
This function applies the L-BFGS optimization algorithm to update network parameters in custom training loops. To train a neural network with the trainnet function using the L-BFGS solver, use the trainingOptions function and set the solver to "lbfgs".
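For example, a minimal sketch of this built-in workflow. The option values and the "crossentropy" loss name are illustrative assumptions, and the sketch assumes predictors XTrain, targets TTrain, and a layer array layers such as those created in the example below.
% Sketch: train with the built-in L-BFGS solver instead of a custom loop.
options = trainingOptions("lbfgs",MaxIterations=200);
net = trainnet(XTrain,TTrain,layers,"crossentropy",options);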
[netUpdated,solverStateUpdated] = lbfgsupdate(net,lossFcn,solverState) updates the learnable parameters of the network net using the L-BFGS algorithm with the specified loss function and solver state. Use this syntax in a training loop to iteratively update a network defined as a dlnetwork object.
[parametersUpdated,solverStateUpdated] = lbfgsupdate(parameters,lossFcn,solverState) updates the learnable parameters in parameters using the L-BFGS algorithm with the specified loss function and solver state. Use this syntax in a training loop to iteratively update the learnable parameters of a network defined as a function.
___ = lbfgsupdate(___,Name=Value) specifies additional options using one or more name-value arguments.
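For a model defined as a function (the second syntax), the learnable parameters are typically stored in a structure of dlarray objects. A minimal sketch of this workflow follows; the model, parameter names, and data sizes are hypothetical, and the data is synthetic with format "CB" (channel, batch).
% Sketch: update the parameters of a model defined as a function.
X = dlarray(rand(20,100),"CB");
T = dlarray(onehotencode(categorical(randi(4,1,100)),1),"CB");

parameters.fc1.Weights = dlarray(0.01*randn(16,20));
parameters.fc1.Bias    = dlarray(zeros(16,1));
parameters.fc2.Weights = dlarray(0.01*randn(4,16));
parameters.fc2.Bias    = dlarray(zeros(4,1));

lossFcn = @(p) dlfeval(@modelLoss,p,X,T);
solverState = lbfgsState;

for iteration = 1:30
    [parameters,solverState] = lbfgsupdate(parameters,lossFcn,solverState);
end

function [loss,gradients] = modelLoss(parameters,X,T)
% Forward pass through the two hypothetical fully connected layers.
Y = fullyconnect(X,parameters.fc1.Weights,parameters.fc1.Bias);
Y = relu(Y);
Y = fullyconnect(Y,parameters.fc2.Weights,parameters.fc2.Bias);
Y = softmax(Y);
loss = crossentropy(Y,T);
gradients = dlgradient(loss,parameters);
end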
Examples
Update Learnable Parameters in Neural Network
Read the transmission casing data from the CSV file "transmissionCasingData.csv".
filename = "transmissionCasingData.csv";
tbl = readtable(filename,TextType="String");
Convert the labels for prediction to categorical using the convertvars
function.
labelName = "GearToothCondition";
tbl = convertvars(tbl,labelName,"categorical");
To train a network using categorical features, first convert the categorical predictors to categorical using the convertvars function, specifying a string array containing the names of all the categorical input variables.
categoricalPredictorNames = ["SensorCondition" "ShaftCondition"];
tbl = convertvars(tbl,categoricalPredictorNames,"categorical");
Loop over the categorical input variables. For each variable, convert the categorical values to one-hot encoded vectors using the onehotencode
function.
for i = 1:numel(categoricalPredictorNames)
    name = categoricalPredictorNames(i);
    tbl.(name) = onehotencode(tbl.(name),2);
end
View the first few rows of the table.
head(tbl)
    SigMean    SigMedian    SigRMS    SigVar    SigPeak    SigPeak2Peak    SigSkewness    SigKurtosis    SigCrestFactor    SigMAD    SigRangeCumSum    SigCorrDimension    SigApproxEntropy    SigLyapExponent    PeakFreq    HighFreqPower    EnvPower    PeakSpecKurtosis    SensorCondition    ShaftCondition    GearToothCondition

    -0.94876    -0.9722     1.3726    0.98387    0.81571    3.6314    -0.041525    2.2666    2.0514    0.8081     28562    1.1429    0.031581    79.931    0    6.75e-06    3.23e-07    162.13    0 1    1 0    No Tooth Fault
    -0.97537    -0.98958    1.3937    0.99105    0.81571    3.6314    -0.023777    2.2598    2.0203    0.81017    29418    1.1362    0.037835    70.325    0    5.08e-08    9.16e-08    226.12    0 1    1 0    No Tooth Fault
      1.0502     1.0267     1.4449    0.98491     2.8157    3.6314     -0.04162    2.2658    1.9487    0.80853    31710    1.1479    0.031565    125.19    0    6.74e-06    2.85e-07    162.13    0 1    0 1    No Tooth Fault
      1.0227     1.0045     1.4288    0.99553     2.8157    3.6314    -0.016356    2.2483    1.9707    0.81324    30984    1.1472    0.032088     112.5    0    4.99e-06     2.4e-07    162.13    0 1    0 1    No Tooth Fault
      1.0123     1.0024     1.4202    0.99233     2.8157    3.6314    -0.014701    2.2542    1.9826    0.81156    30661    1.1469     0.03287    108.86    0    3.62e-06    2.28e-07    230.39    0 1    0 1    No Tooth Fault
      1.0275     1.0102     1.4338     1.0001     2.8157    3.6314     -0.02659    2.2439    1.9638    0.81589    31102    1.0985    0.033427    64.576    0    2.55e-06    1.65e-07    230.39    0 1    0 1    No Tooth Fault
      1.0464     1.0275     1.4477     1.0011     2.8157    3.6314    -0.042849    2.2455    1.9449    0.81595    31665    1.1417    0.034159    98.838    0    1.73e-06    1.55e-07    230.39    0 1    0 1    No Tooth Fault
      1.0459     1.0257     1.4402    0.98047     2.8157    3.6314    -0.035405    2.2757     1.955    0.80583    31554    1.1345      0.0353    44.223    0    1.11e-06    1.39e-07    230.39    0 1    0 1    No Tooth Fault
Extract the training data.
predictorNames = ["SigMean" "SigMedian" "SigRMS" "SigVar" "SigPeak" "SigPeak2Peak" ...
    "SigSkewness" "SigKurtosis" "SigCrestFactor" "SigMAD" "SigRangeCumSum" ...
    "SigCorrDimension" "SigApproxEntropy" "SigLyapExponent" "PeakFreq" ...
    "HighFreqPower" "EnvPower" "PeakSpecKurtosis" "SensorCondition" "ShaftCondition"];
XTrain = table2array(tbl(:,predictorNames));
numInputFeatures = size(XTrain,2);
Extract the targets and convert them to one-hot encoded vectors.
TTrain = tbl.(labelName);
TTrain = onehotencode(TTrain,2);
numClasses = size(TTrain,2);
Convert the predictors and targets to dlarray
objects with format "BC"
(batch, channel).
XTrain = dlarray(XTrain,"BC");
TTrain = dlarray(TTrain,"BC");
Define the network architecture.
numHiddenUnits = 32;

layers = [
    featureInputLayer(numInputFeatures)
    fullyConnectedLayer(16)
    layerNormalizationLayer
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];

net = dlnetwork(layers);
Define the modelLoss
function, listed in the Model Loss Function section of the example. This function takes as input a neural network, input data, and targets. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.
The lbfgsupdate
function requires a loss function with the syntax [loss,gradients] = f(net)
. Create a variable that parameterizes the evaluated modelLoss
function to take a single input argument.
lossFcn = @(net) dlfeval(@modelLoss,net,XTrain,TTrain);
Initialize an L-BFGS solver state object with a maximum history size of 3 and an initial inverse Hessian approximation factor of 1.1.
solverState = lbfgsState( ...
    HistorySize=3, ...
    InitialInverseHessianFactor=1.1);
Train the network for a maximum of 200 iterations. Stop training early when the norm of the gradients or the norm of the steps is smaller than 0.00001. Print the training loss every 10 iterations.
maxIterations = 200;
gradientTolerance = 1e-5;
stepTolerance = 1e-5;

iteration = 0;

while iteration < maxIterations
    iteration = iteration + 1;
    [net, solverState] = lbfgsupdate(net,lossFcn,solverState);

    if iteration==1 || mod(iteration,10)==0
        fprintf("Iteration %d: Loss: %d\n",iteration,solverState.Loss);
    end

    if solverState.GradientsNorm < gradientTolerance || ...
            solverState.StepNorm < stepTolerance || ...
            solverState.LineSearchStatus == "failed"
        break
    end
end
Iteration 1: Loss: 9.343236e-01
Iteration 10: Loss: 4.721475e-01
Iteration 20: Loss: 4.678575e-01
Iteration 30: Loss: 4.666964e-01
Iteration 40: Loss: 4.665921e-01
Iteration 50: Loss: 4.663871e-01
Iteration 60: Loss: 4.662519e-01
Iteration 70: Loss: 4.660451e-01
Iteration 80: Loss: 4.645303e-01
Iteration 90: Loss: 4.591753e-01
Iteration 100: Loss: 4.562556e-01
Iteration 110: Loss: 4.531167e-01
Iteration 120: Loss: 4.489444e-01
Iteration 130: Loss: 4.392228e-01
Iteration 140: Loss: 4.347853e-01
Iteration 150: Loss: 4.341757e-01
Iteration 160: Loss: 4.325102e-01
Iteration 170: Loss: 4.321948e-01
Iteration 180: Loss: 4.318990e-01
Iteration 190: Loss: 4.313784e-01
Iteration 200: Loss: 4.311314e-01
Model Loss Function
The modelLoss
function takes as input a neural network net
, input data X
, and targets T
. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.
function [loss, gradients] = modelLoss(net, X, T)

Y = forward(net,X);
loss = crossentropy(Y,T);
gradients = dlgradient(loss,net.Learnables);

end
Input Arguments
net
— Neural network
dlnetwork
object
Neural network, specified as a dlnetwork
object.
The function updates the Learnables
property of the
dlnetwork
object. net.Learnables
is a table with
three variables:
Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Parameter value, specified as a cell array containing a dlarray object.
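For example, a short sketch of inspecting this table for a small network (the auto-generated layer names in the output can differ):
% Sketch: view the Learnables table of a small dlnetwork.
net = dlnetwork([featureInputLayer(4) fullyConnectedLayer(2)]);
net.Learnables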
parameters
— Learnable parameters
dlarray
object | numeric array | cell array | structure | table
Learnable parameters, specified as a dlarray object, a numeric array, a cell array, a structure, or a table.
If you specify parameters as a table, it must contain these variables:
Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Parameter value, specified as a cell array containing a dlarray object.
You can specify parameters as a container of learnable parameters for your network using a cell array, structure, or table, or using nested cell arrays or structures. The learnable parameters inside the cell array, structure, or table must be dlarray objects or numeric values with the data type double or single.
If parameters
is a numeric array, then lossFcn
must
not use the dlgradient
function.
lossFcn
— Loss function
function handle | AcceleratedFunction
object
Loss function, specified as a function handle or an AcceleratedFunction object with the syntax [loss,gradients] = f(net), where loss and gradients correspond to the loss and the gradients of the loss with respect to the learnable parameters, respectively.
To parametrize a model loss function that has a call to the dlgradient function, specify the loss function as @(net) dlfeval(@modelLoss,net,arg1,...,argN), where modelLoss is a function with the syntax [loss,gradients] = modelLoss(net,arg1,...,argN) that returns the loss and the gradients of the loss with respect to the learnable parameters in net given arguments arg1,...,argN.
If parameters is a numeric array, then the loss function must not use the dlgradient or dlfeval functions.
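As an illustration of this case, a minimal sketch, assuming a synthetic least-squares problem with an analytically computed gradient (all data and names here are hypothetical):
% Sketch: when the parameters are a numeric array, the loss function returns
% the gradients directly, without dlgradient or dlfeval.
X = randn(100,3);
t = X*[1;-2;0.5] + 0.1*randn(100,1);
w = zeros(3,1);

% deal returns both the loss and its analytic gradient with respect to w.
lossFcn = @(w) deal(0.5*mean((X*w - t).^2), X'*(X*w - t)/size(X,1));

solverState = lbfgsState;
for iteration = 1:50
    [w,solverState] = lbfgsupdate(w,lossFcn,solverState);
end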
If the loss function has more than two outputs, also specify the NumLossFunctionOutputs
argument.
Data Types: function_handle
solverState
— Solver state
lbfgsState
object | []
Solver state, specified as an lbfgsState object or [].
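For example, a sketch of the two ways to supply the solver state, assuming net and lossFcn are already defined:
% Sketch: supply an lbfgsState object, or pass [] (for example, on the first
% iteration when no previous state exists).
solverState = lbfgsState;
[net,solverState] = lbfgsupdate(net,lossFcn,solverState);

[net,solverState] = lbfgsupdate(net,lossFcn,[]);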
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: lbfgsupdate(net,lossFcn,solverState,LineSearchMethod="strong-wolfe")
updates the learnable parameters in net
and searches for a learning rate
that satisfies the strong Wolfe conditions.
LineSearchMethod
— Method to find suitable learning rate
"weak-wolfe"
(default) | "strong-wolfe"
| "backtracking"
Method to find suitable learning rate, specified as one of these values:
"weak-wolfe"
— Search for a learning rate that satisfies the weak Wolfe conditions. This method maintains a positive definite approximation of the inverse Hessian matrix."strong-wolfe"
— Search for a learning rate that satisfies the strong Wolfe conditions. This method maintains a positive definite approximation of the inverse Hessian matrix."backtracking"
— Search for a learning rate that satisfies sufficient decrease conditions. This method does not maintain a positive definite approximation of the inverse Hessian matrix.
MaxNumLineSearchIterations
— Maximum number of line search iterations
20
(default) | positive integer
Maximum number of line search iterations to determine the learning rate, specified as a positive integer.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
NumLossFunctionOutputs
— Number of loss function outputs
2
(default) | integer greater than or equal to two
Number of loss function outputs, specified as an integer greater than or equal to
two. Set this option when lossFcn
has
more than two output arguments.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
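For example, a sketch assuming a hypothetical model loss function modelLossWithPredictions that returns a third output (the network predictions), which requires NumLossFunctionOutputs=3:
% Sketch: a loss function with three outputs.
lossFcn = @(net) dlfeval(@modelLossWithPredictions,net,XTrain,TTrain);
[net,solverState] = lbfgsupdate(net,lossFcn,solverState,NumLossFunctionOutputs=3);

function [loss,gradients,Y] = modelLossWithPredictions(net,X,T)
Y = forward(net,X);
loss = crossentropy(Y,T);
gradients = dlgradient(loss,net.Learnables);
end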
Output Arguments
netUpdated
— Updated network
dlnetwork
object
Updated network, returned as a dlnetwork
object.
The function updates the Learnables
property of the
dlnetwork
object.
parametersUpdated
— Updated learnable parameters
dlarray
| numeric array | cell array | structure | table
Updated learnable parameters, returned as an object with the same type as
parameters
.
solverStateUpdated
— Updated solver state
lbfgsState
Updated solver state, returned as an lbfgsState object.
Algorithms
Limited-Memory BFGS
The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Use the L-BFGS algorithm for small networks and data sets that you can process in a single batch.
The algorithm updates the learnable parameters $W$ at iteration $k+1$ using the update step given by

$$W_{k+1} = W_k - \eta_k B_k^{-1} \nabla J(W_k),$$

where $W_k$ denotes the weights at iteration $k$, $\eta_k$ is the learning rate at iteration $k$, $B_k$ is an approximation of the Hessian matrix at iteration $k$, and $\nabla J(W_k)$ denotes the gradients of the loss with respect to the learnable parameters at iteration $k$.
The L-BFGS algorithm computes the matrix-vector product $B_k^{-1} \nabla J(W_k)$ directly. The algorithm does not require computing the inverse of $B_k$.
To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix $B$. Instead, the algorithm uses the approximation $B_{k-m}^{-1} \approx \lambda_k I$, where $m$ is the history size, the inverse Hessian factor $\lambda_k$ is a scalar, and $I$ is the identity matrix. The algorithm stores the scalar inverse Hessian factor only and updates it at each step.
To compute the matrix-vector product $B_k^{-1} \nabla J(W_k)$ directly, the L-BFGS algorithm uses this recursive algorithm:
1. Set $r = B_{k-m}^{-1} \nabla J(W_k)$, where $m$ is the history size.
2. For $i = m, \ldots, 1$:
   a. Let $\beta = \dfrac{y_{k-i}^{\top} r}{y_{k-i}^{\top} s_{k-i}}$, where $s_{k-i}$ and $y_{k-i}$ are the step and gradient differences for iteration $k-i$, respectively.
   b. Set $r = r + s_{k-i}\left(\alpha_{k-i} - \beta\right)$, where $\alpha_{k-i}$ is derived from $s_{k-i}$, $y_{k-i}$, and the gradients of the loss. For more information, see [1].
3. Return $B_k^{-1} \nabla J(W_k) = r$.
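For comparison, a sketch of the standard L-BFGS two-loop recursion (the textbook formulation of this matrix-vector product, not necessarily the exact internal implementation of lbfgsupdate). The function name, the cell arrays S and Y of stored step and gradient differences (ordered oldest to newest), and the scalar lambda are illustrative assumptions.
function r = lbfgsDirection(grad,S,Y,lambda)
% Standard two-loop recursion: approximates inv(B)*grad from the m most recent
% step differences S{i} and gradient differences Y{i}, starting from the
% scalar initial inverse Hessian approximation lambda*I.
m = numel(S);
alpha = zeros(m,1);
rho = zeros(m,1);
q = grad;
for i = m:-1:1                 % first loop: most recent pair first
    rho(i) = 1/(Y{i}'*S{i});
    alpha(i) = rho(i)*(S{i}'*q);
    q = q - alpha(i)*Y{i};
end
r = lambda*q;                  % apply the scalar initial approximation
for i = 1:m                    % second loop: oldest pair first
    beta = rho(i)*(Y{i}'*r);
    r = r + S{i}*(alpha(i) - beta);
end
end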
References
[1] Liu, Dong C., and Jorge Nocedal. "On the Limited Memory BFGS Method for Large Scale Optimization." Mathematical Programming 45, no. 1 (August 1989): 503-528. https://doi.org/10.1007/BF01589116.
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
The lbfgsupdate
function
supports GPU array input with these usage notes and limitations:
When lossFcn outputs data of type gpuArray or dlarray with underlying data of type gpuArray, this function runs on the GPU.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2023a
See Also
adamupdate | rmspropupdate | sgdmupdate | dlupdate | lbfgsState | dlnetwork | dlarray | dlgradient | dljacobian | dldivergence | dllaplacian