## Define Custom Deep Learning Layers

Tip

This topic explains how to define custom deep learning layers for your problems. For a list of built-in layers in Deep Learning Toolbox™, see List of Deep Learning Layers.

This topic explains the architecture of deep learning layers and how to define custom layers to use for your problems.

| Type | Description |
| --- | --- |
| Layer | Define a custom deep learning layer and specify optional learnable parameters. For an example showing how to define a custom layer with learnable parameters, see Define Custom Deep Learning Layer with Learnable Parameters. For an example showing how to define a custom layer with multiple inputs, see Define Custom Deep Learning Layer with Multiple Inputs. |
| Classification Output Layer | Define a custom classification output layer and specify a loss function. For an example showing how to define a custom classification output layer and specify a loss function, see Define Custom Classification Output Layer. |
| Regression Output Layer | Define a custom regression output layer and specify a loss function. For an example showing how to define a custom regression output layer and specify a loss function, see Define Custom Regression Output Layer. |

### Layer Templates

You can use the following templates to define new layers.
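As a rough sketch of the shape an intermediate layer template takes, the following class definition marks where each part described in this topic goes. The class name myLayer and the pass-through predict body are placeholders chosen for illustration, not part of the toolbox; replace them with your own layer operation.

```matlab
classdef myLayer < nnet.layer.Layer
    % myLayer   Hypothetical skeleton of a custom intermediate layer.

    properties
        % (Optional) Layer properties go here.
    end

    properties (Learnable)
        % (Optional) Learnable parameters go here.
    end

    methods
        function layer = myLayer(name)
            % Constructor: set the layer name and a one-line description.
            layer.Name = name;
            layer.Description = "Skeleton layer (passes data through)";
        end

        function Z = predict(layer,X)
            % Forward pass at prediction time.
            % Placeholder operation: return the input unchanged.
            Z = X;
        end

        % (Optional) Add a forward method for training-time behavior that
        % differs from predict, and a backward method for custom derivatives.
    end
end
```

Save the class in a file with the same name as the class (here, myLayer.m) in the current folder or on the MATLAB path.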

### Intermediate Layer Architecture

During training, the software iteratively performs forward and backward passes through the network.

When making a forward pass through the network, each layer takes the outputs of the previous layers, applies a function, and then outputs (forward propagates) the results to the next layers.

Layers can have multiple inputs or outputs. For example, a layer can take X1, …, Xn from multiple previous layers and forward propagate the outputs Z1, …, Zm to the next layers.

At the end of a forward pass of the network, the output layer calculates the loss L between the predictions Y and the true targets T.

During the backward pass of a network, each layer takes the derivatives of the loss with respect to the outputs of the layer, computes the derivatives of the loss L with respect to the inputs, and then backward propagates the results. If the layer has learnable parameters, then the layer also computes the derivatives of the layer weights (learnable parameters). The layer uses the derivatives of the weights to update the learnable parameters.

The following figure describes the flow of data through a deep neural network and highlights the data flow through a layer with a single input X, a single output Z, and a learnable parameter W.

#### Intermediate Layer Properties

Declare the layer properties in the properties section of the class definition.

By default, custom intermediate layers have these properties.

| Property | Description |
| --- | --- |
| Name | Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty, unique layer name. If you train a series network with the layer and Name is set to '', then the software automatically assigns a name to the layer at training time. |
| Description | One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays the layer class name. |
| Type | Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name. |
| NumInputs | Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumInputs to the number of names in InputNames. The default value is 1. |
| InputNames | Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumInputs is greater than 1, then the software automatically sets InputNames to {'in1',...,'inN'}, where N is equal to NumInputs. The default value is {'in'}. |
| NumOutputs | Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumOutputs to the number of names in OutputNames. The default value is 1. |
| OutputNames | Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumOutputs is greater than 1, then the software automatically sets OutputNames to {'out1',...,'outM'}, where M is equal to NumOutputs. The default value is {'out'}. |

If the layer has no other properties, then you can omit the properties section.

Tip

If you are creating a layer with multiple inputs, then you must set either the NumInputs or InputNames properties in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the NumOutputs or OutputNames properties in the layer constructor. For an example, see Define Custom Deep Learning Layer with Multiple Inputs.

#### Learnable Parameters

Declare the layer learnable parameters in the properties (Learnable) section of the class definition. You can specify numeric arrays or dlnetwork objects as learnable parameters. If the layer has no learnable parameters, then you can omit the properties (Learnable) section.

Optionally, you can specify the learning rate factor and the L2 factor of the learnable parameters. By default, each learnable parameter has its learning rate factor and L2 factor set to 1.

For both built-in and custom layers, you can set and get the learn rate factors and L2 regularization factors using the following functions.

| Function | Description |
| --- | --- |
| setLearnRateFactor | Set the learn rate factor of a learnable parameter. |
| setL2Factor | Set the L2 regularization factor of a learnable parameter. |
| getLearnRateFactor | Get the learn rate factor of a learnable parameter. |
| getL2Factor | Get the L2 regularization factor of a learnable parameter. |

To specify the learning rate factor and the L2 factor of a learnable parameter, use the syntaxes layer = setLearnRateFactor(layer,parameterName,value) and layer = setL2Factor(layer,parameterName,value), respectively, where parameterName is the name of the learnable parameter.

To get the value of the learning rate factor and the L2 factor of a learnable parameter, use the syntaxes getLearnRateFactor(layer,parameterName) and getL2Factor(layer,parameterName), respectively.

For example, this syntax sets the learn rate factor of the learnable parameter with the name 'Alpha' to 0.1.

layer = setLearnRateFactor(layer,'Alpha',0.1);
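Because these functions also apply to built-in layers, a quick way to try them is on a convolution layer. The following is a minimal sketch; the filter size, number of filters, and factor values are arbitrary choices for illustration.

```matlab
% Built-in 2-D convolution layer with 16 filters of size 3-by-3.
layer = convolution2dLayer(3,16);

% Set the learn rate factor and the L2 factor of the 'Weights' parameter.
layer = setLearnRateFactor(layer,'Weights',2);
layer = setL2Factor(layer,'Weights',0.5);

% Read the factors back.
learnRateFactor = getLearnRateFactor(layer,'Weights');
l2Factor = getL2Factor(layer,'Weights');
```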

#### Forward Functions

Some layers behave differently during training and during prediction. For example, a dropout layer performs dropout only during training and has no effect during prediction. A layer uses one of two functions to perform a forward pass: predict or forward. If the forward pass is at prediction time, then the layer uses the predict function. If the forward pass is at training time, then the layer uses the forward function. If you do not require two different functions for prediction time and training time, then you can omit the forward function. In this case, the layer uses predict at training time.

If you define both the forward function and a custom backward function, then you must assign a value to the output argument memory of forward, which you can use during backward propagation.

The syntax for predict is [Z1,…,Zm] = predict(layer,X1,…,Xn), where X1,…,Xn are the n layer inputs and Z1,…,Zm are the m layer outputs. The values n and m must correspond to the NumInputs and NumOutputs properties of the layer.

Tip

If the number of inputs to predict can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj.
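To illustrate a variable number of inputs, here is a sketch of a layer that forms a weighted sum of its inputs. The class name weightedAdditionLayer, the Weights property, and the random initialization are assumptions made for this illustration. The constructor sets NumInputs, and predict reads the inputs from varargin.

```matlab
classdef weightedAdditionLayer < nnet.layer.Layer
    % weightedAdditionLayer   Illustrative layer with a variable number of inputs.

    properties (Learnable)
        % One scalar weight per input, stored as a 1-by-NumInputs vector.
        Weights
    end

    methods
        function layer = weightedAdditionLayer(numInputs,name)
            % Layers with multiple inputs must set NumInputs (or InputNames)
            % in the constructor.
            layer.NumInputs = numInputs;
            layer.Name = name;
            layer.Description = "Weighted addition of " + numInputs + " inputs";

            % Initialize the weights randomly.
            layer.Weights = rand(1,numInputs);
        end

        function Z = predict(layer,varargin)
            % varargin{i} corresponds to the layer input Xi.
            X = varargin;
            W = layer.Weights;

            % Accumulate the weighted sum in an array of the same type as
            % the first input.
            Z = zeros(size(X{1}),'like',X{1});
            for i = 1:layer.NumInputs
                Z = Z + W(i)*X{i};
            end
        end
    end
end
```

For example, weightedAdditionLayer(2,'add') creates a layer with the two inputs 'in1' and 'in2'.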

Tip

If the custom layer has a dlnetwork object for a learnable parameter, then in the predict function of the custom layer, use the predict function for the dlnetwork. Using the dlnetwork object predict function ensures that the software uses the correct layer operations for prediction.

The syntax for forward is [Z1,…,Zm,memory] = forward(layer,X1,…,Xn), where X1,…,Xn are the n layer inputs, Z1,…,Zm are the m layer outputs, and memory is the memory of the layer.

Tip

If the number of inputs to forward can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj for j = 1,…,NumOutputs and varargout{NumOutputs + 1} corresponds to memory.

Tip

If the custom layer has a dlnetwork object for a learnable parameter, then in the forward function of the custom layer, use the forward function of the dlnetwork object. Using the dlnetwork object forward function ensures that the software uses the correct layer operations for training.
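The difference between predict and forward can be sketched with a layer that perturbs its input only at training time. The class name gaussianNoiseLayerSketch and the property Sigma are hypothetical names introduced for this illustration. The layer defines no custom backward function, so memory is simply set to [].

```matlab
classdef gaussianNoiseLayerSketch < nnet.layer.Layer
    % gaussianNoiseLayerSketch   Illustrative layer that adds noise during
    % training only.

    properties
        % Standard deviation of the noise added at training time.
        Sigma
    end

    methods
        function layer = gaussianNoiseLayerSketch(sigma,name)
            layer.Name = name;
            layer.Description = "Gaussian noise with standard deviation " + sigma;
            layer.Sigma = sigma;
        end

        function Z = predict(layer,X)
            % Prediction time: pass the data through unchanged.
            Z = X;
        end

        function [Z,memory] = forward(layer,X)
            % Training time: add zero-mean Gaussian noise to the input.
            noise = layer.Sigma * randn(size(X));
            Z = X + noise;

            % No custom backward function uses the memory, so leave it empty.
            memory = [];
        end
    end
end
```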

The dimensions of the inputs depend on the type of data and the output of the connected layers:

| Layer Input | Input Size | Observation Dimension |
| --- | --- | --- |
| 2-D images | h-by-w-by-c-by-N, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and N is the number of observations. | 4 |
| 3-D images | h-by-w-by-d-by-c-by-N, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, and N is the number of observations. | 5 |
| Vector sequences | c-by-N-by-S, where c is the number of features of the sequences, N is the number of observations, and S is the sequence length. | 2 |
| 2-D image sequences | h-by-w-by-c-by-N-by-S, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, N is the number of observations, and S is the sequence length. | 4 |
| 3-D image sequences | h-by-w-by-d-by-c-by-N-by-S, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, N is the number of observations, and S is the sequence length. | 5 |

For layers that output sequences, the layers can output sequences of any length or output data with no time dimension. Note that when training a network that outputs sequences using the trainNetwork function, the lengths of the input and output sequences must match.
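As a small illustration of these layouts, the following snippet builds arrays with the shapes listed for 2-D images and vector sequences and reads the number of observations from the observation dimension. The array sizes are arbitrary values chosen for the sketch.

```matlab
% Mini-batch of 16 RGB images of size 28-by-28, laid out as h-by-w-by-c-by-N.
XImages = rand(28,28,3,16,'single');
numImageObservations = size(XImages,4);        % observation dimension is 4

% Mini-batch of 16 vector sequences with 12 features and 100 time steps,
% laid out as c-by-N-by-S.
XSequences = rand(12,16,100,'single');
numSequenceObservations = size(XSequences,2);  % observation dimension is 2
```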

#### Backward Function

The layer backward function computes the derivatives of the loss with respect to the input data and then outputs (backward propagates) results to the previous layer. If the layer has learnable parameters (for example, layer weights), then backward also computes the derivatives of the learnable parameters. When using the trainNetwork function, the layer automatically updates the learnable parameters using these derivatives during the backward pass.

Defining the backward function is optional. If you do not specify a backward function, and the layer forward functions support dlarray objects, then the software automatically determines the backward function using automatic differentiation. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. Define a custom backward function when you want to:

• Use a specific algorithm to compute the derivatives.

• Use operations in the forward functions that do not support dlarray objects.

Custom layers with learnable dlnetwork objects do not support custom backward functions.

To define a custom backward function, create a function named backward.

The syntax for backward is [dLdX1,…,dLdXn,dLdW1,…,dLdWk] = backward(layer,X1,…,Xn,Z1,…,Zm,dLdZ1,…,dLdZm,memory), where:

• X1,…,Xn are the n layer inputs

• Z1,…,Zm are the m outputs of the layer forward functions

• dLdZ1,…,dLdZm are the gradients backward propagated from the next layer

• memory is the memory output of forward if forward is defined; otherwise, memory is [].

For the outputs, dLdX1,…,dLdXn are the derivatives of the loss with respect to the layer inputs and dLdW1,…,dLdWk are the derivatives of the loss with respect to the k learnable parameters. To reduce memory usage by preventing unused variables from being saved between the forward and backward passes, replace the corresponding input arguments with ~.

Tip

If the number of inputs to backward can vary, then use varargin instead of the input arguments after layer. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi for i=1,…,NumInputs, varargin{NumInputs+j} and varargin{NumInputs+NumOutputs+j} correspond to Zj and dLdZj, respectively, for j=1,…,NumOutputs, and varargin{end} corresponds to memory.

If the number of outputs can vary, then use varargout instead of the output arguments. In this case, varargout is a cell array of the outputs, where varargout{i} corresponds to dLdXi for i=1,…,NumInputs and varargout{NumInputs+t} corresponds to dLdWt for t=1,…,k, where k is the number of learnable parameters.

The values of X1,…,Xn and Z1,…,Zm are the same as in the forward functions. The dimensions of dLdZ1,…,dLdZm are the same as the dimensions of Z1,…,Zm, respectively.

The dimensions and data types of dLdX1,…,dLdXn are the same as the dimensions and data types of X1,…,Xn, respectively. The dimensions and data types of dLdW1,…,dLdWk are the same as the dimensions and data types of W1,…,Wk, respectively.

To calculate the derivatives of the loss, you can use the chain rule:

$\frac{\partial L}{\partial X^{(i)}}=\sum_{j}\frac{\partial L}{\partial Z_{j}}\frac{\partial Z_{j}}{\partial X^{(i)}}$

$\frac{\partial L}{\partial W_{i}}=\sum_{j}\frac{\partial L}{\partial Z_{j}}\frac{\partial Z_{j}}{\partial W_{i}}$

When using the trainNetwork function, the layer automatically updates the learnable parameters using the derivatives dLdW1,…,dLdWk during the backward pass.

For an example showing how to define a custom backward function, see Specify Custom Layer Backward Function.
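As a sketch of the backward syntax, consider a PReLU operation Z = max(X,0) + Alpha.*min(0,X) on 2-D image input, where Alpha is a learnable 1-by-1-by-c array with one coefficient per channel. The property name Alpha and its shape are assumptions for this illustration. Applying the chain rule gives the following method, which replaces the unused arguments Z and memory with ~.

```matlab
function [dLdX,dLdAlpha] = backward(layer,X,~,dLdZ,~)
    % Derivative of the loss with respect to the input:
    % Alpha .* dLdZ where X <= 0, and dLdZ where X > 0.
    dLdX = layer.Alpha .* dLdZ;
    dLdX(X>0) = dLdZ(X>0);

    % Derivative of the loss with respect to Alpha:
    % min(0,X) .* dLdZ, summed over the spatial dimensions ...
    dLdAlpha = min(0,X) .* dLdZ;
    dLdAlpha = sum(dLdAlpha,[1 2]);

    % ... and over all observations in the mini-batch.
    dLdAlpha = sum(dLdAlpha,4);
end
```

The outputs have the same sizes as X and Alpha, respectively, as the backward syntax requires. Place the function in the methods block of the layer class.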

#### GPU Compatibility

If the layer forward functions fully support dlarray objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type gpuArray (Parallel Computing Toolbox).

Many MATLAB® built-in functions support gpuArray (Parallel Computing Toolbox) and dlarray input arguments. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a supported GPU device. For information on supported devices, see GPU Support by Release (Parallel Computing Toolbox). For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).

#### Code Generation Compatibility

To create a custom layer that supports code generation:

• The layer must specify the pragma %#codegen in the layer definition.

• The inputs of predict must be consistent in dimension (each input must have the same number of dimensions) and consistent in batch size (each input must have the same batch size).

• The outputs of predict must be consistent in dimension and batch size with the layer inputs.

• Nonscalar properties must have type single, double, or character array.

• Scalar properties must have type numeric, logical, or string.

Code generation supports intermediate layers with 2-D image input only.

For an example showing how to create a custom layer that supports code generation, see Define Custom Deep Learning Layer for Code Generation.

#### Network Composition

To create a custom layer that itself defines a layer graph, you can specify a dlnetwork object as a learnable parameter. This method is known as network composition. You can use network composition to:

• Create a single custom layer that represents a block of learnable layers, for example, a residual block.

• Create a network with control flow, for example, a network with a section that can dynamically change depending on the input data.

• Create a network with loops, for example, a network with sections that feed the output back into itself.
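A minimal sketch of network composition follows, assuming 2-D image input and a release in which dlnetwork accepts the 'Initialize' option. The class name convBlockLayerSketch and the choice of nested layers are hypothetical. The nested network is stored as a learnable property, and predict and forward call the corresponding dlnetwork functions.

```matlab
classdef convBlockLayerSketch < nnet.layer.Layer
    % convBlockLayerSketch   Illustrative layer that wraps a small network.

    properties (Learnable)
        % Nested network, stored as a dlnetwork object.
        Network
    end

    methods
        function layer = convBlockLayerSketch(numFilters,name)
            layer.Name = name;
            layer.Description = "Convolution block with " + numFilters + " filters";

            % Define the nested layers and convert them to an
            % uninitialized dlnetwork.
            layers = [
                convolution2dLayer(3,numFilters,'Padding','same','Name','conv')
                reluLayer('Name','relu')];
            lgraph = layerGraph(layers);
            layer.Network = dlnetwork(lgraph,'Initialize',false);
        end

        function Z = predict(layer,X)
            % Label the dimensions, use the dlnetwork predict function,
            % and then remove the labels again.
            X = dlarray(X,'SSCB');
            Z = predict(layer.Network,X);
            Z = stripdims(Z);
        end

        function Z = forward(layer,X)
            % Same as predict, but with the dlnetwork forward function.
            X = dlarray(X,'SSCB');
            Z = forward(layer.Network,X);
            Z = stripdims(Z);
        end
    end
end
```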

### Check Validity of Layer

If you create a custom deep learning layer, then you can use the checkLayer function to check that the layer is valid. The function checks layers for validity, GPU compatibility, correctly defined gradients, and code generation compatibility. To check that a layer is valid, run the following command:

checkLayer(layer,validInputSize,'ObservationDimension',dim)
where layer is an instance of the layer, validInputSize is a vector or cell array specifying the valid input sizes to the layer, and dim specifies the dimension of the observations in the layer input data. For large input sizes, the gradient checks take longer to run. To speed up the tests, specify a smaller valid input size.

#### Check Validity of Layer Using checkLayer

Check the layer validity of the custom layer preluLayer.

Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the current folder.
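The contents of preluLayer.m are not listed in this topic. One plausible version, consistent with the constructor call preluLayer(20,'prelu') used below, is sketched here; the property name Alpha and the random initialization are assumptions.

```matlab
classdef preluLayer < nnet.layer.Layer
    % preluLayer   Example custom PReLU layer for 2-D image input.

    properties (Learnable)
        % Scaling coefficient, one value per channel (1-by-1-by-numChannels).
        Alpha
    end

    methods
        function layer = preluLayer(numChannels,name)
            % Set the layer name and description.
            layer.Name = name;
            layer.Description = "PReLU with " + numChannels + " channels";

            % Initialize the scaling coefficient.
            layer.Alpha = rand([1 1 numChannels]);
        end

        function Z = predict(layer,X)
            % Keep positive values and scale negative values by Alpha.
            Z = max(X,0) + layer.Alpha .* min(0,X);
        end
    end
end
```

Because the layer defines no backward function and predict uses only max, min, and element-wise arithmetic, the software can derive the backward pass with automatic differentiation.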

Create an instance of the layer and check its validity using checkLayer. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.

Specify the typical size of the input of an observation and set 'ObservationDimension' to 4.

layer = preluLayer(20,'prelu');
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,'ObservationDimension',4)
Skipping GPU tests. No compatible GPU device found.

Skipping code generation compatibility tests. To check validity of the layer for code generation, specify the 'CheckCodegenCompatibility' and 'ObservationDimension' options.

Running nnet.checklayer.TestLayerWithoutBackward
.......... ...
Done nnet.checklayer.TestLayerWithoutBackward
__________

Test Summary:
13 Passed, 0 Failed, 0 Incomplete, 9 Skipped.
Time elapsed: 0.18046 seconds.

Here, the function does not detect any issues with the layer.

### Include Layer in Network

You can use a custom layer in the same way as any other layer in Deep Learning Toolbox.

Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the current folder.

Create a layer array that includes the custom layer preluLayer.

layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
preluLayer(20,'prelu')
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];

### Output Layer Architecture

At the end of a forward pass at training time, an output layer takes the predictions (outputs) Y of the previous layer and calculates the loss L between these predictions and the training targets. The output layer then computes the derivatives of the loss L with respect to the predictions Y and outputs (backward propagates) the results to the previous layer.

The following figure describes the flow of data through a convolutional neural network and an output layer.

#### Output Layer Properties

Declare the layer properties in the properties section of the class definition.

By default, custom output layers have the following properties:

• Name – Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty, unique layer name. If you train a series network with the layer and Name is set to '', then the software automatically assigns a name to the layer at training time.

• Description – One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays "Classification Output" or "Regression Output".

• Type – Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name.

Custom classification layers also have the following property:

• Classes – Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or 'auto'. If Classes is 'auto', then the software automatically sets the classes at training time. If you specify the string array or cell array of character vectors str, then the software sets the classes of the output layer to categorical(str,str).

Custom regression layers also have the following property:

• ResponseNames – Names of the responses, specified as a cell array of character vectors or a string array. At training time, the software automatically sets the response names according to the training data. The default is {}.

If the layer has no other properties, then you can omit the properties section.

#### Loss Functions

The output layer computes the loss L between predictions and targets using the forward loss function and computes the derivatives of the loss with respect to the predictions using the backward loss function.

The syntax for forwardLoss is loss = forwardLoss(layer, Y, T). The input Y corresponds to the predictions made by the network. These predictions are the output of the previous layer. The input T corresponds to the training targets. The output loss is the loss between Y and T according to the specified loss function. The output loss must be scalar.

If the layer forward loss function supports dlarray objects, then the software automatically determines the backward loss function. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. Alternatively, to define a custom backward loss function, create a function named backwardLoss. For an example showing how to define a custom backward loss function, see Specify Custom Output Layer Backward Loss Function.

The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T). The input Y contains the predictions made by the network and T contains the training targets. The output dLdY is the derivative of the loss with respect to the predictions Y. The output dLdY must be the same size as the layer input Y.

For classification problems, the dimensions of T depend on the type of problem.

| Classification Task | Size of T | Observation Dimension |
| --- | --- | --- |
| 2-D image classification | 1-by-1-by-K-by-N, where K is the number of classes and N is the number of observations. | 4 |
| 3-D image classification | 1-by-1-by-1-by-K-by-N, where K is the number of classes and N is the number of observations. | 5 |
| Sequence-to-label classification | K-by-N, where K is the number of classes and N is the number of observations. | 2 |
| Sequence-to-sequence classification | K-by-N-by-S, where K is the number of classes, N is the number of observations, and S is the sequence length. | 2 |

The size of Y depends on the output of the previous layer. To ensure that Y is the same size as T, you must include a layer that outputs the correct size before the output layer. For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can include a fully connected layer of size K followed by a softmax layer before the output layer.
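As a sketch of a custom classification output layer for 2-D image classification, the following class computes a sum-of-squares error between Y and T, both of size 1-by-1-by-K-by-N. The class name sseClassificationLayer and the choice of loss are illustrative assumptions.

```matlab
classdef sseClassificationLayer < nnet.layer.ClassificationLayer
    % sseClassificationLayer   Example classification output layer with a
    % sum-of-squares error loss.

    methods
        function layer = sseClassificationLayer(name)
            % Set the layer name and description.
            layer.Name = name;
            layer.Description = 'Sum of squares error';
        end

        function loss = forwardLoss(layer,Y,T)
            % Sum the squared errors over the K classes (dimension 3).
            sumSquares = sum((Y-T).^2,3);

            % Average over the N observations (dimension 4) to obtain a
            % scalar loss.
            N = size(Y,4);
            loss = sum(sumSquares,4)/N;
        end
    end
end
```

The layer defines no backwardLoss function, so it relies on forwardLoss supporting dlarray objects.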

For regression problems, the dimensions of T also depend on the type of problem.

| Regression Task | Size of T | Observation Dimension |
| --- | --- | --- |
| 2-D image regression | 1-by-1-by-R-by-N, where R is the number of responses and N is the number of observations. | 4 |
| 2-D image-to-image regression | h-by-w-by-c-by-N, where h, w, and c are the height, width, and number of channels of the output, respectively, and N is the number of observations. | 4 |
| 3-D image regression | 1-by-1-by-1-by-R-by-N, where R is the number of responses and N is the number of observations. | 5 |
| 3-D image-to-image regression | h-by-w-by-d-by-c-by-N, where h, w, d, and c are the height, width, depth, and number of channels of the output, respectively, and N is the number of observations. | 5 |
| Sequence-to-one regression | R-by-N, where R is the number of responses and N is the number of observations. | 2 |
| Sequence-to-sequence regression | R-by-N-by-S, where R is the number of responses, N is the number of observations, and S is the sequence length. | 2 |

For example, if the network is an image regression network with one response and has mini-batches of size 50, then T is a 4-D array of size 1-by-1-by-1-by-50.

The size of Y depends on the output of the previous layer. To ensure that Y is the same size as T, you must include a layer that outputs the correct size before the output layer. For example, for image regression with R responses, to ensure that Y is a 4-D array of the correct size, you can include a fully connected layer of size R before the output layer.

The forwardLoss and backwardLoss functions have the following output arguments.

| Function | Output Argument | Description |
| --- | --- | --- |
| forwardLoss | loss | Calculated loss between the predictions Y and the true target T. |
| backwardLoss | dLdY | Derivative of the loss with respect to the predictions Y. |

The backwardLoss function must output dLdY with the size expected by the previous layer, and dLdY must be the same size as Y.
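As a sketch of the backwardLoss syntax, suppose forwardLoss computes a mean absolute error for 2-D image regression, that is, the absolute errors |Y-T| averaged over the R responses and the N observations. The derivative of that loss with respect to each prediction is sign(Y-T)/(N*R), which gives the following method. The choice of loss is an assumption for this illustration.

```matlab
function dLdY = backwardLoss(layer,Y,T)
    % Y and T have size 1-by-1-by-R-by-N.
    R = size(Y,3);   % number of responses
    N = size(Y,4);   % number of observations

    % Derivative of the mean absolute error with respect to Y.
    dLdY = sign(Y-T)/(N*R);
end
```

The output dLdY has the same size as Y, as required. Place the function in the methods block of the output layer class.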

#### GPU Compatibility

If the layer forward functions fully support dlarray objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type gpuArray (Parallel Computing Toolbox).

Many MATLAB built-in functions support gpuArray (Parallel Computing Toolbox) and dlarray input arguments. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a supported GPU device. For information on supported devices, see GPU Support by Release (Parallel Computing Toolbox). For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).

#### Include Custom Regression Output Layer in Network

You can use a custom output layer in the same way as any other output layer in Deep Learning Toolbox. This section shows how to create and train a network for regression using a custom output layer.

The example constructs a convolutional neural network architecture, trains a network, and uses the trained network to predict angles of rotated, handwritten digits. These predictions are useful for optical character recognition.

Define a custom mean absolute error regression layer. To create this layer, save the file maeRegressionLayer.m in the current folder.
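The contents of maeRegressionLayer.m are not listed in this topic. One plausible version, consistent with the constructor call maeRegressionLayer('mae') and the displayed description used below, is sketched here. It assumes predictions and targets of size 1-by-1-by-R-by-N.

```matlab
classdef maeRegressionLayer < nnet.layer.RegressionLayer
    % maeRegressionLayer   Example regression output layer with a
    % mean-absolute-error loss.

    methods
        function layer = maeRegressionLayer(name)
            % Set the layer name and description.
            layer.Name = name;
            layer.Description = 'Mean absolute error';
        end

        function loss = forwardLoss(layer,Y,T)
            % Mean absolute error over the R responses (dimension 3).
            R = size(Y,3);
            meanAbsoluteError = sum(abs(Y-T),3)/R;

            % Mean over the N observations (dimension 4) to obtain a
            % scalar loss.
            N = size(Y,4);
            loss = sum(meanAbsoluteError,4)/N;
        end
    end
end
```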

Load the example training data.

[XTrain,~,YTrain] = digitTrain4DArrayData;

Create a layer array and include the custom regression output layer maeRegressionLayer.

layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
reluLayer
fullyConnectedLayer(1)
maeRegressionLayer('mae')]
layers =
6x1 Layer array with layers:

1   ''      Image Input           28x28x1 images with 'zerocenter' normalization
2   ''      Convolution           20 5x5 convolutions with stride [1  1] and padding [0  0  0  0]
3   ''      Batch Normalization   Batch normalization
4   ''      ReLU                  ReLU
5   ''      Fully Connected       1 fully connected layer
6   'mae'   Regression Output     Mean absolute error

Set the training options and train the network.

options = trainingOptions('sgdm','Verbose',false);
net = trainNetwork(XTrain,YTrain,layers,options);

Evaluate the network performance by calculating the prediction error between the predicted and actual angles of rotation.

[XTest,~,YTest] = digitTest4DArrayData;
YPred = predict(net,XTest);
predictionError = YTest - YPred;

Calculate the number of predictions within an acceptable error margin from the true angles. Set the threshold to 10 degrees and calculate the percentage of predictions within this threshold.

thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(XTest,4);
accuracy = numCorrect/numTestImages
accuracy = 0.7524