
Define Custom Deep Learning Layer for Code Generation

If Deep Learning Toolbox™ does not provide the layer you require for your classification or regression problem, then you can define your own custom layer using this example as a guide. For a list of built-in layers, see List of Deep Learning Layers.

To define a custom deep learning layer, you can use the template provided in this example, which takes you through the following steps:

  1. Name the layer — Give the layer a name so that you can use it in MATLAB®.

  2. Declare the layer properties — Specify the properties of the layer, including learnable parameters and state parameters.

  3. Create a constructor function (optional) — Specify how to construct the layer and initialize its properties. If you do not specify a constructor function, then at creation, the software initializes the Name, Description, and Type properties with [] and sets the number of layer inputs and outputs to 1.

  4. Create initialize function (optional) — Specify how to initialize the learnable and state parameters when the software initializes the network. If you do not specify an initialize function, then the software does not initialize parameters when it initializes the network.

  5. Create forward functions — Specify how data passes forward through the layer (forward propagation) at prediction time and at training time.

  6. Create reset state function (optional) — Specify how to reset state parameters.

  7. Create a backward function (optional) — Specify the derivatives of the loss with respect to the input data and the learnable parameters (backward propagation). If you do not specify a backward function, then the forward functions must support dlarray objects.

To create a custom layer that supports code generation:

  • The layer must specify the pragma %#codegen in the layer definition.

  • The inputs of predict must be:

    • Consistent in dimension. Each input must have the same number of dimensions.

    • Consistent in batch size. Each input must have the same batch size.

  • The outputs of predict must be consistent in dimension and batch size with the layer inputs.

  • Nonscalar properties must have type single, double, or character array.

  • Scalar properties must have type numeric, logical, or string.

Code generation supports intermediate layers with 2-D image or feature input only. Code generation does not support layers with state properties (properties with attribute State).

This example shows how to create a PReLU layer [1], which is a layer with a learnable parameter, and use it in a convolutional neural network. A PReLU layer performs a threshold operation: for each channel, it multiplies any input value less than zero by a scaling coefficient $\alpha_i$ that it learns at training time. These per-channel coefficients form the learnable parameter of the layer.

This figure from [1] compares the ReLU and PReLU layer functions.

Side-by-side plots of the ReLU and PReLU activation functions. For values of y greater than zero, both functions return y unchanged. For values of y less than zero, the ReLU function returns zero and the PReLU function scales y linearly by the scaling coefficient.
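This sketch gives a quick numeric version of the two curves. It evaluates ReLU and PReLU on a few sample inputs for a single channel. The coefficient value a = 0.25 is an arbitrary assumption that stands in for a learned value.

```matlab
x = [-2 -1 0 1 2];   % sample inputs for one channel
a = 0.25;            % hypothetical learned coefficient for this channel

reluOut  = max(x,0);                 % ReLU: negative inputs become zero
preluOut = max(x,0) + a*min(0,x);    % PReLU: negative inputs are scaled by a

% reluOut  is [0 0 0 1 2]
% preluOut is [-0.5 -0.25 0 1 2]
```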

Intermediate Layer Template

Copy the intermediate layer template into a new file in MATLAB. This template gives the structure of an intermediate layer class definition. It outlines:

  • The optional properties blocks for the layer properties, learnable parameters, and state parameters.

  • The layer constructor function.

  • The optional initialize function.

  • The predict function and the optional forward function.

  • The optional resetState function for layers with state properties.

  • The optional backward function.

classdef myLayer < nnet.layer.Layer % ...
        % & nnet.layer.Formattable ... % (Optional) 
        % & nnet.layer.Acceleratable % (Optional)

    properties
        % (Optional) Layer properties.

        % Declare layer properties here.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.

        % Declare learnable parameters here.
    end

    properties (State)
        % (Optional) Layer state parameters.

        % Declare state parameters here.
    end

    properties (Learnable, State)
        % (Optional) Nested dlnetwork objects with both learnable
        % parameters and state parameters.

        % Declare nested networks with learnable and state parameters here.
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the class.

            % Define layer constructor function here.
        end

        function layer = initialize(layer,layout)
            % (Optional) Initialize layer learnable and state parameters.
            %
            % Inputs:
            %         layer  - Layer to initialize
            %         layout - Data layout, specified as a networkDataLayout
            %                  object
            %
            % Outputs:
            %         layer - Initialized layer
            %
            %  - For layers with multiple inputs, replace layout with 
            %    layout1,...,layoutN, where N is the number of inputs.
            
            % Define layer initialization function here.
        end
        

        function [Z,state] = predict(layer,X)
            % Forward input data through the layer at prediction time and
            % output the result and updated state.
            %
            % Inputs:
            %         layer - Layer to forward propagate through 
            %         X     - Input data
            % Outputs:
            %         Z     - Output of layer forward function
            %         state - (Optional) Updated layer state
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN, 
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Z with 
            %    Z1,...,ZM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state 
            %    with state1,...,stateK, where K is the number of state 
            %    parameters.

            % Define layer predict function here.
        end

        function [Z,state,memory] = forward(layer,X)
            % (Optional) Forward input data through the layer at training
            % time and output the result, the updated state, and a memory
            % value.
            %
            % Inputs:
            %         layer - Layer to forward propagate through 
            %         X     - Layer input data
            % Outputs:
            %         Z      - Output of layer forward function 
            %         state  - (Optional) Updated layer state 
            %         memory - (Optional) Memory value for custom backward
            %                  function
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN, 
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Z with 
            %    Z1,...,ZM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state 
            %    with state1,...,stateK, where K is the number of state 
            %    parameters.

            % Define layer forward function here.
        end

        function layer = resetState(layer)
            % (Optional) Reset layer state.

            % Define reset state function here.
        end

        function [dLdX,dLdW,dLdSin] = backward(layer,X,Z,dLdZ,dLdSout,memory)
            % (Optional) Backward propagate the derivative of the loss
            % function through the layer.
            %
            % Inputs:
            %         layer   - Layer to backward propagate through 
            %         X       - Layer input data 
            %         Z       - Layer output data 
            %         dLdZ    - Derivative of loss with respect to layer 
            %                   output
            %         dLdSout - (Optional) Derivative of loss with respect 
            %                   to state output
            %         memory  - Memory value from forward function
            % Outputs:
            %         dLdX   - Derivative of loss with respect to layer input
            %         dLdW   - (Optional) Derivative of loss with respect to
            %                  learnable parameter 
            %         dLdSin - (Optional) Derivative of loss with respect to 
            %                  state input
            %
            %  - For layers with state parameters, the backward syntax must
            %    include both dLdSout and dLdSin, or neither.
            %  - For layers with multiple inputs, replace X and dLdX with
            %    X1,...,XN and dLdX1,...,dLdXN, respectively, where N is
            %    the number of inputs.
            %  - For layers with multiple outputs, replace Z and dLdZ with
            %    Z1,...,ZM and dLdZ1,...,dLdZM, respectively, where M is the
            %    number of outputs.
            %  - For layers with multiple learnable parameters, replace 
            %    dLdW with dLdW1,...,dLdWP, where P is the number of 
            %    learnable parameters.
            %  - For layers with multiple state parameters, replace dLdSin
            %    and dLdSout with dLdSin1,...,dLdSinK and 
            %    dLdSout1,...,dLdSoutK, respectively, where K is the number
            %    of state parameters.

            % Define layer backward function here.
        end
    end
end

Name Layer and Specify Superclasses

First, give the layer a name. In the first line of the class file, replace the existing name myLayer with codegenPreluLayer and add a comment describing the layer.

classdef codegenPreluLayer < nnet.layer.Layer & nnet.layer.Formattable
    % Example custom PReLU layer with codegen support.

    ...
end

If you do not specify a backward function, then the layer functions, by default, receive unformatted dlarray objects as input. To specify that the layer receives formatted dlarray objects as input and also outputs formatted dlarray objects, also inherit from the nnet.layer.Formattable class when defining the custom layer.

The layer does not require formattable inputs, so remove the optional nnet.layer.Formattable superclass.

classdef codegenPreluLayer < nnet.layer.Layer
    % Example custom PReLU layer with codegen support.

    ...
end

Next, rename the myLayer constructor function (the first function in the methods section) so that it has the same name as the layer.

    methods
        function layer = codegenPreluLayer()           
            ...
        end

        ...
    end

Save Layer

Save the layer class file in a new file named codegenPreluLayer.m. The file name must match the layer name. To use the layer, you must save the file in the current folder or in a folder on the MATLAB path.

Specify Code Generation Pragma

Add the %#codegen directive (or pragma) to your layer definition to indicate that you intend to generate code for this layer. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that result in errors during code generation.

classdef codegenPreluLayer < nnet.layer.Layer
    % Example custom PReLU layer with codegen support.

    %#codegen

    ...
end

Declare Properties and Learnable Parameters

Declare the layer properties in the properties section and declare learnable parameters by listing them in the properties (Learnable) section.

By default, custom intermediate layers have these properties. Do not declare these properties in the properties section.

  • Name: Layer name, specified as a character vector or a string scalar. For Layer array input, the trainnet, trainNetwork, assembleNetwork, layerGraph, and dlnetwork functions automatically assign names to layers with the name "".

  • Description: One-line description of the layer, specified as a string scalar or a character vector. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays the layer class name.

  • Type: Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name.

  • NumInputs: Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumInputs to the number of names in InputNames. The default value is 1.

  • InputNames: Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumInputs is greater than 1, then the software automatically sets InputNames to {'in1',...,'inN'}, where N is equal to NumInputs. The default value is {'in'}.

  • NumOutputs: Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumOutputs to the number of names in OutputNames. The default value is 1.

  • OutputNames: Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumOutputs is greater than 1, then the software automatically sets OutputNames to {'out1',...,'outM'}, where M is equal to NumOutputs. The default value is {'out'}.

If the layer has no other properties, then you can omit the properties section.
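This sketch shows which names to expect when a layer has several inputs or outputs. It reproduces the default naming pattern for a hypothetical layer with three inputs and two outputs. The software assigns these names itself, so the code only mirrors the pattern. A layer with a single input or output keeps the defaults {'in'} and {'out'}.

```matlab
numInputs  = 3;   % hypothetical layer with three inputs
numOutputs = 2;   % and two outputs

% Adding a number to a string scalar appends the number as text, so
% "in" + (1:3) gives ["in1" "in2" "in3"]; cellstr converts the result to a
% cell array of character vectors, matching the documented default form.
inputNames  = cellstr("in"  + (1:numInputs));    % {'in1','in2','in3'}
outputNames = cellstr("out" + (1:numOutputs));   % {'out1','out2'}
```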

Tip

If you are creating a layer with multiple inputs, then you must set either the NumInputs or InputNames properties in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the NumOutputs or OutputNames properties in the layer constructor. For an example, see Define Custom Deep Learning Layer with Multiple Inputs.

To support code generation:

  • Nonscalar properties must have type single, double, or character array.

  • Scalar properties must be numeric or have type logical or string.
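This sketch shows one way to check a candidate property value against the two rules before you add it to a layer. isCodegenFriendly is a hypothetical helper, not a toolbox function. It also assumes that character arrays are acceptable at any size, including 1-by-1.

```matlab
% Hypothetical helper encoding the two property-type rules above.
% Assumption: character arrays are accepted at any size.
isCodegenFriendly = @(v) ischar(v) || ...
    (isscalar(v) && (isnumeric(v) || islogical(v) || isstring(v))) || ...
    (~isscalar(v) && (isa(v,'single') || isa(v,'double')));

% Example values: single vector, int8 scalar, int8 vector, string scalar,
% string array, logical scalar, character vector.
examples = {single([1 2 3]), int8(5), int8([1 2]), "relu", ["a" "b"], true, 'abc'};
results = cellfun(isCodegenFriendly, examples);
% results is [1 1 0 1 0 1 1]: the int8 vector and the string array fail.
```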

A PReLU layer does not require any additional properties, so you can remove the properties section.

A PReLU layer has only one learnable parameter, which holds the per-channel scaling coefficients $\alpha_i$. Declare this learnable parameter in the properties (Learnable) section and call the parameter Alpha.

    properties (Learnable)
        % Layer learnable parameters
            
        % Scaling coefficient
        Alpha
    end

Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

The PReLU layer constructor function requires two input arguments: the number of channels of the expected input data and the layer name. The number of channels specifies the size of the learnable parameter Alpha. Specify two input arguments named numChannels and name in the codegenPreluLayer function. Add a comment to the top of the function that explains the syntax of the function.

        function layer = codegenPreluLayer(numChannels, name)
            % layer = codegenPreluLayer(numChannels, name) creates a PReLU
            % layer with numChannels channels and specifies the layer name.

            ...
        end

Code generation does not support arguments blocks.
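To keep some input checking in the constructor, one option is to replace the arguments block with explicit assert calls at the top of the constructor. The sketch below runs those checks as a standalone script on example values. The helper names isPositiveInteger and isTextScalar and the specific rules they enforce are illustrative assumptions.

```matlab
% Example constructor inputs (hypothetical values).
numChannels = 20;
name = "prelu";

% Manual checks standing in for an arguments block.
isPositiveInteger = @(n) isnumeric(n) && isscalar(n) && n >= 1 && n == floor(n);
isTextScalar = @(s) (ischar(s) && isrow(s)) || (isstring(s) && isscalar(s));

assert(isPositiveInteger(numChannels), 'numChannels must be a positive integer scalar.')
assert(isTextScalar(name), 'name must be a character vector or a string scalar.')
```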

Initialize Layer Properties

Initialize the layer properties, including learnable parameters, in the constructor function. Replace the template comment % Define layer constructor function here with code that initializes the layer properties.

Set the Name property to the input argument name.

            % Set layer name.
            layer.Name = name;

Give the layer a one-line description by setting the Description property of the layer. Set the description to describe the type of layer and its size.

            % Set layer description.
            layer.Description = "PReLU with " + numChannels + " channels";
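The Description assignment relies on string concatenation. When you add a number to a string scalar, MATLAB converts the number to text. This sketch shows the resulting value for an example channel count.

```matlab
numChannels = 3;   % example value
description = "PReLU with " + numChannels + " channels";
% description is "PReLU with 3 channels"
```

The double-quoted string literal matters here. If the literal were a character vector such as 'PReLU with ', then + would add the character codes numerically instead of concatenating text.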

For a PReLU layer, when the input values are negative, the layer multiplies each channel of the input by the corresponding channel of Alpha. Initialize the learnable parameter Alpha as a random array of size 1-by-1-by-numChannels. Because the third dimension has size numChannels, the layer can multiply the input by Alpha element-wise, channel by channel, in the forward functions. Alpha is a property of the layer object, so you must assign the array to layer.Alpha.

            % Initialize scaling coefficient.
            layer.Alpha = rand([1 1 numChannels]);
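This sketch shows why the 1-by-1-by-numChannels shape works. It initializes Alpha the same way as the constructor and multiplies it by an example h-by-w-by-c-by-N input of all -1 values. Channel k of every observation is scaled by Alpha(k). The sizes are arbitrary, and the sketch uses MATLAB implicit expansion, which is fine in MATLAB but, as discussed later, not in generated code.

```matlab
numChannels = 3;
alpha = rand([1 1 numChannels]);     % same initialization as the constructor
X = -ones(4,4,numChannels,2);        % example negative input, h-by-w-by-c-by-N

% Implicit expansion multiplies channel k of every observation by alpha(k).
Z = alpha .* X;

sizeZ = size(Z);                     % [4 4 3 2]
channelScale = squeeze(Z(1,1,:,1));  % 3-by-1 column equal to -alpha(:)
```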

View the completed constructor function.

        function layer = codegenPreluLayer(numChannels, name) 
            % layer = codegenPreluLayer(numChannels, name) creates a PReLU
            % layer for 2-D image input with numChannels channels and specifies 
            % the layer name.

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "PReLU with " + numChannels + " channels";
        
            % Initialize scaling coefficient.
            layer.Alpha = rand([1 1 numChannels]); 
        end

With this constructor function, the command codegenPreluLayer(3,'prelu') creates a PReLU layer with three channels and the name 'prelu'.

Create Forward Functions

Create the layer forward functions to use at prediction time and training time.

Create a function named predict that propagates the data forward through the layer at prediction time and outputs the result.

The predict function syntax depends on the type of layer.

  • Z = predict(layer,X) forwards the input data X through the layer and outputs the result Z, where layer has a single input and a single output.

  • [Z,state] = predict(layer,X) also outputs the updated state parameter state, where layer has a single state parameter.

You can adjust the syntaxes for layers with multiple inputs, multiple outputs, or multiple state parameters:

  • For layers with multiple inputs, replace X with X1,...,XN, where N is the number of inputs. The NumInputs property must match N.

  • For layers with multiple outputs, replace Z with Z1,...,ZM, where M is the number of outputs. The NumOutputs property must match M.

  • For layers with multiple state parameters, replace state with state1,...,stateK, where K is the number of state parameters.

Tip

If the number of inputs to the layer can vary, then use varargin instead of X1,…,XN. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi.

If the number of outputs can vary, then use varargout instead of Z1,…,ZM. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj.
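This sketch illustrates the varargin form with a hypothetical layer whose predict function sums any number of 4-D inputs. An anonymous function stands in for the predict method, which in a real layer would be declared as function Z = predict(layer,varargin).

```matlab
% Hypothetical predict body for a layer that sums any number of inputs.
% cat along dimension 5 stacks the 4-D inputs; sum collapses that stack.
sumInputs = @(varargin) sum(cat(5, varargin{:}), 5);

X1 = ones(2,2,3,4);
X2 = 2*ones(2,2,3,4);
Z = sumInputs(X1, X2, X1);   % every element of Z is 1 + 2 + 1 = 4
```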

Because a PReLU layer has only one input and one output, the syntax for predict for a PReLU layer is Z = predict(layer,X).

Code generation supports custom intermediate layers with 2-D image input only. The inputs are h-by-w-by-c-by-N arrays, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and N is the number of observations. The observation dimension is 4.

For code generation support, all the layer inputs must have the same number of dimensions and batch size.
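This sketch checks the two consistency rules for a pair of hypothetical inputs to a multiple-input layer. The inputs have different channel counts, which these two rules do not restrict. Note that ndims ignores trailing singleton dimensions, so an input with a batch size of 1 reports 3 dimensions, while size(x,4) still returns 1.

```matlab
% Two hypothetical inputs to a multiple-input layer.
X1 = rand(24,24,20,8,'single');
X2 = rand(24,24,10,8,'single');
inputs = {X1, X2};

% Same number of dimensions?
numDims = cellfun(@ndims, inputs);
sameNumDims = all(numDims == numDims(1));

% Same batch size? The observation dimension is 4.
batchSizes = cellfun(@(x) size(x,4), inputs);
sameBatchSize = all(batchSizes == batchSizes(1));
```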

By default, the layer uses predict as the forward function at training time. To use a different forward function at training time, or to retain a value required for a custom backward function, you must also create a function named forward. The software does not generate code for the forward function, but the function must still be compatible with code generation.

The forward function propagates the data forward through the layer at training time and also outputs a memory value.

The forward function syntax depends on the type of layer:

  • Z = forward(layer,X) forwards the input data X through the layer and outputs the result Z, where layer has a single input and a single output.

  • [Z,state] = forward(layer,X) also outputs the updated state parameter state, where layer has a single state parameter.

  • [__,memory] = forward(layer,X) also returns a memory value for a custom backward function using any of the previous syntaxes. If the layer has both a custom forward function and a custom backward function, then the forward function must return a memory value.

You can adjust the syntaxes for layers with multiple inputs, multiple outputs, or multiple state parameters:

  • For layers with multiple inputs, replace X with X1,...,XN, where N is the number of inputs. The NumInputs property must match N.

  • For layers with multiple outputs, replace Z with Z1,...,ZM, where M is the number of outputs. The NumOutputs property must match M.

  • For layers with multiple state parameters, replace state with state1,...,stateK, where K is the number of state parameters.

Tip

If the number of inputs to the layer can vary, then use varargin instead of X1,…,XN. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi.

If the number of outputs can vary, then use varargout instead of Z1,…,ZM. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj.

The PReLU operation is given by

$$
f(x_i) = \begin{cases} x_i & \text{if } x_i > 0 \\ \alpha_i x_i & \text{if } x_i \le 0 \end{cases}
$$

where $x_i$ is the input of the nonlinear activation $f$ on channel $i$, and $\alpha_i$ is the coefficient controlling the slope of the negative part. The subscript $i$ in $\alpha_i$ indicates that the nonlinear activation can vary on different channels.

Implement this operation in predict. In predict, the input X corresponds to $x_i$ in the equation, and the output Z corresponds to $f(x_i)$.
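This sketch confirms that the piecewise definition and the closed form max(X,0) + alpha .* min(0,X) give identical results. It applies the piecewise rule channel by channel with logical indexing and then compares the two. The input size and the coefficient values 0.1, 0.2, and 0.3 are arbitrary assumptions.

```matlab
rng(1)                                   % reproducible example data
X = randn(4,4,3,2);                      % h-by-w-by-c-by-N input, c = 3
alpha = reshape([0.1 0.2 0.3],1,1,3);    % one coefficient per channel

% Piecewise definition, applied channel by channel.
Zpiecewise = X;
for k = 1:size(X,3)
    Xk = X(:,:,k,:);
    neg = Xk <= 0;                       % elements in the x_i <= 0 branch
    Xk(neg) = alpha(k) * Xk(neg);
    Zpiecewise(:,:,k,:) = Xk;
end

% Closed form: for x > 0, min(0,x) is 0; for x <= 0, max(x,0) is 0.
Zclosed = max(X,0) + alpha .* min(0,X);

maxDiff = max(abs(Zpiecewise - Zclosed), [], "all");
```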

Add a comment to the top of the function that explains the syntaxes of the function.

Tip

If you preallocate arrays using functions such as zeros, then you must ensure that the data types of these arrays are consistent with the layer function inputs. To create an array of zeros of the same data type as another array, use the "like" option of zeros. For example, to initialize an array of zeros of size sz with the same data type as the array X, use Z = zeros(sz,"like",X).
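This sketch shows the difference the "like" option makes for an example single-precision input. Without "like", zeros returns double regardless of the input class.

```matlab
X = rand(4,4,3,2,'single');          % example single-precision input

Zdefault = zeros(size(X));           % double, regardless of the class of X
Zlike = zeros(size(X),"like",X);     % single, matching X
```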

Implementing the backward function is optional when the forward functions fully support dlarray input. For code generation support, the predict function must also support numeric input.

One way to calculate the output of the PReLU operation is to use this code.

Z = max(X,0) + layer.Alpha .* min(0,X);

Because code generation does not support implicit expansion through the .* operator, you can use the bsxfun function instead.

Z = max(X,0) + bsxfun(@times, layer.Alpha, min(0,X));

However, bsxfun does not support dlarray input. To implement a predict function that supports both code generation and dlarray input, use an if statement with the isdlarray function to select the appropriate code for the type of input.

        function Z = predict(layer, X)
            % Z = predict(layer, X) forwards the input data X through the
            % layer and outputs the result Z.
            
            if isdlarray(X)
                Z = max(X,0) + layer.Alpha .* min(0,X);
            else
                Z = max(X,0) + bsxfun(@times, layer.Alpha, min(0,X));
            end
        end
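MATLAB itself supports implicit expansion outside of code generation, so you can check in plain MATLAB that the two expressions in predict agree for numeric input. This sketch uses example sizes and a random alpha that stands in for layer.Alpha. It does not exercise the dlarray path.

```matlab
alpha = rand([1 1 3]);           % stand-in for layer.Alpha
X = randn(5,5,3,2,'single');     % numeric (non-dlarray) input

Zexpand = max(X,0) + alpha .* min(0,X);                  % expression used for dlarray input
Zbsxfun = max(X,0) + bsxfun(@times, alpha, min(0,X));    % expression used for numeric input

maxDiff = max(abs(Zexpand - Zbsxfun), [], "all");
```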

Because the predict function fully supports dlarray objects, defining the backward function is optional. For a list of functions that support dlarray objects, see List of Functions with dlarray Support.
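This layer does not need a backward function. For reference, this sketch shows what a hand-written one would have to compute, using the piecewise derivatives of PReLU. It checks them against central finite differences in plain MATLAB. The surrogate loss sum(dLdZ .* Z, "all") is an assumption chosen so that its derivative with respect to Z is exactly dLdZ. The variable names are illustrative.

```matlab
rng(0)                           % reproducible example data
X = randn(3,3,2,4);              % h-by-w-by-c-by-N input
alpha = rand([1 1 2]);           % one coefficient per channel
dLdZ = randn(size(X));           % arbitrary upstream gradient

prelu = @(X,a) max(X,0) + a .* min(0,X);

% Analytic derivatives: slope 1 where X > 0 and alpha where X <= 0;
% dL/dalpha sums dLdZ .* min(0,X) over height, width, and observations.
dLdX = dLdZ .* ((X > 0) + alpha .* (X <= 0));
dLdAlpha = sum(dLdZ .* min(0,X), [1 2 4]);   % 1-by-1-by-c

% Central finite differences for the first element of alpha and of X.
step = 1e-6;
L = @(Xp,a) sum(dLdZ .* prelu(Xp,a), "all");
e = zeros(size(alpha)); e(1) = step;
E = zeros(size(X));     E(1) = step;
fdAlpha1 = (L(X,alpha + e) - L(X,alpha - e)) / (2*step);
fdX1     = (L(X + E,alpha) - L(X - E,alpha)) / (2*step);

relErrAlpha = abs(fdAlpha1 - dLdAlpha(1)) / max(1, abs(dLdAlpha(1)));
relErrX     = abs(fdX1 - dLdX(1)) / max(1, abs(dLdX(1)));
```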

Completed Layer

View the completed layer class file.

classdef codegenPreluLayer < nnet.layer.Layer
    % Example custom PReLU layer with codegen support.

    %#codegen

    properties (Learnable)
        % Layer learnable parameters
            
        % Scaling coefficient
        Alpha
    end
    
    methods
        function layer = codegenPreluLayer(numChannels, name) 
            % layer = codegenPreluLayer(numChannels, name) creates a PReLU
            % layer for 2-D image input with numChannels channels and specifies 
            % the layer name.

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "PReLU with " + numChannels + " channels";
        
            % Initialize scaling coefficient.
            layer.Alpha = rand([1 1 numChannels]); 
        end
        
        function Z = predict(layer, X)
            % Z = predict(layer, X) forwards the input data X through the
            % layer and outputs the result Z.
            
            if isdlarray(X)
                Z = max(X,0) + layer.Alpha .* min(0,X);
            else
                Z = max(X,0) + bsxfun(@times, layer.Alpha, min(0,X));
            end
        end
    end
end
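As a usage sketch, you can call predict directly on the layer with both kinds of input and compare the two branches. This assumes that codegenPreluLayer.m is saved in the current folder or on the path and that Deep Learning Toolbox is installed. The input size is an example.

```matlab
% Assumes codegenPreluLayer.m is on the path and Deep Learning Toolbox is installed.
layer = codegenPreluLayer(3,"prelu");
X = randn(4,4,3,2,'single');                     % h-by-w-by-c-by-N numeric input

Znumeric = predict(layer,X);                     % takes the bsxfun branch
Zdl = extractdata(predict(layer,dlarray(X)));    % takes the .* branch

maxDiff = max(abs(Znumeric - Zdl), [], "all");
```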

Check Custom Layer for Code Generation Compatibility

Check the code generation compatibility of the custom layer codegenPreluLayer.

The custom layer codegenPreluLayer, attached to this example as a supporting file, applies the PReLU operation to the input data. To access this layer, open this example as a live script.

Create an instance of the layer and check its validity using checkLayer. Specify the valid input size as the size of a single observation of typical input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.

Specify the size of a single typical observation and set the ObservationDimension option to 4. You do not need to specify the observation dimension if you specify a networkDataLayout object instead of a valid input size. To check for code generation compatibility, set the CheckCodegenCompatibility option to true. The checkLayer function does not check whether the functions that the layer calls are compatible with code generation. To check that the custom layer definition is supported for code generation, first use the Code Generation Readiness app. For more information, see Check Code by Using the Code Generation Readiness Tool (MATLAB Coder).

layer = codegenPreluLayer(20,"prelu");
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,ObservationDimension=4,CheckCodegenCompatibility=true)
Skipping initialization tests. The layer does not have an initialize function.
 
Skipping GPU tests. No compatible GPU device found.
 
Running nnet.checklayer.TestLayerWithoutBackward
.......... .......... ...
Done nnet.checklayer.TestLayerWithoutBackward
__________

Test Summary:
	 23 Passed, 0 Failed, 0 Incomplete, 11 Skipped.
	 Time elapsed: 0.6552 seconds.

The function does not detect any issues with the layer.

References

[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." In 2015 IEEE International Conference on Computer Vision (ICCV), 1026–34. Santiago, Chile: IEEE, 2015. https://doi.org/10.1109/ICCV.2015.123.
