layernorm

Normalize data across all channels for each observation independently

Syntax

Y = layernorm(X,offset,scaleFactor)

Y = layernorm(X,offset,scaleFactor,'DataFormat',FMT)

Y = layernorm(___,Name,Value)

Description

The layer normalization operation normalizes the input data across all channels for each observation independently. To speed up training of recurrent and multilayer perceptron neural networks and reduce the sensitivity to network initialization, use layer normalization after learnable operations, such as LSTM and fully connected operations.

After normalization, the operation shifts the input by a learnable offset β and scales it by a learnable scale factor γ.

The layernorm function applies the layer normalization operation to dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the "S", "T", "C", and "B" labels, respectively. For unspecified and other dimensions, use the "U" label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the DataFormat option.

Note

To apply layer normalization within a dlnetwork object, use layerNormalizationLayer.

Y = layernorm(X,offset,scaleFactor) applies the layer normalization operation to the input data X and transforms it using the specified offset and scale factor.

The function normalizes over the 'S' (spatial), 'T' (time), 'C' (channel), and 'U' (unspecified) dimensions of X for each observation in the 'B' (batch) dimension, independently.

For unformatted input data, use the 'DataFormat' option.

Y = layernorm(X,offset,scaleFactor,'DataFormat',FMT) applies the layer normalization operation to the unformatted dlarray object X with the format specified by FMT. The output Y is an unformatted dlarray object with dimensions in the same order as X. For example, 'DataFormat','SSCB' specifies data for 2-D image input with the format 'SSCB' (spatial, spatial, channel, batch).

To specify the format of the scale and offset, use the 'ScaleFormat' and 'OffsetFormat' options, respectively.
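As a sketch of the 'DataFormat' syntax, the following applies layer normalization to an unformatted dlarray and compares the result with the same call on a formatted dlarray. The sizes and variable names (dlXu, dlXf, maxDiff) are illustrative assumptions; the code requires Deep Learning Toolbox.

```matlab
% Sequence data: 10 channels, 16 observations, 20 time steps
numChannels = 10;
X = rand(numChannels,16,20);

% Channel-wise offset and scale factor
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Unformatted input: describe the layout with 'DataFormat'
dlXu = dlarray(X);
dlYu = layernorm(dlXu,offset,scaleFactor,'DataFormat','CBT');

% Formatted input with the same layout, for comparison
dlXf = dlarray(X,'CBT');
dlYf = layernorm(dlXf,offset,scaleFactor);

% The unformatted call returns an unformatted dlarray (no dimension labels)
isUnformatted = isempty(dims(dlYu))

% Largest elementwise difference between the two results
maxDiff = max(abs(extractdata(dlYu) - extractdata(dlYf)),[],'all')
```

Because 'CBT' is already in the order that formatted dlarray objects use, both calls see the data in the same layout, so the two results are expected to agree.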

Y = layernorm(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'Epsilon',1e-4 sets the epsilon value to 1e-4.

Examples

Apply Layer Normalization

Create a formatted dlarray object containing a batch of 128 sequences of length 100 with 10 channels. Specify the format 'CBT' (channel, batch, time).

numChannels = 10;
miniBatchSize = 128;
sequenceLength = 100;

X = rand(numChannels,miniBatchSize,sequenceLength);
dlX = dlarray(X,'CBT');

View the size and format of the input data.

size(dlX)

ans = 1×3

    10   128   100

dims(dlX)

ans = 
'CBT'

For per-observation channel-wise layer normalization, initialize the offset and scale with vectors of zeros and ones, respectively.

offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

Apply the layer normalization operation using the layernorm function.

dlY = layernorm(dlX,offset,scaleFactor);

View the size and the format of the output dlY.

size(dlY)

ans = 1×3

    10   128   100

dims(dlY)

ans = 
'CBT'

Apply Element-Wise Layer Normalization

To perform element-wise layer normalization, specify an offset and scale factor with the same size as a single input observation.

Create a formatted dlarray object containing a batch of 128 224-by-224 images with 3 channels. Specify the format "SSCB" (spatial, spatial, channel, batch).

numChannels = 3;
miniBatchSize = 128;
H = 224;
W = 224;
X = rand(H,W,numChannels,miniBatchSize);
X = dlarray(X,"SSCB");

View the size and format of the input data.

size(X)

ans = 1×4

   224   224     3   128

dims(X)

ans = 
'SSCB'

For element-wise layer normalization, initialize the offset and scale with arrays of zeros and ones, respectively.

offset = zeros(H,W,numChannels);
scaleFactor = ones(H,W,numChannels);

Apply the layer normalization operation using the layernorm function. Specify the offset and scale formats as "SSC" (spatial, spatial, channel) using the 'OffsetFormat' and 'ScaleFormat' options, respectively.

Y = layernorm(X,offset,scaleFactor,'OffsetFormat',"SSC",'ScaleFormat',"SSC");

View the size and the format of the output data.

size(Y)

ans = 1×4

   224   224     3   128

dims(Y)

ans = 
'SSCB'

Input Arguments

`X` — Input data
`dlarray` | numeric array

Input data, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.

If X is an unformatted dlarray or a numeric array, then you must specify the format using the DataFormat option. If X is a numeric array, then at least one of scaleFactor and offset must be a dlarray object.

X must have a "C" (channel) dimension.
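A minimal sketch of the numeric-input case described above, assuming an illustrative 8-channel feature array: X is a plain numeric array, so the offset is wrapped in a dlarray and the layout is given with 'DataFormat'.

```matlab
% Numeric input: 8 channels, 5 observations
X = rand(8,5);

% X is numeric, so at least one of offset and scaleFactor must be a dlarray
offset = dlarray(zeros(8,1));
scaleFactor = ones(8,1);

% Numeric input has no labels, so describe its layout with 'DataFormat'
Y = layernorm(X,offset,scaleFactor,'DataFormat','CB');

% The output is an unformatted dlarray
class(Y)
```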

`offset` — Offset
`dlarray` | numeric array

Offset β, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.

The size and format of the offset depend on the type of transformation.

Channel-wise transformation

Array with one nonsingleton dimension with size matching the size of the 'C' (channel) dimension of the input X.

For channel-wise transformation, if offset is a formatted dlarray object, then the nonsingleton dimension must have label 'C' (channel).

Element-wise transformation

Array with a 'C' (channel) dimension with the same size as the 'C' (channel) dimension of the input X and zero or the same number of 'S' (spatial), 'T' (time), and 'U' (unspecified) dimensions of the input X.

Each dimension must have size 1 or have sizes matching the corresponding dimensions in the input X. For any repeated dimensions, for example, multiple 'S' (spatial) dimensions, the sizes must match the corresponding dimensions in X or must all be singleton.

The software automatically expands any singleton dimensions to match the size of a single observation in the input X.

For element-wise transformation, if offset is a numeric array or an unformatted dlarray, then you must specify the offset format using the 'OffsetFormat' option.
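The singleton-expansion rule can be sketched by passing per-channel parameters as 1-by-1-by-C arrays with format "SSC" and comparing the result with the channel-wise form. The image size and the names beta, gamma, and maxDiff are illustrative assumptions.

```matlab
% Small batch of images: 8-by-8 pixels, 3 channels, 4 observations
H = 8; W = 8; numChannels = 3;
X = dlarray(rand(H,W,numChannels,4),"SSCB");

% Per-channel parameters stored as 1-by-1-by-C arrays
beta  = rand(1,1,numChannels);
gamma = rand(1,1,numChannels);

% Element-wise form: the singleton spatial dimensions expand over H and W
Y1 = layernorm(X,beta,gamma,'OffsetFormat',"SSC",'ScaleFormat',"SSC");

% Channel-wise form using the same values as column vectors
Y2 = layernorm(X,beta(:),gamma(:));

% Largest elementwise difference between the two forms
maxDiff = max(abs(extractdata(Y1) - extractdata(Y2)),[],'all')
```

Because both spatial dimensions of the parameters are singleton, the element-wise transformation reduces to the channel-wise one, so maxDiff is expected to be at round-off level.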

`scaleFactor` — Scale factor
`dlarray` | numeric array

Scale factor γ, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.

The size and format of the scale factor depend on the type of transformation.

Channel-wise transformation

Array with one nonsingleton dimension with size matching the size of the 'C' (channel) dimension of the input X.

For channel-wise transformation, if scaleFactor is a formatted dlarray object, then the nonsingleton dimension must have label 'C' (channel).

Element-wise transformation

Array with a 'C' (channel) dimension with the same size as the 'C' (channel) dimension of the input X and zero or the same number of 'S' (spatial), 'T' (time), and 'U' (unspecified) dimensions of the input X.

Each dimension must have size 1 or have sizes matching the corresponding dimensions in the input X. For any repeated dimensions, for example, multiple 'S' (spatial) dimensions, the sizes must match the corresponding dimensions in X or must all be singleton.

The software automatically expands any singleton dimensions to match the size of a single observation in the input X.

For element-wise transformation, if scaleFactor is a numeric array or an unformatted dlarray, then you must specify the scale format using the 'ScaleFormat' option.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Epsilon',1e-4 sets the variance offset value to 1e-4.

`DataFormat` — Description of data dimensions
character vector | string scalar

Description of the data dimensions, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

For example, consider an array that represents a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can describe the data as having the format "CBT" (channel, batch, time).

You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.

If the input data is not a formatted dlarray object, then you must specify the DataFormat option.

For more information, see Deep Learning Data Formats.

Data Types: char | string

`Epsilon` — Constant to add to mini-batch variances
`1e-5` (default) | positive scalar

Constant to add to the mini-batch variances, specified as a positive scalar.

The software adds this constant to the mini-batch variances before normalization to ensure numerical stability and avoid division by zero.

Before R2023a: Epsilon must be greater than or equal to 1e-5.
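A small sketch of why Epsilon matters: when an observation is constant, its variance is zero and the denominator reduces to the square root of Epsilon, so the output stays finite. The data values and the offset below are illustrative assumptions.

```matlab
% Feature data: 4 channels, 2 observations, every value equal to 5
X = dlarray(5*ones(4,2),'CB');

beta  = [1; 2; 3; 4];   % offset
gamma = ones(4,1);      % scale factor

% Each observation has zero variance, so the denominator is sqrt(Epsilon)
Y = layernorm(X,beta,gamma,'Epsilon',1e-4);

% x minus the mean is 0, so every column of Y equals beta
extractdata(Y)
```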

`ScaleFormat` — Description of scale dimensions
character vector | string scalar

Description of the scale dimensions, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

For layer normalization, the scale factor must have a "C" (channel) dimension. You can specify multiple dimensions labeled "S" or "U". You can use the label "T" (time) at most once. The scale factor must not have a "B" (batch) dimension.

For element-wise normalization, if scaleFactor is a numeric array or an unformatted dlarray object, then you must specify the ScaleFormat option.

For more information, see Deep Learning Data Formats.

Example: 'ScaleFormat',"SSC"

Data Types: char | string

`OffsetFormat` — Description of offset dimensions
character vector | string scalar

Description of the offset dimensions, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

For layer normalization, the offset must have a "C" (channel) dimension. You can specify multiple dimensions labeled "S" or "U". You can use the label "T" (time) at most once. The offset must not have a "B" (batch) dimension.

For element-wise normalization, if offset is a numeric array or an unformatted dlarray object, then you must specify the OffsetFormat option.

For more information, see Deep Learning Data Formats.

Example: 'OffsetFormat',"SSC"

Data Types: char | string

`OperationDimension` — Dimension to normalize over
`"auto"` (default) | `"channel-only"` | `"spatial-channel"` | `"batch-excluded"`

Since R2023a

Dimension to normalize over, specified as one of these values:

"auto" — For feature, sequence, 1-D image, or spatial-temporal input, normalize over the channel dimension. Otherwise, normalize over the spatial and channel dimensions.
"channel-only" — Normalize over the channel dimension.
"spatial-channel" — Normalize over the spatial and channel dimensions.
"batch-excluded" — Normalize over all dimensions except for the batch dimension.
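As a sketch of how OperationDimension changes the normalized dimensions (R2023a or later), the following normalizes the same sequence data with "channel-only" and with "batch-excluded", then checks that the mean is approximately zero over the dimensions each option selects. Sizes and variable names are illustrative assumptions.

```matlab
% Sequence data: 5 channels, 4 observations, 6 time steps
numChannels = 5;
X = dlarray(rand(numChannels,4,6),'CBT');
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Normalize each time step of each observation over the channels only
Y1 = layernorm(X,offset,scaleFactor,'OperationDimension',"channel-only");

% Normalize each observation over all non-batch dimensions (channel and time)
Y2 = layernorm(X,offset,scaleFactor,'OperationDimension',"batch-excluded");

% With zero offset and unit scale, the means over the chosen dimensions are ~0
y1 = extractdata(Y1);
y2 = extractdata(Y2);
maxMean1 = max(abs(mean(y1,1)),[],'all')      % mean over C, for each (B,T) pair
maxMean2 = max(abs(mean(y2,[1 3])),[],'all')  % mean over C and T, for each B
```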

Output Arguments

`Y` — Normalized data
`dlarray`

Normalized data, returned as a dlarray. The output Y has the same underlying data type as the input X.

If the input data X is a formatted dlarray, Y has the same dimension labels as X. If the input data is not a formatted dlarray, Y is an unformatted dlarray with the same dimension order as the input data.

Algorithms

Layer Normalization

The layer normalization operation normalizes the elements x_i of the input by first calculating the mean μ_L and variance σ_L² over the spatial, time, and channel dimensions for each observation independently. Then, it calculates the normalized activations as

$\hat{x}_{i} = \frac{x_{i} - \mu_{L}}{\sqrt{\sigma_{L}^{2} + \epsilon}},$

where ϵ is the constant specified by the Epsilon option. It improves numerical stability when the variance is very small.

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow layer normalization, the layer normalization operation further shifts and scales the activations using the transformation

$y_{i} = \gamma \hat{x}_{i} + \beta,$

where the offset β and scale factor γ are learnable parameters that are updated during network training.
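The two equations above can be checked directly with a minimal sketch on feature data ("CB"), where the channel dimension is the only non-batch dimension. It assumes the operation uses the population variance (normalized by N), which is what var(X,1,1) computes; the sizes and random parameters are illustrative.

```matlab
% Feature data: 6 channels, 3 observations
numChannels = 6;
X = rand(numChannels,3);
epsilon = 1e-5;
gamma = rand(numChannels,1);   % scale factor
beta  = rand(numChannels,1);   % offset

% Mean and variance over the channel dimension, per observation (column).
% Assumption: population variance, so var is normalized by N (weight 1).
mu = mean(X,1);
sigma2 = var(X,1,1);

% Normalize, then scale and shift
Xhat = (X - mu)./sqrt(sigma2 + epsilon);
Yref = gamma.*Xhat + beta;

% Compare with layernorm
Y = layernorm(dlarray(X,'CB'),beta,gamma,'Epsilon',epsilon);
maxErr = max(abs(extractdata(Y) - Yref),[],'all')
```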

Deep Learning Array Formats

Most deep learning networks and functions operate on different dimensions of the input data in different ways.

For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.

To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

To create formatted input data, create a dlarray object and specify the format as the second argument of dlarray.

To provide additional layout information with unformatted data, specify the formats using the DataFormat, ScaleFormat, and OffsetFormat arguments.

For more information, see Deep Learning Data Formats.
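A minimal sketch of supplying layout information for unformatted arguments only: the input is an unformatted dlarray and the element-wise offset and scale factor are numeric, so all three formats are given explicitly. Sizes are illustrative assumptions.

```matlab
% Unformatted image batch: 8-by-8 pixels, 3 channels, 2 observations
H = 8; W = 8; numChannels = 3;
X = rand(H,W,numChannels,2);

% Element-wise parameters, one value per pixel and channel
offset = zeros(H,W,numChannels);
scaleFactor = ones(H,W,numChannels);

% Describe the layout of every unformatted argument explicitly
Y = layernorm(dlarray(X),offset,scaleFactor, ...
    'DataFormat',"SSCB",'OffsetFormat',"SSC",'ScaleFormat',"SSC");

% Y is unformatted and keeps the dimension order of X
size(Y)
```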


Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The layernorm function supports GPU array input with these usage notes and limitations:

When at least one of the following input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
- X
- offset
- scaleFactor

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
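A sketch of running layernorm on the GPU by giving X gpuArray underlying data, with a CPU fallback. It uses canUseGPU to test for a supported GPU; the sizes and the names onGPU and Y are illustrative assumptions.

```matlab
X = rand(10,32);            % 10 channels, 32 observations
offset = zeros(10,1);
scaleFactor = ones(10,1);

if canUseGPU
    % X has gpuArray underlying data, so layernorm runs on the GPU
    dlX = dlarray(gpuArray(X),'CB');
    dlY = layernorm(dlX,offset,scaleFactor);
    onGPU = isa(extractdata(dlY),'gpuArray');
    % Bring the result back to host memory
    Y = gather(extractdata(dlY));
else
    % Fall back to the CPU when no supported GPU is available
    dlY = layernorm(dlarray(X,'CB'),offset,scaleFactor);
    onGPU = false;
    Y = extractdata(dlY);
end
```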

Version History

Introduced in R2021a

R2023a: Specify operation dimension

Specify which dimensions to normalize over using the OperationDimension option.

R2023a: `Epsilon` supports values less than `1e-5`

The Epsilon option also supports positive values less than 1e-5.

R2023a: Operation normalizes over channel and spatial dimensions of 2-D and 3-D image sequence data

Starting in R2023a, by default, the operation normalizes 2-D and 3-D image sequence data over the channel and spatial dimensions. In previous versions, the software normalized over all dimensions except for the batch dimension (the spatial, time, and channel dimensions). Normalization over the channel and spatial dimensions is usually better suited for these types of data. To reproduce the previous behavior, set OperationDimension to "batch-excluded".

R2023a: Operation normalizes over channel dimension of 1-D image, vector sequence, and 1-D image sequence data

Starting in R2023a, by default, the operation normalizes 1-D image data (data with one spatial dimension and no time dimension), vector sequence data (data with a time dimension and no spatial dimensions), and 1-D image sequence data (data with one spatial dimension and a time dimension) over the channel dimension. In previous versions, the software normalized over all dimensions except for the batch dimension (the spatial, time, and channel dimensions). Normalization over the channel dimension is usually better suited for these types of data. To reproduce the previous behavior, set OperationDimension to "batch-excluded".
