
batchnorm

Normalize across all observations for each channel independently

Description

The batch normalization operation normalizes the input data across all observations for each channel independently. To speed up training of a convolutional neural network and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.

After normalization, the operation shifts the input by a learnable offset β and scales it by a learnable scale factor γ.

The batchnorm function applies the batch normalization operation to dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the 'S', 'T', 'C', and 'B' labels, respectively. For unspecified and other dimensions, use the 'U' label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the 'DataFormat' option.

Note

To apply batch normalization within a layerGraph object or Layer array, use batchNormalizationLayer.


dlY = batchnorm(dlX,offset,scaleFactor) applies the batch normalization operation to the input data dlX and transforms it using the specified offset and scale factor.

The function normalizes over the 'S' (spatial), 'T' (time), 'B' (batch), and 'U' (unspecified) dimensions of dlX for each channel in the 'C' (channel) dimension, independently.

For unformatted input data, use the 'DataFormat' option.
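The rule for which dimensions are normalized can be illustrated with a small Python sketch (not MATLAB, and not the toolbox implementation). The helper name normalized_dims is hypothetical: given a format string, it returns the 0-based positions of every dimension except the single 'C' dimension, which are the dimensions the mean and variance are computed over.

```python
def normalized_dims(fmt):
    """Return the 0-based indices of the dimensions that are averaged over:
    every dimension except the single 'C' (channel) dimension.
    Hypothetical helper for illustration only."""
    if fmt.count("C") != 1:
        raise ValueError("format must contain exactly one 'C' dimension")
    return [i for i, label in enumerate(fmt) if label != "C"]


print(normalized_dims("SSCB"))  # [0, 1, 3]
print(normalized_dims("CBT"))   # [1, 2]
```

For 'SSCB' data, each channel is therefore normalized using all values across both spatial dimensions and all observations.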

[dlY,popMu,popSigmaSq] = batchnorm(dlX,offset,scaleFactor) also returns the population mean and variance of the input data dlX.

dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq) applies the batch normalization operation using the mean and variance mu and sigmaSq, respectively.


[dlY,updatedMu,updatedSigmaSq] = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq) applies the batch normalization operation using the mean and variance mu and sigmaSq, respectively, and also returns updated moving mean and variance statistics.

Use this syntax to maintain running values of the mean and variance statistics during training. Use the final updated values of the mean and variance for prediction and classification.

[___] = batchnorm(___,'DataFormat',FMT) applies the batch normalization operation to unformatted input data with format specified by FMT using any of the previous syntaxes. The output dlY is an unformatted dlarray object with dimensions in the same order as dlX. For example, 'DataFormat','SSCB' specifies data for 2-D image input with format 'SSCB' (spatial, spatial, channel, batch).

[___] = batchnorm(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'MeanDecay',0.3 sets the decay rate of the moving average computation.

Examples


Create a formatted dlarray object containing a batch of 128 28-by-28 images with 3 channels. Specify the format 'SSCB' (spatial, spatial, channel, batch).

miniBatchSize = 128;
inputSize = [28 28];
numChannels = 3;
X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
dlX = dlarray(X,'SSCB');

View the size and format of the input data.

size(dlX)
ans = 1×4

    28    28     3   128

dims(dlX)
ans = 
'SSCB'

Initialize the scale and offset for batch normalization. For the scale, specify a vector of ones. For the offset, specify a vector of zeros.

scaleFactor = ones(numChannels,1);
offset = zeros(numChannels,1);

Apply the batch normalization operation using the batchnorm function and return the mini-batch statistics.

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor);

View the size and format of the output dlY.

size(dlY)
ans = 1×4

    28    28     3   128

dims(dlY)
ans = 
'SSCB'

View the mini-batch mean mu.

mu
mu = 3×1

    0.4998
    0.4993
    0.5011

View the mini-batch variance sigmaSq.

sigmaSq
sigmaSq = 3×1

    0.0831
    0.0832
    0.0835
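The mini-batch statistics are close to the mean 1/2 and variance 1/12 ≈ 0.0833 of the uniform distribution on [0, 1] that rand samples from. The following Python sketch (an illustration, not the MATLAB implementation) reproduces this with the same number of values per channel (28 × 28 × 128). It assumes the population variance (dividing by N); at this sample size the difference from dividing by N − 1 is negligible.

```python
import random


def channel_stats(values):
    """Mean and population variance (divide by N) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


random.seed(0)
# One channel: 28*28 pixels for each of 128 observations, uniform on [0, 1).
n = 28 * 28 * 128
samples = [random.random() for _ in range(n)]
mean, var = channel_stats(samples)
print(f"mean={mean:.4f} (expected 0.5), var={var:.4f} (expected {1 / 12:.4f})")
```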

Use the batchnorm function to normalize several batches of data and update the statistics of the whole data set after each normalization.

Create three batches of data. The data consists of 10-by-10 random arrays with five channels. Each batch contains 20 observations. The second and third batches are scaled by a multiplicative factor of 1.5 and 2.5, respectively, so the mean of the data set increases with each batch.

height = 10;
width = 10;
channels = 5;
observations = 20;

X1 = rand(height,width,channels,observations);
dlX1 = dlarray(X1,'SSCB');

X2 = 1.5*rand(height,width,channels,observations);
dlX2 = dlarray(X2,'SSCB');

X3 = 2.5*rand(height,width,channels,observations);
dlX3 = dlarray(X3,'SSCB');

Create the learnable parameters.

offset = zeros(channels,1);
scale = ones(channels,1);

Normalize the first batch of data, dlX1, using batchnorm. Obtain the values of the mean and variance of this batch as outputs.

[dlY1,mu,sigmaSq] = batchnorm(dlX1,offset,scale);

Normalize the second batch of data, dlX2. Use mu and sigmaSq as inputs to obtain updated moving averages of the mean and variance that combine the statistics of batches dlX1 and dlX2.

[dlY2,datasetMu,datasetSigmaSq] = batchnorm(dlX2,offset,scale,mu,sigmaSq);

Normalize the final batch of data, dlX3. Pass the data set statistics datasetMu and datasetSigmaSq as inputs to update the moving mean and variance with the statistics of dlX3, so that they reflect all data in batches dlX1, dlX2, and dlX3.

[dlY3,datasetMuFull,datasetSigmaSqFull] = batchnorm(dlX3,offset,scale,datasetMu,datasetSigmaSq);

Observe the change in the mean of each channel as each batch is normalized.

plot([mu';datasetMu';datasetMuFull'])
legend({'Channel 1','Channel 2','Channel 3','Channel 4','Channel 5'},'Location','southeast')
xticks([1 2 3])
xlabel('Number of Batches')
xlim([0.9 3.1])
ylabel('Per-Channel Mean')
title('Data Set Mean')

Figure: Data Set Mean. Per-channel mean of Channel 1 through Channel 5 plotted against the number of batches.
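Because the updated statistics are moving averages, they lag behind the exact pooled mean of all batches seen so far. The Python sketch below computes both for the expected batch means 0.5, 0.75, and 1.25 (of rand, 1.5*rand, and 2.5*rand). The decay value 0.1 is an assumption for illustration only; this page does not state the default 'MeanDecay' value.

```python
def update(current, batch, decay):
    """Moving-average update: decay*batch + (1 - decay)*current."""
    return decay * batch + (1 - decay) * current


decay = 0.1                       # assumed decay, for illustration only
batch_means = [0.5, 0.75, 1.25]   # expected means of rand, 1.5*rand, 2.5*rand

running = batch_means[0]          # the first call returns the batch mean itself
history = [running]
for m in batch_means[1:]:
    running = update(running, m, decay)
    history.append(running)

# Exact mean of all data seen after 1, 2, and 3 equal-size batches.
exact = [sum(batch_means[:k]) / k for k in range(1, 4)]
print([round(h, 4) for h in history])  # [0.5, 0.525, 0.5975]
print([round(e, 4) for e in exact])    # [0.5, 0.625, 0.8333]
```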

Input Arguments


dlX

Input data, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.

If dlX is an unformatted dlarray or a numeric array, then you must specify the format using the 'DataFormat' option. If dlX is a numeric array, then either scaleFactor or offset must be a dlarray object.

dlX must have a 'C' (channel) dimension.

offset

Offset β, specified as a formatted dlarray, an unformatted dlarray, or a numeric array with one nonsingleton dimension with size matching the size of the 'C' (channel) dimension of the input dlX.

If offset is a formatted dlarray object, then the nonsingleton dimension must have label 'C' (channel).

scaleFactor

Scale factor γ, specified as a formatted dlarray, an unformatted dlarray, or a numeric array with one nonsingleton dimension with size matching the size of the 'C' (channel) dimension of the input dlX.

If scaleFactor is a formatted dlarray object, then the nonsingleton dimension must have label 'C' (channel).
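The shape requirement on offset and scaleFactor can be expressed as a short check. This Python sketch uses a hypothetical helper, valid_param_shape, that takes a shape tuple and the number of channels; treating an all-singleton shape as valid when there is only one channel is an assumption made here.

```python
def valid_param_shape(shape, num_channels):
    """Return True if the shape has exactly one nonsingleton dimension whose
    size equals num_channels. With a single channel, all dimensions are
    singleton (an assumption of this sketch). Hypothetical helper."""
    nonsingleton = [d for d in shape if d != 1]
    if num_channels == 1:
        return len(nonsingleton) == 0
    return nonsingleton == [num_channels]


print(valid_param_shape((3, 1), 3))  # True: column vector
print(valid_param_shape((1, 3), 3))  # True: row vector
print(valid_param_shape((3, 2), 3))  # False: two nonsingleton dimensions
```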

mu

Mean statistic for normalization, specified as a numeric vector with length equal to the size of the 'C' dimension of the input data.

Data Types: single | double

sigmaSq

Variance statistic for normalization, specified as a numeric vector with length equal to the size of the 'C' dimension of the input data.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'MeanDecay',0.3,'VarianceDecay',0.5 sets the decay rate for the moving average computations of the mean and variance of several batches of data to 0.3 and 0.5, respectively.

'DataFormat'

Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character vector or string scalar FMT that provides a label for each dimension of the data.

When specifying the format of a dlarray object, each character provides a label for each dimension of the data and must be one of the following:

  • 'S' — Spatial

  • 'C' — Channel

  • 'B' — Batch (for example, samples and observations)

  • 'T' — Time (for example, time steps of sequences)

  • 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'DataFormat' when the input data is not a formatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string
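The labeling rules for a format string can be checked mechanically. The following Python sketch uses a hypothetical helper, check_format, that accepts only the labels 'S', 'C', 'B', 'T', and 'U' and allows 'C', 'B', and 'T' at most once. It checks only these general rules; batchnorm additionally requires a 'C' dimension.

```python
ALLOWED = set("SCBTU")


def check_format(fmt):
    """Validate a dimension format string against the labeling rules.
    Hypothetical helper for illustration only."""
    bad = set(fmt) - ALLOWED
    if bad:
        raise ValueError(f"invalid labels: {sorted(bad)}")
    for label in "CBT":
        if fmt.count(label) > 1:
            raise ValueError(f"label {label!r} may appear at most once")
    return True


print(check_format("SSCB"))  # True
```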

'Epsilon'

Variance offset for preventing divide-by-zero errors, specified as the comma-separated pair consisting of 'Epsilon' and a numeric scalar. The specified value must be greater than or equal to 1e-5. The default value is 1e-5.

Data Types: single | double
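To see why the offset matters, consider a channel whose values are all equal: its variance is zero, so dividing by the square root of the variance alone would fail. This Python sketch (illustrative; normalize is a hypothetical helper, not a toolbox function) adds epsilon under the square root as in the normalization formula in Algorithms, so a constant channel normalizes to zeros.

```python
import math


def normalize(values, epsilon=1e-5):
    """Normalize one channel using its own mean and population variance.
    Hypothetical helper for illustration only."""
    if epsilon < 1e-5:
        raise ValueError("epsilon must be greater than or equal to 1e-5")
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    denom = math.sqrt(var + epsilon)  # never zero because epsilon > 0
    return [(v - mean) / denom for v in values]


print(normalize([2.0, 2.0, 2.0]))  # [0.0, 0.0, 0.0]
```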

'MeanDecay'

Decay value for the moving mean computation, specified as a numeric scalar between 0 and 1.

The function updates the moving mean value using

$$\mu^* = \lambda_{\mu}\hat{\mu} + (1 - \lambda_{\mu})\mu,$$

where $\mu^*$ denotes the updated mean updatedMu, $\lambda_{\mu}$ denotes the mean decay value 'MeanDecay', $\hat{\mu}$ denotes the mean of the input data, and $\mu$ denotes the current value of the mean mu.

Data Types: single | double
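The update formula applies element-wise to the per-channel vectors. This Python sketch (update_moving_mean is a hypothetical helper; inclusive bounds on the decay are an assumption) shows the two extremes: a decay of 0 keeps the current mean, and a decay of 1 replaces it with the mean of the current batch.

```python
def update_moving_mean(mu, mu_hat, mean_decay):
    """Per-channel update mu* = decay*mu_hat + (1 - decay)*mu.
    Inclusive bounds on the decay are assumed. Hypothetical helper."""
    if not 0 <= mean_decay <= 1:
        raise ValueError("mean_decay must be between 0 and 1")
    return [mean_decay * mh + (1 - mean_decay) * m for m, mh in zip(mu, mu_hat)]


mu = [0.5, 0.5]       # current moving mean, two channels
mu_hat = [1.0, 0.0]   # mean of the current batch
print(update_moving_mean(mu, mu_hat, 0.0))  # [0.5, 0.5] keeps the current mean
print(update_moving_mean(mu, mu_hat, 1.0))  # [1.0, 0.0] uses the batch mean
print(update_moving_mean(mu, mu_hat, 0.3))  # a blend of the two
```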

'VarianceDecay'

Decay value for the moving variance computation, specified as a numeric scalar between 0 and 1.

The function updates the moving variance value using

$$\sigma^{2*} = \lambda_{\sigma^2}\hat{\sigma}^2 + (1 - \lambda_{\sigma^2})\sigma^2,$$

where $\sigma^{2*}$ denotes the updated variance updatedSigmaSq, $\lambda_{\sigma^2}$ denotes the variance decay value 'VarianceDecay', $\hat{\sigma}^2$ denotes the variance of the input data, and $\sigma^2$ denotes the current value of the variance sigmaSq.

Data Types: single | double
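Applied over many batches, the variance update forgets a poor initial value and tracks the variance of recent batches. In this Python sketch, the data distribution (normal with variance 4), the batch size of 64, the decay of 0.1, and the helper names are all assumptions for illustration.

```python
import random


def update_moving_variance(sigma_sq, sigma_sq_hat, variance_decay):
    """Per-channel update sigma2* = decay*sigma2_hat + (1 - decay)*sigma2.
    Hypothetical helper for illustration only."""
    return [variance_decay * vh + (1 - variance_decay) * v
            for v, vh in zip(sigma_sq, sigma_sq_hat)]


def batch_variance(values):
    """Population variance (divide by N) of one batch."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


random.seed(1)
running = [1.0]  # deliberately poor initial estimate for a single channel
for _ in range(200):
    batch = [random.gauss(0.0, 2.0) for _ in range(64)]  # true variance is 4
    running = update_moving_variance(running, [batch_variance(batch)], 0.1)
print(running)  # expected to be near 4
```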

Output Arguments


dlY

Normalized data, returned as a dlarray with the same underlying data type as dlX.

If the input data dlX is a formatted dlarray, then dlY has the same format as dlX. If the input data is not a formatted dlarray, then dlY is an unformatted dlarray with the same dimension order as the input data.

The size of the output dlY matches the size of the input dlX.

popMu

Per-channel mean of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.

popSigmaSq

Per-channel variance of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.
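The per-channel statistics gather every value of a channel across the spatial and batch dimensions. This Python sketch stores 'SSCB' data as a nested list indexed x[h][w][c][b] (a layout chosen here for illustration; per_channel_stats is a hypothetical helper) and assumes the population variance.

```python
def per_channel_stats(x):
    """x is a nested list indexed x[h][w][c][b] ('SSCB' layout).
    Returns lists of per-channel means and population variances."""
    H, W, C, B = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    means, variances = [], []
    for c in range(C):
        vals = [x[h][w][c][b]
                for h in range(H) for w in range(W) for b in range(B)]
        m = sum(vals) / len(vals)
        means.append(m)
        variances.append(sum((v - m) ** 2 for v in vals) / len(vals))
    return means, variances


# 2-by-2 spatial, 2 channels, 2 observations.
# Channel 0 holds the values 0..7; channel 1 is the constant 10.
x = [[[[4 * h + 2 * w + b for b in range(2)], [10, 10]]
      for w in range(2)] for h in range(2)]
means, variances = per_channel_stats(x)
print(means, variances)  # [3.5, 10.0] [5.25, 0.0]
```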

updatedMu

Updated mean statistic, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data.

The function updates the moving mean value using

$$\mu^* = \lambda_{\mu}\hat{\mu} + (1 - \lambda_{\mu})\mu,$$

where $\mu^*$ denotes the updated mean updatedMu, $\lambda_{\mu}$ denotes the mean decay value 'MeanDecay', $\hat{\mu}$ denotes the mean of the input data, and $\mu$ denotes the current value of the mean mu.

updatedSigmaSq

Updated variance statistic, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data.

The function updates the moving variance value using

$$\sigma^{2*} = \lambda_{\sigma^2}\hat{\sigma}^2 + (1 - \lambda_{\sigma^2})\sigma^2,$$

where $\sigma^{2*}$ denotes the updated variance updatedSigmaSq, $\lambda_{\sigma^2}$ denotes the variance decay value 'VarianceDecay', $\hat{\sigma}^2$ denotes the variance of the input data, and $\sigma^2$ denotes the current value of the variance sigmaSq.

Algorithms

The batch normalization operation normalizes the elements $x_i$ of the input by first calculating the mean $\mu_B$ and variance $\sigma_B^2$ over the spatial, time, and observation dimensions for each channel independently. Then, it calculates the normalized activations as

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},$$

where ϵ is a constant that improves numerical stability when the variance is very small.

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow batch normalization, the batch normalization operation further shifts and scales the activations using the transformation

$$y_i = \gamma \hat{x}_i + \beta,$$

where the offset β and scale factor γ are learnable parameters that are updated during network training.
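The two steps above, normalizing with the batch statistics and then scaling by γ and shifting by β, can be sketched for the values of a single channel in Python. This is an illustration of the formulas, not the toolbox implementation; the helper name batchnorm_channel is hypothetical, and the population variance is assumed. After the transform, the output mean equals β and the output variance is γ²σ_B²/(σ_B² + ϵ), which is approximately γ².

```python
import math


def batchnorm_channel(values, offset, scale, epsilon=1e-5):
    """Training-mode batch normalization for the values of one channel.
    Returns the transformed values and the batch mean and variance.
    Hypothetical helper for illustration only."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    inv_std = 1.0 / math.sqrt(var + epsilon)
    x_hat = [(v - mu) * inv_std for v in values]  # zero mean, ~unit variance
    y = [scale * xh + offset for xh in x_hat]     # y = gamma*x_hat + beta
    return y, mu, var


y, mu, var = batchnorm_channel([1.0, 2.0, 3.0, 4.0], offset=0.5, scale=2.0)
print(mu, var)  # 2.5 1.25
```

To process a full input, apply the same function to each channel with that channel's entries of offset and scaleFactor.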

To make predictions with the network after training, batch normalization requires a fixed mean and variance to normalize the data. This fixed mean and variance can be calculated from the training data after training, or approximated during training using running statistic computations.
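At prediction time, the fixed mean and variance replace the batch statistics, as in the syntax that accepts mu and sigmaSq. A consequence, shown in this Python sketch with a hypothetical helper batchnorm_inference, is that even a single observation is normalized meaningfully; in training mode, a batch of one value would always normalize to zero.

```python
import math


def batchnorm_inference(values, offset, scale, mu, sigma_sq, epsilon=1e-5):
    """Normalize with fixed (for example, running) statistics instead of
    statistics computed from the current batch. Hypothetical helper."""
    denom = math.sqrt(sigma_sq + epsilon)
    return [scale * (v - mu) / denom + offset for v in values]


# One observation, normalized using stored statistics mu = 1 and sigmaSq = 4.
result = batchnorm_inference([3.0], offset=0.0, scale=1.0, mu=1.0, sigma_sq=4.0)
print(result)
```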


Introduced in R2019b