# batchnorm

Normalize across all observations for each channel independently

## Syntax

``dlY = batchnorm(dlX,offset,scaleFactor)``
``[dlY,popMu,popSigmaSq] = batchnorm(dlX,offset,scaleFactor)``
``dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)``
``[dlY,updatedMu,updatedSigmaSq] = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)``
``[___] = batchnorm(___,'DataFormat',FMT)``
``[___] = batchnorm(___,Name,Value)``

## Description

The batch normalization operation normalizes the input data across all observations for each channel independently. To speed up training of a convolutional neural network and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as `relu`.

After normalization, the operation shifts the input by a learnable offset β and scales it by a learnable scale factor γ.

The `batchnorm` function applies the batch normalization operation to `dlarray` data. Using `dlarray` objects makes working with high dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the `'S'`, `'T'`, `'C'`, and `'B'` labels, respectively. For unspecified and other dimensions, use the `'U'` label. For `dlarray` object functions that operate over particular dimensions, you can specify the dimension labels by formatting the `dlarray` object directly, or by using the `'DataFormat'` option.

Note

To apply batch normalization within a `layerGraph` object or `Layer` array, use `batchNormalizationLayer`.
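As a rough sketch of where the layer form fits, the following layer array places `batchNormalizationLayer` between a convolution layer and a ReLU layer. The input size and filter settings are illustrative assumptions, not values taken from this page.

```matlab
% Illustrative layer array: convolution -> batch normalization -> ReLU.
% The input size [28 28 3] and the 16 filters of size 3-by-3 are assumed values.
layers = [
    imageInputLayer([28 28 3])
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer];
```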

`dlY = batchnorm(dlX,offset,scaleFactor)` applies the batch normalization operation to the input data `dlX` and transforms it using the specified offset and scale factor. The function normalizes over the `'S'` (spatial), `'T'` (time), `'B'` (batch), and `'U'` (unspecified) dimensions of `dlX` for each channel in the `'C'` (channel) dimension, independently. For unformatted input data, use the `'DataFormat'` option.

`[dlY,popMu,popSigmaSq] = batchnorm(dlX,offset,scaleFactor)` also returns the population mean and variance of the input data `dlX`.

`dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)` applies the batch normalization operation using the mean and variance `mu` and `sigmaSq`, respectively.

`[dlY,updatedMu,updatedSigmaSq] = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)` applies the batch normalization operation using the mean and variance `mu` and `sigmaSq`, respectively, and also returns updated moving mean and variance statistics. Use this syntax to maintain running values for the mean and variance statistics during training. Use the final updated values of the mean and variance for prediction and classification.

`[___] = batchnorm(___,'DataFormat',FMT)` applies the batch normalization operation to unformatted input data with format specified by `FMT` using any of the previous syntaxes. The output `dlY` is an unformatted `dlarray` object with dimensions in the same order as `dlX`. For example, `'DataFormat','SSCB'` specifies data for 2-D image input with format `'SSCB'` (spatial, spatial, channel, batch).

`[___] = batchnorm(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, `'MeanDecay',0.3` sets the decay rate of the moving average computation.
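The last two syntaxes can be sketched as follows. The array sizes, the initial statistics, and the decay value are illustrative assumptions chosen only to make the calls concrete.

```matlab
% Unformatted input: no dimension labels are attached to the dlarray.
numChannels = 3;
dlX = dlarray(rand(28,28,numChannels,16));
offset = zeros(numChannels,1);
scaleFactor = ones(numChannels,1);

% Supply the dimension labels through 'DataFormat' instead.
dlY = batchnorm(dlX,offset,scaleFactor,'DataFormat','SSCB');

% Name-value options can be combined with the running-statistics syntax.
% The starting statistics and the decay value 0.3 are arbitrary choices.
mu = zeros(numChannels,1);
sigmaSq = ones(numChannels,1);
[dlY2,updatedMu,updatedSigmaSq] = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq, ...
    'DataFormat','SSCB','MeanDecay',0.3);
```

Because the input carries no labels, the outputs `dlY` and `dlY2` are unformatted as well.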

## Examples

Create a formatted `dlarray` object containing a batch of 128 28-by-28 images with 3 channels. Specify the format `'SSCB'` (spatial, spatial, channel, batch).

```
miniBatchSize = 128;
inputSize = [28 28];
numChannels = 3;
X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
dlX = dlarray(X,'SSCB');
```

View the size and format of the input data.

```
size(dlX)
```

```
ans = 1×4

    28    28     3   128
```

```
dims(dlX)
```

```
ans = 
'SSCB'
```

Initialize the scale and offset for batch normalization. For the scale, specify a vector of ones. For the offset, specify a vector of zeros.

```
scaleFactor = ones(numChannels,1);
offset = zeros(numChannels,1);
```

Apply the batch normalization operation using the `batchnorm` function and return the mini-batch statistics.

`[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor);`

View the size and format of the output `dlY`.

```
size(dlY)
```

```
ans = 1×4

    28    28     3   128
```

```
dims(dlY)
```

```
ans = 
'SSCB'
```

View the mini-batch mean `mu`.

```
mu
```

```
mu = 3×1

    0.4998
    0.4993
    0.5011
```

View the mini-batch variance `sigmaSq`.

```
sigmaSq
```

```
sigmaSq = 3×1

    0.0831
    0.0832
    0.0835
```
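One way to sanity-check the result is to confirm that, with a zero offset and unit scale, each channel of the output has a mean near 0 and a variance near 1. The sketch below recreates the example input with fresh random data, so the exact numbers differ from run to run. The variable names `channelMean` and `channelVar` are introduced here for illustration.

```matlab
% Recreate the example input (random data, so exact values differ per run).
numChannels = 3;
X = rand(28,28,numChannels,128);
dlX = dlarray(X,'SSCB');
[dlY,mu,sigmaSq] = batchnorm(dlX,zeros(numChannels,1),ones(numChannels,1));

% Pull the numeric data out of the dlarray and reduce over the two
% spatial dimensions and the batch dimension, leaving one value per channel.
Y = extractdata(dlY);
channelMean = squeeze(mean(Y,[1 2 4]));   % expected to be close to 0
channelVar  = squeeze(var(Y,1,[1 2 4]));  % expected to be close to 1
```

The variance is slightly below 1 because the variance offset `'Epsilon'` is added to the denominator before the square root is taken.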

Use the `batchnorm` function to normalize several batches of data and update the statistics of the whole data set after each normalization.

Create three batches of data. The data consists of 10-by-10 random arrays with five channels. Each batch contains 20 observations. The second and third batches are scaled by a multiplicative factor of `1.5` and `2.5`, respectively, so the mean of the data set increases with each batch.

```
height = 10;
width = 10;
channels = 5;
observations = 20;

X1 = rand(height,width,channels,observations);
dlX1 = dlarray(X1,'SSCB');

X2 = 1.5*rand(height,width,channels,observations);
dlX2 = dlarray(X2,'SSCB');

X3 = 2.5*rand(height,width,channels,observations);
dlX3 = dlarray(X3,'SSCB');
```

Create the learnable parameters.

```
offset = zeros(channels,1);
scale = ones(channels,1);
```

Normalize the first batch of data, `dlX1`, using `batchnorm`. Obtain the values of the mean and variance of this batch as outputs.

`[dlY1,mu,sigmaSq] = batchnorm(dlX1,offset,scale);`

Normalize the second batch of data, `dlX2`. Use `mu` and `sigmaSq` as inputs to obtain updated moving mean and variance values that combine the statistics of batches `dlX1` and `dlX2`.

`[dlY2,datasetMu,datasetSigmaSq] = batchnorm(dlX2,offset,scale,mu,sigmaSq);`

Normalize the final batch of data, `dlX3`. Update the data set statistics `datasetMu` and `datasetSigmaSq` to obtain moving mean and variance values that combine the statistics of batches `dlX1`, `dlX2`, and `dlX3`.

`[dlY3,datasetMuFull,datasetSigmaSqFull] = batchnorm(dlX3,offset,scale,datasetMu,datasetSigmaSq);`

Observe the change in the mean of each channel as each batch is normalized.

```
plot([mu';datasetMu';datasetMuFull'])
legend({'Channel 1','Channel 2','Channel 3','Channel 4','Channel 5'},'Location','southeast')
xticks([1 2 3])
xlabel('Number of Batches')
xlim([0.9 3.1])
ylabel('Per-Channel Mean')
title('Data Set Mean')
```
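The same pattern extends to a loop over any number of mini-batches. The following is a minimal sketch, assuming random stand-in data and a hypothetical batch count `numBatches`; a real training loop would read mini-batches from a datastore and also update the learnable parameters.

```matlab
channels = 5;
offset = zeros(channels,1);
scale = ones(channels,1);
numBatches = 4;   % assumed number of mini-batches

for k = 1:numBatches
    % Stand-in for reading the next mini-batch of training data.
    dlX = dlarray(rand(10,10,channels,20),'SSCB');
    if k == 1
        % First batch: start the running statistics from the batch statistics.
        [dlY,mu,sigmaSq] = batchnorm(dlX,offset,scale);
    else
        % Later batches: pass the current statistics in and get updated ones back.
        [dlY,mu,sigmaSq] = batchnorm(dlX,offset,scale,mu,sigmaSq);
    end
end

% For prediction, pass the final statistics and request only the normalized data.
dlXNew = dlarray(rand(10,10,channels,2),'SSCB');
dlYPred = batchnorm(dlXNew,offset,scale,mu,sigmaSq);
```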

## Input Arguments

Input data `dlX`, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

If `dlX` is an unformatted `dlarray` or a numeric array, then you must specify the format using the `'DataFormat'` option. If `dlX` is a numeric array, then either `scaleFactor` or `offset` must be a `dlarray` object.

`dlX` must have a `'C'` (channel) dimension.

Offset β (`offset`), specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array with one nonsingleton dimension with size matching the size of the `'C'` (channel) dimension of the input `dlX`.

If `offset` is a formatted `dlarray` object, then the nonsingleton dimension must have label `'C'` (channel).

Scale factor γ (`scaleFactor`), specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array with one nonsingleton dimension with size matching the size of the `'C'` (channel) dimension of the input `dlX`.

If `scaleFactor` is a formatted `dlarray` object, then the nonsingleton dimension must have label `'C'` (channel).

Mean statistic for normalization `mu`, specified as a numeric vector of the same length as the `'C'` dimension of the input data.

Data Types: `single` | `double`

Variance statistic for normalization `sigmaSq`, specified as a numeric vector of the same length as the `'C'` dimension of the input data.

Data Types: `single` | `double`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'MeanDecay',0.3,'VarianceDecay',0.5` sets the decay rate for the moving average computations of the mean and variance of several batches of data to `0.3` and `0.5`, respectively.

Dimension order of unformatted input data, specified as the comma-separated pair consisting of `'DataFormat'` and a character vector or string scalar `FMT` that provides a label for each dimension of the data.

When specifying the format of a `dlarray` object, each character provides a label for each dimension of the data and must be one of the following:

• `'S'` — Spatial

• `'C'` — Channel

• `'B'` — Batch (for example, samples and observations)

• `'T'` — Time (for example, time steps of sequences)

• `'U'` — Unspecified

You can specify multiple dimensions labeled `'S'` or `'U'`. You can use the labels `'C'`, `'B'`, and `'T'` at most once.

You must specify `'DataFormat'` when the input data is not a formatted `dlarray`.

Example: `'DataFormat','SSCB'`

Data Types: `char` | `string`
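The labels are not limited to image data. As an illustration, the sketch below formats sequence data as `'CBT'` (channel, batch, time) and normalizes it; the sizes are assumed values. Because the `dlarray` is formatted, `'DataFormat'` is not needed.

```matlab
% Sequence data: 12 channels, 8 observations, 50 time steps (assumed sizes).
numChannels = 12;
dlX = dlarray(rand(numChannels,8,50),'CBT');

% The statistics are computed over the 'B' and 'T' dimensions for each channel.
[dlY,mu,sigmaSq] = batchnorm(dlX,zeros(numChannels,1),ones(numChannels,1));

fmt = dims(dlY);        % format of the output
numStats = numel(mu);   % one mean per channel
```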

Variance offset for preventing divide-by-zero errors, specified as the comma-separated pair consisting of `'Epsilon'` and a numeric scalar. The specified value must be greater than or equal to `1e-5`. The default value is `1e-5`.

Data Types: `single` | `double`

Decay value for the moving mean computation, specified as the comma-separated pair consisting of `'MeanDecay'` and a numeric scalar between `0` and `1`.

The function updates the moving mean value using

`$\mu^{*} = \lambda_{\mu}\hat{\mu} + \left(1-\lambda_{\mu}\right)\mu,$`

where $\mu^{*}$ denotes the updated mean `updatedMu`, $\lambda_{\mu}$ denotes the mean decay value `'MeanDecay'`, $\hat{\mu}$ denotes the mean of the input data, and $\mu$ denotes the current value of the mean `mu`.

Data Types: `single` | `double`

Decay value for the moving variance computation, specified as the comma-separated pair consisting of `'VarianceDecay'` and a numeric scalar between `0` and `1`.

The function updates the moving variance value using

`${\sigma^{2}}^{*} = \lambda_{\sigma^{2}}\widehat{\sigma^{2}} + \left(1-\lambda_{\sigma^{2}}\right)\sigma^{2},$`

where ${\sigma^{2}}^{*}$ denotes the updated variance `updatedSigmaSq`, $\lambda_{\sigma^{2}}$ denotes the variance decay value `'VarianceDecay'`, $\widehat{\sigma^{2}}$ denotes the variance of the input data, and $\sigma^{2}$ denotes the current value of the variance `sigmaSq`.

Data Types: `single` | `double`
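To make the moving-average updates concrete, here is a small worked example of the mean update in plain MATLAB arithmetic. The numbers and the variable names `lambda` and `muHat` are illustrative assumptions; the variance update has the same form.

```matlab
lambda = 0.1;            % decay value (illustrative choice)
mu     = [0.50; 0.50];   % current running mean, one entry per channel
muHat  = [0.75; 1.25];   % mean of the new mini-batch, one entry per channel

% Weighted combination of the new batch mean and the current running mean.
updatedMu = lambda*muHat + (1 - lambda)*mu;   % [0.525; 0.575]
```

A small decay value keeps the running statistic close to its previous value, while a value near `1` makes it track the most recent batch.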

## Output Arguments

Normalized data `dlY`, returned as a `dlarray` with the same underlying data type as `dlX`.

If the input data `dlX` is a formatted `dlarray`, then `dlY` has the same format as `dlX`. If the input data is not a formatted `dlarray`, then `dlY` is an unformatted `dlarray` with the same dimension order as the input data.

The size of the output `dlY` matches the size of the input `dlX`.

Per-channel mean of the input data `popMu`, returned as a numeric column vector with length equal to the size of the `'C'` dimension of the input data.

Per-channel variance of the input data `popSigmaSq`, returned as a numeric column vector with length equal to the size of the `'C'` dimension of the input data.

Updated mean statistic `updatedMu`, returned as a numeric vector with length equal to the size of the `'C'` dimension of the input data.

The function updates the moving mean value using

`$\mu^{*} = \lambda_{\mu}\hat{\mu} + \left(1-\lambda_{\mu}\right)\mu,$`

where $\mu^{*}$ denotes the updated mean `updatedMu`, $\lambda_{\mu}$ denotes the mean decay value `'MeanDecay'`, $\hat{\mu}$ denotes the mean of the input data, and $\mu$ denotes the current value of the mean `mu`.

Updated variance statistic `updatedSigmaSq`, returned as a numeric vector with length equal to the size of the `'C'` dimension of the input data.

The function updates the moving variance value using

`${\sigma^{2}}^{*} = \lambda_{\sigma^{2}}\widehat{\sigma^{2}} + \left(1-\lambda_{\sigma^{2}}\right)\sigma^{2},$`

where ${\sigma^{2}}^{*}$ denotes the updated variance `updatedSigmaSq`, $\lambda_{\sigma^{2}}$ denotes the variance decay value `'VarianceDecay'`, $\widehat{\sigma^{2}}$ denotes the variance of the input data, and $\sigma^{2}$ denotes the current value of the variance `sigmaSq`.

## Algorithms

The batch normalization operation normalizes the elements $x_i$ of the input by first calculating the mean $\mu_B$ and variance $\sigma_B^{2}$ over the spatial, time, and observation dimensions for each channel independently. Then, it calculates the normalized activations as

`$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}},$`

where $\epsilon$ is a constant that improves numerical stability when the variance is very small.

To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow batch normalization, the batch normalization operation further shifts and scales the activations using the transformation

`$y_i = \gamma \hat{x}_i + \beta,$`

where the offset β and scale factor γ are learnable parameters that are updated during network training.
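The two steps can be written out by hand on a plain numeric array, without `dlarray` or `batchnorm`. This is a sketch of the formulas only, not the function's implementation. The sizes, the parameter values, and the choice to normalize the variance by the number of elements are assumptions made for illustration.

```matlab
% Assumed sizes: 4-by-4 images, 3 channels, 8 observations ('SSCB' layout).
X = rand(4,4,3,8);
epsilon = 1e-5;
gamma = reshape([1 2 3],1,1,[]);     % scale factor per channel (illustrative)
beta  = reshape([0 0.5 -1],1,1,[]);  % offset per channel (illustrative)

% Statistics over the spatial and observation dimensions, per channel.
muB = mean(X,[1 2 4]);
sigmaSqB = var(X,1,[1 2 4]);   % assumption: variance normalized by N

% Normalize, then scale and shift. Implicit expansion broadcasts the
% 1-by-1-by-3 statistics and parameters across the other dimensions.
xHat = (X - muB) ./ sqrt(sigmaSqB + epsilon);
Y = gamma .* xHat + beta;
```

After this transformation, each channel of `Y` has a mean equal to its offset and a standard deviation close to its scale factor.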

To make predictions with the network after training, batch normalization requires a fixed mean and variance to normalize the data. This fixed mean and variance can be calculated from the training data after training, or approximated during training using running statistic computations.

Introduced in R2019b