Normalize across all observations for each channel independently
The batch normalization operation normalizes the input data across all observations for each channel independently. To speed up training of the convolutional neural network and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.
After normalization, the operation shifts the input by a learnable offset β and scales it by a learnable scale factor γ.
The batchnorm function applies the batch normalization operation to dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the 'S', 'T', 'C', and 'B' labels, respectively. For unspecified and other dimensions, use the 'U' label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the 'DataFormat' option.
Note

To apply batch normalization within a layerGraph object or Layer array, use batchNormalizationLayer.
dlY = batchnorm(dlX,offset,scaleFactor) applies the batch normalization operation to the input data dlX and transforms it using the specified offset and scale factor.
The function normalizes over the 'S' (spatial), 'T' (time), 'B' (batch), and 'U' (unspecified) dimensions of dlX for each channel in the 'C' (channel) dimension, independently. For unformatted input data, use the 'DataFormat' option.
[dlY,popMu,popSigmaSq] = batchnorm(dlX,offset,scaleFactor) also returns the population mean and variance of the input data dlX.
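The per-channel normalization, the learnable shift and scale, and the population statistics can be sketched in NumPy for 'SSCB'-shaped data (height x width x channels x batch). The names `offset`, `scale_factor`, and the returned population statistics mirror the MATLAB arguments, but this is an illustrative sketch of the math, not the MATLAB API.

```python
import numpy as np

def batchnorm_sketch(x, offset, scale_factor, epsilon=1e-5):
    """Normalize over every dimension except the channel dimension (axis 2)."""
    reduce_axes = (0, 1, 3)                               # the 'S', 'S', 'B' dimensions
    mu = x.mean(axis=reduce_axes, keepdims=True)          # per-channel mean
    sigma_sq = x.var(axis=reduce_axes, keepdims=True)     # per-channel variance
    x_hat = (x - mu) / np.sqrt(sigma_sq + epsilon)        # normalized activations
    gamma = scale_factor.reshape(1, 1, -1, 1)             # learnable scale
    beta = offset.reshape(1, 1, -1, 1)                    # learnable offset
    y = gamma * x_hat + beta                              # shift and scale
    return y, mu.squeeze(), sigma_sq.squeeze()            # output, pop. mean, pop. variance
```

With a zero offset and unit scale factor, each channel of the output has approximately zero mean and unit variance, regardless of the input statistics.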
[dlY,updatedMu,updatedSigmaSq] = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq) applies the batch normalization operation using the mean mu and variance sigmaSq, and also returns updated moving mean and variance statistics.

Use this syntax to maintain running values for the mean and variance statistics during training. Use the final updated values of the mean and variance for prediction and classification.
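A running statistic of this kind is typically an exponential moving average. The sketch below assumes the decay weight is applied to the current batch statistic (consistent with a small default decay such as 0.1 keeping the running value stable); check the MATLAB documentation for the exact weighting convention used by the 'MeanDecay' and 'VarianceDecay' options.

```python
def update_moving_stat(running, batch_stat, decay=0.1):
    """Blend the current batch statistic into the running value.

    Assumption: decay weights the batch statistic, so a small decay
    changes the running value slowly.
    """
    return decay * batch_stat + (1.0 - decay) * running
```

Applying this after every mini-batch during training yields the fixed mean and variance used later for prediction.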
[___] = batchnorm(___,'DataFormat',FMT) applies the batch normalization operation to unformatted input data with format specified by FMT using any of the previous syntaxes. The output dlY is an unformatted dlarray object with dimensions in the same order as dlX. For example, 'DataFormat','SSCB' specifies data for 2-D image input with format 'SSCB' (spatial, spatial, channel, batch).
[___] = batchnorm(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'MeanDecay',0.3 sets the decay rate of the moving average computation.
The batch normalization operation normalizes the elements $x_i$ of the input by first calculating the mean $\mu_B$ and variance $\sigma_B^2$ over the spatial, time, and observation dimensions for each channel independently. Then, it calculates the normalized activations as

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},$$

where $\epsilon$ is a constant that improves numerical stability when the variance is very small.
To allow for the possibility that inputs with zero mean and unit variance are not optimal for the operations that follow batch normalization, the batch normalization operation further shifts and scales the activations using the transformation

$$y_i = \gamma \hat{x}_i + \beta,$$

where the offset $\beta$ and scale factor $\gamma$ are learnable parameters that are updated during network training.
To make predictions with the network after training, batch normalization requires a fixed mean and variance to normalize the data. This fixed mean and variance can be calculated from the training data after training, or approximated during training using running statistic computations.
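Prediction-time normalization can be sketched the same way: the fixed statistics simply replace the batch statistics in the normalization formula. The function and argument names below are illustrative, not the MATLAB API.

```python
import numpy as np

def batchnorm_inference(x, offset, scale_factor, mu, sigma_sq, epsilon=1e-5):
    """Normalize 'SSCB'-shaped data using fixed per-channel statistics (axis 2)."""
    mu = mu.reshape(1, 1, -1, 1)
    sigma_sq = sigma_sq.reshape(1, 1, -1, 1)
    x_hat = (x - mu) / np.sqrt(sigma_sq + epsilon)        # use fixed, not batch, statistics
    return scale_factor.reshape(1, 1, -1, 1) * x_hat + offset.reshape(1, 1, -1, 1)
```

Because the statistics are fixed, the output for a given observation no longer depends on which other observations appear in the same batch.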
See Also

dlarray | dlconv | dlfeval | dlgradient | fullyconnect | groupnorm | layernorm | relu