Set Up Parameters and Train Convolutional Neural Network
After you define the layers of your neural network as described in Specify Layers of Convolutional Neural Network, the next step is to set up the
training options for the network. Use the trainingOptions
function to define the global training parameters. To train
a network, use the object returned by trainingOptions
as an input
argument to the trainNetwork
function. For example:
options = trainingOptions('adam');
trainedNet = trainNetwork(data,layers,options);
Layers with learnable parameters also have options for adjusting the learning parameters. For more information, see Set Up Parameters in Convolutional and Fully Connected Layers.
Specify Solver and Maximum Number of Epochs
trainNetwork
can use different variants of stochastic gradient
descent to train the network. Specify the optimization algorithm by using the first
input argument of the trainingOptions
function. To minimize the
loss, these algorithms update the network parameters by taking small steps in the
direction of the negative gradient of the loss function.
The 'adam'
(derived from adaptive moment
estimation) solver is often a good optimizer to try first. You can also
try the 'rmsprop'
(root mean square propagation) and
'sgdm'
(stochastic gradient descent with momentum) optimizers
to see whether either improves training. Different solvers work better for different tasks. For
more information about the different solvers, see the trainingOptions
function.
The solvers update the parameters using a subset of the data each step. This subset is
called a mini-batch. You can specify the size of the mini-batch by
using the 'MiniBatchSize'
name-value pair argument of
trainingOptions. Each parameter update is called an
iteration. A full pass through the entire data set is called an
epoch. You can specify the maximum number of epochs to train
for by using the 'MaxEpochs'
name-value pair argument of
trainingOptions. The default value is 30, but you can choose a
smaller number of epochs for small networks or for fine-tuning and transfer learning,
where most of the learning is already done.
By default, the software shuffles the data once before training. You can change this
setting by using the 'Shuffle'
name-value pair argument.
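The relationship between mini-batches, iterations, and epochs can be sketched as follows. The data set size, mini-batch size, and epoch count are hypothetical values chosen only for illustration, and the iteration count assumes that any final partial mini-batch in an epoch is discarded.

```matlab
% Hypothetical sizes, chosen only to illustrate the arithmetic.
numObservations = 5000;
miniBatchSize = 128;
maxEpochs = 10;

% Choose the solver with the first argument, then set the mini-batch size,
% the maximum number of epochs, and the shuffling behavior.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',miniBatchSize, ...
    'MaxEpochs',maxEpochs, ...
    'Shuffle','every-epoch');

% One iteration is one parameter update on one mini-batch.
% Assumption: a final partial mini-batch in each epoch is not used.
iterationsPerEpoch = floor(numObservations/miniBatchSize);
totalIterations = iterationsPerEpoch*maxEpochs;
```

With these values, each epoch contains 39 iterations, so training runs for at most 390 iterations.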
Specify and Modify Learning Rate
You can specify the global learning rate by using the
'InitialLearnRate'
name-value pair argument of
trainingOptions. By default, trainNetwork
uses this value throughout the entire training process. Alternatively, you can reduce the
learning rate at regular intervals of epochs by multiplying it by a
factor. Instead of using a small, fixed learning rate throughout the training process,
you can choose a larger learning rate in the beginning of training and gradually reduce
this value during optimization. Doing so can shorten the training time, while enabling
smaller steps towards the minimum of the loss as training progresses.
Tip
If the mini-batch loss during training ever becomes NaN, then
the learning rate is likely too high. Try reducing the learning rate, for example by
a factor of 3, and restarting network training.
To gradually reduce the learning rate, use the
'LearnRateSchedule','piecewise' name-value pair argument. When
you choose this option, trainNetwork multiplies the current
learning rate by a factor of 0.1 every 10 epochs by default. You can specify the factor by
which to reduce the learning rate and the number of epochs between reductions by using the
'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments,
respectively.
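As a rough illustration of a piecewise schedule, the sketch below sets a hypothetical initial learning rate, drop factor, and drop period, and then computes the learning rate that would be in effect at each epoch. The formula assumes that the drop is applied once after every completed drop period.

```matlab
% Hypothetical schedule values, chosen only for illustration.
initialLearnRate = 0.01;
dropFactor = 0.2;
dropPeriod = 5;
maxEpochs = 20;

options = trainingOptions('sgdm', ...
    'InitialLearnRate',initialLearnRate, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',dropFactor, ...
    'LearnRateDropPeriod',dropPeriod, ...
    'MaxEpochs',maxEpochs);

% Learning rate in effect at each epoch, assuming one drop after every
% completed period of dropPeriod epochs.
epochs = 1:maxEpochs;
learnRate = initialLearnRate * dropFactor.^floor((epochs-1)/dropPeriod);
```

Under this assumption, epochs 1 to 5 use 0.01, epochs 6 to 10 use 0.002, and each later block of five epochs is smaller again by a factor of 0.2.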
Specify Validation Data
To perform network validation during training, specify validation data using the
'ValidationData'
name-value pair argument of
trainingOptions. By default, trainNetwork
validates the network every 50 iterations by predicting the response of the validation
data and calculating the validation loss and accuracy (root mean squared error for
regression networks). You can change the validation frequency using the
'ValidationFrequency'
name-value pair argument. If your network
has layers that behave differently during prediction than during training (for example,
dropout layers), then the validation accuracy can be higher than the training
(mini-batch) accuracy. You can also use the validation data to stop training
automatically when the validation loss stops decreasing. To turn on automatic validation
stopping, use the 'ValidationPatience'
name-value pair
argument.
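A minimal sketch of these validation options follows. The validation arrays are random placeholders standing in for real held-out data, and the frequency and patience values are arbitrary choices, not recommendations.

```matlab
% Placeholder validation data (hypothetical): 100 grayscale 28-by-28 images
% with labels from 10 classes. Replace with real held-out data.
XValidation = rand(28,28,1,100);
YValidation = categorical(randi(10,100,1));

options = trainingOptions('adam', ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',30, ...   % validate every 30 iterations
    'ValidationPatience',5);        % stop after 5 validations without improvement
```

Here training stops early if the validation loss fails to improve on its previous best for five consecutive validation checks.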
Performing validation at regular intervals during training helps you to determine if your network is overfitting to the training data. A common problem is that the network simply "memorizes" the training data, rather than learning general features that enable the network to make accurate predictions for new data. To check if your network is overfitting, compare the training loss and accuracy to the corresponding validation metrics. If the training loss is significantly lower than the validation loss, or the training accuracy is significantly higher than the validation accuracy, then your network is overfitting.
To reduce overfitting, you can try adding data augmentation. Use an augmentedImageDatastore
to perform random transformations on your input
images. This helps to prevent the network from memorizing the exact position and
orientation of objects. You can also try increasing the L2
regularization using the 'L2Regularization'
name-value pair argument,
using batch normalization layers after convolutional layers, and adding dropout
layers.
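The measures above might be combined as in the following sketch. The training arrays are random placeholders, and the augmentation ranges, dropout probability, and regularization value are illustrative assumptions rather than tuned settings.

```matlab
% Placeholder training data (hypothetical): 200 grayscale 28-by-28 images.
XTrain = rand(28,28,1,200);
YTrain = categorical(randi(10,200,1));

% Random rotations and translations applied to each mini-batch.
augmenter = imageDataAugmenter( ...
    'RandRotation',[-20 20], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3]);
augimds = augmentedImageDatastore([28 28],XTrain,YTrain, ...
    'DataAugmentation',augmenter);

% Batch normalization after the convolutional layer, plus a dropout layer.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    dropoutLayer(0.5)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

% A larger L2 regularization factor than the default.
options = trainingOptions('adam','L2Regularization',0.001);
```

You can then pass augimds as the training data, for example trainNetwork(augimds,layers,options).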
Select Hardware Resource
If a GPU is available, then trainNetwork uses it for training by
default. Otherwise, trainNetwork uses a CPU. Alternatively, you can
specify the execution environment you want using the
'ExecutionEnvironment' name-value pair argument. You can specify
a single CPU ('cpu'), a single GPU ('gpu'),
multiple GPUs ('multi-gpu'), or a local parallel pool or compute
cluster ('parallel'). All options other than 'cpu'
require Parallel Computing Toolbox™. Training on a GPU requires a supported GPU device. For information on
supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).
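If you prefer to choose the execution environment explicitly, one possible approach is sketched below. It assumes that the canUseGPU function is available in your release; the fallback logic simply mirrors the default behavior described above.

```matlab
% Pick the environment explicitly (assumes canUseGPU is available).
if canUseGPU
    executionEnvironment = 'gpu';
else
    executionEnvironment = 'cpu';
end

options = trainingOptions('adam', ...
    'ExecutionEnvironment',executionEnvironment);
```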
Save Checkpoint Networks and Resume Training
Deep Learning Toolbox™ enables you to save neural networks as .mat files during training. This
periodic saving is especially useful when you have a large neural network or a large data
set, and training takes a long time. If the training is interrupted for some reason, you can
resume training from the last saved checkpoint neural network. If you want the
trainnet and trainNetwork functions to save
checkpoint neural networks, then you must specify the path by using the
CheckpointPath option of trainingOptions. If
the path that you specify does not exist, then trainingOptions returns
an error.
The software automatically assigns unique names to checkpoint neural network files. In the
example name, net_checkpoint__351__2018_04_12__18_09_52.mat, 351 is the
iteration number, 2018_04_12 is the date, and 18_09_52
is the time at which the software saves the neural network. You can load a checkpoint neural
network file by double-clicking it or using the load command at the command line. For
example:
load net_checkpoint__351__2018_04_12__18_09_52.mat
You can then resume training by using the layers of the loaded neural network in the call to
trainnet or trainNetwork. For example:
trainNetwork(XTrain,TTrain,net.Layers,options)
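The checkpoint workflow can be sketched as follows. The folder name is a hypothetical choice, and the resume step is left as comments because it depends on a checkpoint file and training data (XTrain, TTrain) that exist only after an interrupted training run.

```matlab
% Hypothetical checkpoint folder. Create it first, because trainingOptions
% returns an error if the path does not exist.
checkpointDir = fullfile(tempdir,'checkpoints');
if ~isfolder(checkpointDir)
    mkdir(checkpointDir);
end

options = trainingOptions('adam','CheckpointPath',checkpointDir);

% To resume after an interruption, load a checkpoint file and reuse its
% layers (file name taken from the example above):
% S = load(fullfile(checkpointDir, ...
%     'net_checkpoint__351__2018_04_12__18_09_52.mat'));
% trainedNet = trainNetwork(XTrain,TTrain,S.net.Layers,options);
```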
Set Up Parameters in Convolutional and Fully Connected Layers
You can set the learning parameters to be different from the global values specified
by trainingOptions
in layers with learnable parameters, such as
convolutional and fully connected layers. For example, to adjust the learning rate for
the biases or weights, you can specify a value for the
BiasLearnRateFactor
or
WeightLearnRateFactor
properties of the layer, respectively.
The trainNetwork function multiplies the learning rate that you
specify by using trainingOptions by these factors. Similarly, you
can also specify the L2 regularization factors for the biases
and weights in these layers by specifying the BiasL2Factor and
WeightL2Factor properties, respectively.
trainNetwork then multiplies the global L2
regularization factor that you specify by using trainingOptions
by these factors.
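To make the scaling concrete, the following sketch sets per-layer factors on a convolutional layer and computes the resulting effective values. The global learning rate, global L2 factor, and per-layer factors are hypothetical numbers chosen for illustration.

```matlab
% Hypothetical global settings.
globalLearnRate = 0.01;
globalL2 = 1e-4;
options = trainingOptions('sgdm', ...
    'InitialLearnRate',globalLearnRate, ...
    'L2Regularization',globalL2);

% Per-layer factors: learn twice as fast in this layer, and apply no L2
% regularization to its biases.
convLayer = convolution2dLayer(5,20, ...
    'WeightLearnRateFactor',2, ...
    'BiasLearnRateFactor',2, ...
    'WeightL2Factor',1, ...
    'BiasL2Factor',0);

% Effective values are the global value multiplied by the layer factor.
effectiveWeightLearnRate = globalLearnRate*convLayer.WeightLearnRateFactor;
effectiveBiasL2 = globalL2*convLayer.BiasL2Factor;
```

With these numbers, the weights of this layer train with an effective learning rate of 0.02, and its biases are not regularized.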
Initialize Weights in Convolutional and Fully Connected Layers
The layer weights are learnable parameters. You can specify the initial value of the weights
directly using the Weights
property of the layer. When
you train a network, if the Weights
property of the layer
is nonempty, then the trainnet
and
trainNetwork
functions use the Weights
property as the initial value. If the Weights
property is
empty, then the software uses the initializer specified by the WeightsInitializer
property of the layer.
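The two ways of controlling the initial weights can be sketched as follows. The 'he' initializer, the input size of 784, and the scaling of the random values are illustrative assumptions only.

```matlab
% Option 1: leave Weights empty and choose an initializer. The software
% then initializes the weights at training time.
fcInit = fullyConnectedLayer(10,'WeightsInitializer','he');

% Option 2: set the initial weights directly. The size is
% OutputSize-by-InputSize; 784 is a hypothetical input size.
fcFixed = fullyConnectedLayer(10);
fcFixed.Weights = 0.01*randn(10,784);
fcFixed.Bias = zeros(10,1);
```

In the first layer the Weights property stays empty until training, so the initializer applies. In the second layer training starts from the values you supplied.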
Train Your Network
After you specify the layers of your network and the training parameters, you can
train the network using the training data. The data, layers, and training options are
all input arguments of the trainNetwork
function, as in this
example.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
options = trainingOptions('adam');
convnet = trainNetwork(data,layers,options);
Training data can be an array, a table, or an ImageDatastore
object.
For more information, see the trainNetwork
function reference
page.
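For a version of this example that runs from start to finish, the sketch below assumes that the digit sample data functions digitTrain4DArrayData and digitTest4DArrayData are available in your installation. The epoch count is kept small only to shorten the run.

```matlab
% Load sample digit images as 28-by-28-by-1-by-N arrays with categorical
% labels (assumes the sample data functions are on the path).
[XTrain,YTrain] = digitTrain4DArrayData;
[XTest,YTest] = digitTest4DArrayData;

layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs',4, ...
    'Verbose',false);

% Train on the array data, then measure accuracy on the held-out test set.
convnet = trainNetwork(XTrain,YTrain,layers,options);
YPred = classify(convnet,XTest);
accuracy = mean(YPred == YTest);
```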
See Also
trainingOptions | trainNetwork | Convolution2dLayer | FullyConnectedLayer