Neural Network Training Concepts
This topic is part of the design workflow described in Workflow for Neural Network Design.
This topic describes two different styles of training. In incremental training the weights and biases of the network are updated each time an input is presented to the network. In batch training the weights and biases are only updated after all the inputs are presented. The batch training methods are generally more efficient in the MATLAB® environment, and they are emphasized in the Deep Learning Toolbox™ software, but there are some applications where incremental training can be useful, so that paradigm is implemented as well.
Incremental Training with adapt
Incremental training can be applied to both static and dynamic networks, although it is more commonly used with dynamic networks, such as adaptive filters. This section illustrates how incremental training is performed on both static and dynamic networks.
Incremental Training of Static Networks
Consider again the static network used for the first example. You want to train it incrementally, so that the weights and biases are updated after each input is presented. In this case you use the function adapt, and the inputs and targets are presented as sequences.
Suppose you want to train the network to create the linear function

t = 2p1 + p2

Then for the previous inputs

p1 = [1;2], p2 = [2;1], p3 = [2;3], p4 = [3;1]

the targets would be

t1 = 4, t2 = 5, t3 = 7, t4 = 7
For incremental training, you present the inputs and targets as sequences:
P = {[1;2] [2;1] [2;3] [3;1]};
T = {4 5 7 7};
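As a quick check (this snippet is not part of the original example), the targets follow directly from the linear function applied to each input vector:

Pmat = [1 2 2 3; 2 1 3 1];          % the same inputs as concurrent columns
Tcheck = 2*Pmat(1,:) + Pmat(2,:)    % returns 4 5 7 7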
First, set up the network with zero initial weights and biases. Also, set the initial learning rate to zero to show the effect of incremental training.
net = linearlayer(0,0);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;
Recall from Simulation with Concurrent Inputs in a Static Network that, for a static network, the simulation of the network produces the same outputs whether the inputs are presented as a matrix of concurrent vectors or as a cell array of sequential vectors. However, this is not true when training the network. When you use the adapt function, if the inputs are presented as a cell array of sequential vectors, then the weights are updated as each input is presented (incremental mode). As shown in the next section, if the inputs are presented as a matrix of concurrent vectors, then the weights are updated only after all inputs are presented (batch mode).
You are now ready to train the network incrementally.
[net,a,e,pf] = adapt(net,P,T);
The network outputs remain zero, because the learning rate is zero, and the weights are not updated. The errors are equal to the targets:
a = [0] [0] [0] [0]
e = [4] [5] [7] [7]
If you now set the learning rate to 0.1, you can see how the network is adjusted as each input is presented:
net.inputWeights{1,1}.learnParam.lr = 0.1;
net.biases{1,1}.learnParam.lr = 0.1;
[net,a,e,pf] = adapt(net,P,T);
a = [0] [2] [6] [5.8]
e = [4] [3] [1] [1.2]
The first output is the same as it was with zero learning rate, because no update is made until the first input is presented. The second output is different, because the weights have been updated. The weights continue to be modified as each error is computed. If the network is capable and the learning rate is set correctly, the error is eventually driven to zero.
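To see where these numbers come from, here is a minimal manual trace of the incremental updates (a sketch for illustration only; adapt and learnwh perform these steps internally, and the update rule shown assumes the default Widrow-Hoff behavior of learnwh):

w = [0 0]; b = 0; lr = 0.1;
P = {[1;2] [2;1] [2;3] [3;1]};
T = {4 5 7 7};
for k = 1:numel(P)
    p = P{k};
    a = w*p + b;        % output before any update for this step
    e = T{k} - a;       % error for this step
    w = w + lr*e*p';    % Widrow-Hoff weight update
    b = b + lr*e;       % Widrow-Hoff bias update
    fprintf('a = %g, e = %g\n', a, e)
end

This trace reproduces the outputs 0, 2, 6, 5.8 shown above.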
Incremental Training with Dynamic Networks
You can also train dynamic networks incrementally. In fact, this would be the most common situation.
To train the network incrementally, present the inputs and targets as elements of cell arrays. Here are the initial input Pi and the inputs P and targets T as elements of cell arrays.
Pi = {1};
P = {2 3 4};
T = {3 5 7};
Take the linear network with one delay at the input, as used in a previous example. Initialize the weights to zero and set the learning rate to 0.1.
net = linearlayer([0 1],0.1);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.biasConnect = 0;
You want to train the network to create the current output by summing the current and the previous inputs. This is the same input sequence you used in the previous example with the exception that you assign the first term in the sequence as the initial condition for the delay. You can now sequentially train the network using adapt.
[net,a,e,pf] = adapt(net,P,T,Pi);
a = [0] [2.4] [7.98]
e = [3] [2.6] [-0.98]
The first output is zero, because the weights have not yet been updated. The weights change at each subsequent time step.
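Here is a manual trace of these updates (illustration only; adapt performs this internally). The weight vector acts on the current and delayed inputs, and with no bias the Widrow-Hoff update is applied after each step:

w = [0 0]; lr = 0.1;
pd = 1;                    % initial delayed input (Pi)
P = [2 3 4]; T = [3 5 7];
for k = 1:numel(P)
    x = [P(k); pd];        % current input stacked on delayed input
    a = w*x;               % network output (no bias)
    e = T(k) - a;          % error for this step
    w = w + lr*e*x';       % Widrow-Hoff update
    pd = P(k);             % current input moves into the delay
    fprintf('a = %g, e = %g\n', a, e)
end

This trace reproduces the outputs 0, 2.4, 7.98 shown above.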
Batch Training
Batch training, in which weights and biases are only updated after all the inputs and targets are presented, can be applied to both static and dynamic networks. Both types of networks are discussed in this section.
Batch Training with Static Networks
Batch training can be done using either adapt or train, although train is generally the best option, because it typically has access to more efficient training algorithms. Incremental training is usually done with adapt; batch training is usually done with train.

For batch training of a static network with adapt, the input vectors must be placed in one matrix of concurrent vectors.
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
Begin with the static network used in previous examples. The learning rate is set to 0.01.
net = linearlayer(0,0.01);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;
When you call adapt, it invokes trains (the default adaption function for the linear network) and learnwh (the default learning function for the weights and biases). trains uses Widrow-Hoff learning.
[net,a,e,pf] = adapt(net,P,T);
a = 0 0 0 0
e = 4 5 7 7
Note that the outputs of the network are all zero, because the weights are not updated until the entire training set has been presented. If you display the weights, you find
net.IW{1,1}
ans =
    0.4900    0.4100
net.b{1}
ans =
    0.2300
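You can check these values by hand. In batch mode the Widrow-Hoff updates for all four inputs are summed before the weights change, and because the initial weights are zero, the errors equal the targets (a verification sketch, not part of the original example):

lr = 0.01;
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
e = T;                  % outputs are zero, so the errors equal the targets
w = lr * (e * P')       % returns 0.4900 0.4100
b = lr * sum(e)         % returns 0.2300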
This is different from the result after one pass of adapt with incremental updating.
Now perform the same batch training using train. Because the Widrow-Hoff rule can be used in incremental or batch mode, it can be invoked by adapt or train. (There are several algorithms that can only be used in batch mode (e.g., Levenberg-Marquardt), so these algorithms can only be invoked by train.)
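For instance, a batch-only algorithm such as Levenberg-Marquardt can only be reached through train. A minimal sketch (the feedforward network here is an assumption for illustration, not part of the original example):

net = feedforwardnet(10,'trainlm');   % trainlm operates in batch mode only
net = train(net,P,T);                 % train applies the batch updates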
For this case, the input vectors can be in a matrix of concurrent vectors or in a cell array of sequential vectors. Because the network is static and because train always operates in batch mode, train converts any cell array of sequential vectors to a matrix of concurrent vectors. Concurrent mode operation is used whenever possible because it has a more efficient implementation in MATLAB code:
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
The network is set up in the same way.
net = linearlayer(0,0.01);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;
Now you are ready to train the network. Train it for only one epoch, because you used only one pass of adapt.
The default training function for the linear network is trainb, and the default learning function for the weights and biases is learnwh, so you should get the same results obtained using adapt in the previous example, where the default adaption function was trains.
net.trainParam.epochs = 1;
net = train(net,P,T);
If you display the weights after one epoch of training, you find
net.IW{1,1}
ans =
    0.4900    0.4100
net.b{1}
ans =
    0.2300
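Because train always operates in batch mode, you should get the same weights if you pass the same data as a cell array of sequential vectors (a quick check, not part of the original example):

Pseq = {[1;2] [2;1] [2;3] [3;1]};   % the same data as a sequence
Tseq = {4 5 7 7};
net = linearlayer(0,0.01);
net = configure(net,Pseq,Tseq);
net.IW{1,1} = [0 0];
net.b{1} = 0;
net.trainParam.epochs = 1;
net = train(net,Pseq,Tseq);         % train converts the sequence internally
net.IW{1,1}                         % again returns 0.4900 0.4100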
This is the same result as the batch mode training in adapt. With static networks, the adapt function can implement incremental or batch training, depending on the format of the input data. If the data is presented as a matrix of concurrent vectors, batch training occurs. If the data is presented as a sequence, incremental training occurs. This is not true for train, which always performs batch training, regardless of the format of the input.
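The following side-by-side sketch (illustration only, not part of the original example) makes the contrast explicit for adapt:

% Concurrent matrix -> adapt performs one batch update
netB = linearlayer(0,0.01);
netB = configure(netB,P,T);
netB.IW{1,1} = [0 0]; netB.b{1} = 0;
netB = adapt(netB,P,T);
netB.IW{1,1}                  % 0.4900 0.4100 (batch result)

% Cell array sequence -> adapt performs four incremental updates
Pseq = {[1;2] [2;1] [2;3] [3;1]};
Tseq = {4 5 7 7};
netI = linearlayer(0,0.01);
netI = configure(netI,Pseq,Tseq);
netI.IW{1,1} = [0 0]; netI.b{1} = 0;
netI = adapt(netI,Pseq,Tseq);
netI.IW{1,1}                  % different weights (incremental result)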
Batch Training with Dynamic Networks
Training static networks is relatively straightforward. If you use train, the network is trained in batch mode and the inputs are converted to concurrent vectors (columns of a matrix), even if they are originally passed as a sequence (elements of a cell array). If you use adapt, the format of the input determines the method of training. If the inputs are passed as a sequence, then the network is trained in incremental mode. If the inputs are passed as concurrent vectors, then batch mode training is used.
With dynamic networks, batch mode training is typically done with train only, especially if only one training sequence exists. To illustrate this, consider again the linear network with a delay. Use a learning rate of 0.02 for the training. (When using a gradient descent algorithm, you typically use a smaller learning rate for batch mode training than incremental training, because all the individual gradients are summed before determining the step change to the weights.)
net = linearlayer([0 1],0.02);
net.inputs{1}.size = 1;
net.layers{1}.dimensions = 1;
net.IW{1,1} = [0 0];
net.biasConnect = 0;
net.trainParam.epochs = 1;
Pi = {1};
P = {2 3 4};
T = {3 5 6};
You want to train the network with the same input sequence used for the incremental training earlier, but this time you want to update the weights only after all the inputs are applied (batch mode). The network is simulated in sequential mode, because the input is a sequence, but the weights are updated in batch mode.
net = train(net,P,T,Pi);
The weights after one epoch of training are
net.IW{1,1}
ans =
    0.9000    0.6200
These are different weights than you would obtain using incremental training, where the weights would be updated three times during one pass through the training set. For batch training the weights are only updated once in each epoch.
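You can check this result by hand (a verification sketch, not part of the original example). With zero initial weights, the first-epoch outputs are all zero, so the errors equal the targets, and the single batch update is the sum of the per-step Widrow-Hoff terms:

lr = 0.02;
p  = [2 3 4];             % current inputs
pd = [1 2 3];             % delayed inputs: Pi, then the first two inputs
e  = [3 5 6];             % errors equal the targets
w  = lr * [e*p' e*pd']    % returns 0.9000 0.6200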
Training Feedback
The showWindow parameter allows you to specify whether a training window is visible when you train. The training window appears by default. Two other parameters, showCommandLine and show, determine whether command-line output is generated and the number of epochs between command-line feedback during training. For instance, this code turns off the training window and gives you training status information every 35 epochs when the network is later trained with train:
net.trainParam.showWindow = false;
net.trainParam.showCommandLine = true;
net.trainParam.show = 35;
Sometimes it is convenient to disable all training displays. To do that, turn off both the training window and command-line feedback:
net.trainParam.showWindow = false;
net.trainParam.showCommandLine = false;
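Putting it together, here is a minimal end-to-end sketch (illustration only; the network and data are borrowed from the earlier static example, and the epoch count is an assumption):

net = linearlayer(0,0.01);
net = configure(net,P,T);
net.trainParam.showWindow = false;       % suppress the training window
net.trainParam.showCommandLine = true;   % print progress to the console
net.trainParam.show = 35;                % status every 35 epochs
net.trainParam.epochs = 200;             % assumed epoch count
net = train(net,P,T);                    % status lines appear every 35 epochs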