resnetLayers
Description
lgraph = resnetLayers(inputSize,numClasses) creates a 2-D residual network with an image input size specified by inputSize and a number of classes specified by numClasses. A residual network consists of stacks of blocks. Each block contains deep learning layers. The network includes an image classification layer, suitable for predicting the categorical label of an input image.
To create a 3-D residual network, use resnet3dLayers.
lgraph = resnetLayers(___,Name=Value) creates a residual network using one or more name-value arguments in addition to any of the input arguments in the previous syntax. For example, InitialNumFilters=32 specifies 32 filters in the initial convolutional layer.
Examples
Residual Network with Bottleneck
Create a residual network with a bottleneck architecture.
imageSize = [224 224 3];
numClasses = 10;
lgraph = resnetLayers(imageSize,numClasses)

lgraph = 
  LayerGraph with properties:

     InputNames: {'input'}
    OutputNames: {'output'}
         Layers: [177x1 nnet.cnn.layer.Layer]
    Connections: [192x2 table]
Analyze the network.
analyzeNetwork(lgraph)
This network is equivalent to a ResNet-50 residual network.
Residual Network with Custom Stack Depth
Create a ResNet-101 network using a custom stack depth.
imageSize = [224 224 3];
numClasses = 10;
stackDepth = [3 4 23 3];
numFilters = [64 128 256 512];
lgraph = resnetLayers(imageSize,numClasses, ...
    StackDepth=stackDepth, ...
    NumFilters=numFilters)

lgraph = 
  LayerGraph with properties:

     InputNames: {'input'}
    OutputNames: {'output'}
         Layers: [347x1 nnet.cnn.layer.Layer]
    Connections: [379x2 table]
Analyze the network.
analyzeNetwork(lgraph)
Train Residual Network
Create and train a residual network to classify images.
Load the digits data as in-memory numeric arrays using the digitTrain4DArrayData and digitTest4DArrayData functions.
[XTrain,YTrain] = digitTrain4DArrayData;
[XTest,YTest] = digitTest4DArrayData;
Define the residual network. The digits data contains 28-by-28 pixel images, so construct a residual network with smaller filters.
imageSize = [28 28 1];
numClasses = 10;
lgraph = resnetLayers(imageSize,numClasses, ...
    InitialStride=1, ...
    InitialFilterSize=3, ...
    InitialNumFilters=16, ...
    StackDepth=[4 3 2], ...
    NumFilters=[16 32 64]);
Set the options to the default settings for stochastic gradient descent with momentum. Set the maximum number of epochs to 5, and start the training with an initial learning rate of 0.1.
options = trainingOptions("sgdm", ...
    MaxEpochs=5, ...
    InitialLearnRate=0.1, ...
    Verbose=false, ...
    Plots="training-progress");
Train the network.
net = trainNetwork(XTrain,YTrain,lgraph,options);
Test the performance of the network by evaluating the prediction accuracy on the test data. Use the classify function to predict the class label of each test image.
YPred = classify(net,XTest);
Calculate the accuracy. The accuracy is the fraction of labels that the network predicts correctly.
accuracy = sum(YPred == YTest)/numel(YTest)
accuracy = 0.9956
Convert Residual Network to dlnetwork Object
To train a residual network using a custom training loop, first convert it to a dlnetwork object.
Create a residual network.
lgraph = resnetLayers([224 224 3],5);
Remove the classification layer.
lgraph = removeLayers(lgraph,"output");
Replace the input layer with a new input layer that has Normalization set to "none". To use an input layer with zero-center or z-score normalization, you must specify an imageInputLayer with a nonempty value for the Mean property. For example, Mean=mean(XTrain,4), where XTrain is a 4-D array containing your input data.
newInputLayer = imageInputLayer([224 224 3],Normalization="none");
lgraph = replaceLayer(lgraph,"input",newInputLayer);
Convert to a dlnetwork.
dlnet = dlnetwork(lgraph)

dlnet = 
  dlnetwork with properties:

         Layers: [176x1 nnet.cnn.layer.Layer]
    Connections: [191x2 table]
     Learnables: [214x3 table]
          State: [106x3 table]
     InputNames: {'imageinput'}
    OutputNames: {'softmax'}
    Initialized: 1

  View summary with summary.
Input Arguments
inputSize — Network input image size
2-element vector | 3-element vector
Network input image size, specified as one of the following:
2-element vector in the form [height, width].
3-element vector in the form [height, width, depth], where depth is the number of channels. Set depth to 3 for RGB images and to 1 for grayscale images. For multispectral and hyperspectral images, set depth to the number of channels.
The height and width values must be greater than or equal to initialStride * poolingStride * 2^D, where D is the number of downsampling blocks. Set the initial stride using the InitialStride argument. The pooling stride is 1 when the InitialPoolingLayer is set to "none", and 2 otherwise.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
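As a rough illustration of the size constraint above, the following sketch computes the smallest allowed height and width for the default settings. It assumes that every stack after the first begins with exactly one downsampling block, so that D = numel(stackDepth) - 1. That relationship is inferred from the More About section rather than stated as a formula, and the variable names are illustrative only.

```matlab
% Sketch: smallest allowed input height and width for a configuration.
% Assumption: D = numel(stackDepth) - 1 (one downsampling block at the
% start of every stack except the first).
initialStride = 2;                % default InitialStride
initialPoolingLayer = "max";      % default InitialPoolingLayer
stackDepth = [3 4 6 3];           % default StackDepth

% The pooling stride is 1 without an initial pooling layer, and 2 otherwise.
if initialPoolingLayer == "none"
    poolingStride = 1;
else
    poolingStride = 2;
end

D = numel(stackDepth) - 1;        % assumed number of downsampling blocks
minSize = initialStride * poolingStride * 2^D;
fprintf("Minimum height and width: %d\n", minSize);
```

Under the same assumption, setting initialPoolingLayer to "none" halves the requirement.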
numClasses — Number of classes
integer greater than 1
Number of classes in the image classification network, specified as an integer greater than 1.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: InitialFilterSize=[5,5],InitialNumFilters=32,BottleneckType="none" specifies an initial filter size of 5-by-5 pixels, 32 initial filters, and a network architecture without bottleneck components.
InitialFilterSize — Filter size in first convolutional layer
7 (default) | positive integer | 2-element vector of positive integers
Filter size in the first convolutional layer, specified as one of the following:
Positive integer. The filter has equal height and width. For example, specifying 5 yields a filter of height 5 and width 5.
2-element vector in the form [height, width]. For example, specifying an initial filter size of [1 5] yields a filter of height 1 and width 5.
Example: InitialFilterSize=[5,5]
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
InitialNumFilters — Number of filters in first convolutional layer
64 (default) | positive integer
Number of filters in the first convolutional layer, specified as a positive integer. The number of initial filters determines the number of channels (feature maps) in the output of the first convolutional layer in the residual network.
Example: InitialNumFilters=32
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
InitialStride — Stride in first convolutional layer
2 (default) | positive integer | 2-element vector of positive integers
Stride in the first convolutional layer, specified as one of the following:
Positive integer. The stride has equal height and width. For example, specifying 3 yields a stride of height 3 and width 3.
2-element vector in the form [height, width]. For example, specifying an initial stride of [1 2] yields a stride of height 1 and width 2.
The stride defines the step size for traversing the input vertically and horizontally.
Example: InitialStride=[3,3]
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
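To make the step-size description above concrete, here is a minimal sketch that lists the positions at which a filter is applied for a given stride. It assumes the layer pads its input so that every step position is valid. The input size and variable names are hypothetical and are not part of the resnetLayers interface.

```matlab
% Sketch: how a stride setting selects filter positions along each dimension.
inputSize = [224 224];        % [height width], hypothetical example input
initialStride = 2;            % scalar, or a vector in the form [height width]

% A scalar stride applies equally to the height and the width.
if isscalar(initialStride)
    initialStride = [initialStride initialStride];
end

% Start positions of the filter, assuming padding keeps every step valid.
rowStarts = 1:initialStride(1):inputSize(1);
colStarts = 1:initialStride(2):inputSize(2);

fprintf("Filter positions: %d vertical, %d horizontal\n", ...
    numel(rowStarts), numel(colStarts));
```

Under this assumption, a stride of 2 visits half as many positions in each dimension as a stride of 1.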
InitialPoolingLayer — First pooling layer
"max" (default) | "average" | "none"
First pooling layer before the initial residual block, specified as one of the following:
"max" — Use a max pooling layer before the initial residual block. For more information, see maxPooling2dLayer.
"average" — Use an average pooling layer before the initial residual block. For more information, see averagePooling2dLayer.
"none" — Do not use a pooling layer before the initial residual block.
Example: InitialPoolingLayer="average"
Data Types: char | string
ResidualBlockType — Residual block type
"batchnorm-before-add" (default) | "batchnorm-after-add"
Residual block type, specified as "batchnorm-before-add" or "batchnorm-after-add".
The ResidualBlockType argument specifies the location of the batch normalization layer in the standard and downsampling residual blocks: before the addition layer for "batchnorm-before-add", and after the addition layer for "batchnorm-after-add". For more information, see More About.
Example: ResidualBlockType="batchnorm-after-add"
Data Types: char | string
BottleneckType — Block bottleneck type
"downsample-first-conv" (default) | "none"
Block bottleneck type, specified as one of the following:
"downsample-first-conv" — Use bottleneck residual blocks that perform downsampling in the first convolutional layer of the downsampling residual blocks, using a stride of 2. A bottleneck residual block consists of three convolutional layers: a 1-by-1 layer for downsampling the channel dimension, a 3-by-3 convolutional layer, and a 1-by-1 layer for upsampling the channel dimension. The number of filters in the final convolutional layer is four times that in the first two convolutional layers. For more information, see NumFilters.
"none" — Do not use bottleneck residual blocks. The residual blocks consist of two 3-by-3 convolutional layers.
A bottleneck block performs a 1-by-1 convolution before the 3-by-3 convolution to reduce the number of channels by a factor of four. Networks with and without bottleneck blocks will have a similar level of computational complexity, but the total number of features propagating in the residual connections is four times larger when you use bottleneck units. Therefore, using a bottleneck increases the efficiency of the network [1]. For more information on the layers in each residual block, see More About.
Example: BottleneckType="none"
Data Types: char | string
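The claim about similar computational complexity can be checked with a back-of-the-envelope count of convolution weights per block. The sketch below is only an approximation: it ignores biases, batch normalization parameters, and any layers on the residual connection. The value of numFilters is a hypothetical choice for one stack.

```matlab
% Sketch: convolution weight counts for one residual block.
numFilters = 64;                     % hypothetical NumFilters value for one stack

% Nonbottleneck block: two 3-by-3 convolutions with numFilters channels in and out.
weightsPlain = 2 * (3*3*numFilters*numFilters);

% Bottleneck block: the block input and output carry 4*numFilters channels.
wideChannels = 4*numFilters;
weightsBottleneck = 1*1*wideChannels*numFilters ...   % 1-by-1 convolution, reduces channels
    + 3*3*numFilters*numFilters ...                   % 3-by-3 convolution
    + 1*1*numFilters*wideChannels;                    % 1-by-1 convolution, restores channels

fprintf("Nonbottleneck: %d weights, %d channels on the residual connection\n", ...
    weightsPlain, numFilters);
fprintf("Bottleneck:    %d weights, %d channels on the residual connection\n", ...
    weightsBottleneck, wideChannels);
```

In this simplified count, the two block types have nearly the same number of weights, while the bottleneck block passes four times as many channels along the residual connection.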
StackDepth — Number of residual blocks in each stack
[3 4 6 3] (default) | vector of positive integers
Number of residual blocks in each stack, specified as a vector of positive integers. For example, if the stack depth is [3 4 6 3], the network has four stacks, with three blocks, four blocks, six blocks, and three blocks.
Specify the number of filters in the convolutional layers of each stack using the NumFilters argument. The StackDepth value must have the same number of elements as the NumFilters value.
Example: StackDepth=[9 12 69 9]
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
NumFilters — Number of filters in convolutional layers of each stack
[64 128 256 512] (default) | vector of positive integers
Number of filters in the convolutional layers of each stack, specified as a vector of positive integers.
When you set BottleneckType to "downsample-first-conv", the first two convolutional layers in each block of each stack have the same number of filters, set by the NumFilters value. The final convolutional layer has four times the number of filters in the first two convolutional layers.
For example, suppose you set NumFilters to [4 5] and BottleneckType to "downsample-first-conv". In the first stack, the first two convolutional layers in each block have 4 filters and the final convolutional layer in each block has 16 filters. In the second stack, the first two convolutional layers in each block have 5 filters and the final convolutional layer has 20 filters.
When you set BottleneckType to "none", the convolutional layers in each stack have the same number of filters, set by the NumFilters value.
The NumFilters value must have the same number of elements as the StackDepth value.
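The [4 5] example above can be reproduced with a few lines of arithmetic. This is a sketch of the stated rule only, not of how resnetLayers is implemented. The variable finalConvFilters is an illustrative name.

```matlab
% Sketch: filters per convolutional layer in each stack.
numFilters = [4 5];                          % NumFilters from the example
bottleneckType = "downsample-first-conv";    % or "none"

if bottleneckType == "downsample-first-conv"
    % The final convolution in each bottleneck block has four times the filters.
    finalConvFilters = 4*numFilters;
else
    % Without a bottleneck, every convolution in a stack has the same count.
    finalConvFilters = numFilters;
end

for i = 1:numel(numFilters)
    fprintf("Stack %d: %d filters in the earlier layers, %d in the final layer\n", ...
        i, numFilters(i), finalConvFilters(i));
end
```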
The NumFilters value determines the layers on the residual connection in the initial residual block. There is a convolutional layer on the residual connection if one of the following conditions is met:
BottleneckType="downsample-first-conv" (default) and InitialNumFilters is not equal to four times the first element of NumFilters.
BottleneckType="none" and InitialNumFilters is not equal to the first element of NumFilters.
For more information about the layers in each residual block, see More About.
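The two conditions above amount to comparing InitialNumFilters with the number of channels that the first block produces. A minimal sketch for the default settings follows. The name blockOutputChannels is an interpretation introduced here and does not appear in the function interface.

```matlab
% Sketch: is there a convolutional layer on the residual connection of the
% initial residual block?
initialNumFilters = 64;               % default InitialNumFilters
numFilters = [64 128 256 512];        % default NumFilters
bottleneckTypes = ["downsample-first-conv" "none"];

hasConvOnResidual = false(size(bottleneckTypes));
for k = 1:numel(bottleneckTypes)
    if bottleneckTypes(k) == "downsample-first-conv"
        blockOutputChannels = 4*numFilters(1);   % bottleneck widens by four
    else
        blockOutputChannels = numFilters(1);
    end
    % A convolution is needed only when the channel counts differ.
    hasConvOnResidual(k) = initialNumFilters ~= blockOutputChannels;
    fprintf("%s: convolution on residual connection = %d\n", ...
        bottleneckTypes(k), hasConvOnResidual(k));
end
```

With the default values, the condition holds for the bottleneck architecture (64 versus 256) and does not hold when BottleneckType is "none" (64 versus 64).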
Example: NumFilters=[32 64 126 256]
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Normalization — Data normalization
"zerocenter" (default) | "zscore"
Data normalization to apply every time data is forward-propagated through the input layer, specified as one of the following:
"zerocenter" — Subtract the mean. The mean is calculated at training time.
"zscore" — Subtract the mean and divide by the standard deviation. The mean and standard deviation are calculated at training time.
Example: Normalization="zscore"
Data Types: char | string
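The two normalization options can be sketched on a tiny array. This example assumes the statistics are computed element-wise over the observation (fourth) dimension. The statistics that the input layer actually computes at training time may be aggregated differently, so treat this as an illustration of the arithmetic only.

```matlab
% Sketch: zero-center and z-score normalization on toy data.
% X is a 2-by-2-by-1-by-2 array: two single-channel 2-by-2 "images".
X = cat(4,[1 2; 3 4],[3 4; 5 6]);

mu = mean(X,4);          % assumed element-wise mean over observations
sigma = std(X,0,4);      % assumed element-wise standard deviation

XZeroCenter = X - mu;            % "zerocenter": subtract the mean
XZScore = (X - mu)./sigma;       % "zscore": also divide by the standard deviation

disp(XZeroCenter(:,:,1,1))
disp(XZScore(:,:,1,1))
```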
Output Arguments
lgraph — Residual network
layerGraph object
Residual network, returned as a layerGraph object.
More About
Residual Network
Residual networks (ResNets) are a type of deep network consisting of building blocks that have residual connections (also known as skip or shortcut connections). These connections allow the input to skip the convolutional units of the main branch, thus providing a simpler path through the network. By allowing the parameter gradients to flow more easily from the output layer to the earlier layers of the network, residual connections help mitigate the problem of vanishing gradients during early training.
The structure of a residual network is flexible. The key component is the inclusion of the residual connections within residual blocks. A group of residual blocks is called a stack. A ResNet architecture consists of initial layers, followed by stacks containing residual blocks, and then the final layers. A network has three types of residual blocks:
Initial residual block — This block occurs at the start of the first stack. The layers in the residual connection of the initial residual block determine if the block preserves the activation sizes or performs downsampling.
Standard residual block — This block occurs multiple times in each stack, after the first downsampling residual block. The standard residual block preserves the activation sizes.
Downsampling residual block — This block occurs once, at the start of each stack. The first convolutional unit in the downsampling block downsamples the spatial dimensions by a factor of two.
A typical stack has a downsampling residual block, followed by m standard residual blocks, where m is greater than or equal to one. The first stack is the only stack that begins with an initial residual block.
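The stack structure described above can be laid out programmatically. The sketch below assumes that a stack with s blocks consists of one leading block (initial or downsampling) followed by s - 1 standard blocks. This is an interpretation of the text, and the variable layout is illustrative.

```matlab
% Sketch: sequence of block types in each stack for a given StackDepth.
stackDepth = [3 4 6 3];                 % default StackDepth
layout = cell(1,numel(stackDepth));

for i = 1:numel(stackDepth)
    if i == 1
        firstBlock = "initial";         % only the first stack starts this way
    else
        firstBlock = "downsampling";
    end
    % Assumption: the remaining stackDepth(i)-1 blocks are standard blocks.
    layout{i} = [firstBlock repmat("standard",1,stackDepth(i)-1)];
    fprintf("Stack %d: %s\n", i, join(layout{i},", "));
end
```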
The initial, standard, and downsampling residual blocks can be bottleneck or nonbottleneck blocks. Bottleneck blocks perform a 1-by-1 convolution before the 3-by-3 convolution, to reduce the number of channels by a factor of four. Networks with and without bottleneck blocks have a similar level of computational complexity, but the total number of features propagating in the residual connections is four times larger when you use the bottleneck units. Therefore, using bottleneck blocks increases the efficiency of the network.
The layers inside each block are determined by the type of block and the options you set.
Block Layers

Initial Layers
The layers at the start of the residual network. Set the optional pooling layer using the InitialPoolingLayer argument.

Initial Residual Block
The main branch of the initial residual block has the same layers as a standard residual block. The layers on the residual connection depend on the InitialNumFilters and NumFilters values, as described for the NumFilters argument.
[Figure: Example of an initial residual block for a network without a bottleneck and with the batch normalization layer before the addition layer.]

Standard Residual Block (BottleneckType="downsample-first-conv")
The standard residual block with bottleneck units contains three convolutional layers (1-by-1, 3-by-3, and 1-by-1), as described for the BottleneckType argument. The standard block has a residual connection from the output of the previous block to the addition layer. Set the position of the addition layer using the ResidualBlockType argument.
[Figure: Example of the standard residual block for a network with a bottleneck and with the batch normalization layer before the addition layer.]

Standard Residual Block (BottleneckType="none")
The standard residual block without bottleneck units contains two 3-by-3 convolutional layers, as described for the BottleneckType argument. The standard block has a residual connection from the output of the previous block to the addition layer. Set the position of the addition layer using the ResidualBlockType argument.
[Figure: Example of the standard residual block for a network without a bottleneck and with the batch normalization layer before the addition layer.]

Downsampling Residual Block
The downsampling residual block is the same as the standard block (either with or without the bottleneck) but with a stride of 2 in the first convolutional unit. The layers on the residual connection depend on the options you set. The downsampling block halves the height and width of the input, and increases the number of channels.
[Figure: Example of a downsampling residual block for a network without a bottleneck and with the batch normalization layer before the addition layer.]

Final Layers
The layers at the end of the residual network.

The convolution and fully connected layer weights are initialized using the He weight initialization method [3]. For more information, see convolution2dLayer.
Tips
When working with small images, set the InitialPoolingLayer option to "none" to remove the initial pooling layer and reduce the amount of downsampling.
Residual networks are usually named ResNet-X, where X is the depth of the network. The depth of a network is defined as the largest number of sequential convolutional or fully connected layers on a path from the input layer to the output layer. You can use the following formula to compute the depth of your network:
$$\text{depth} = \begin{cases} 1 + 2\sum_{i=1}^{N} s_i + 1 & \text{if no bottleneck} \\ 1 + 3\sum_{i=1}^{N} s_i + 1 & \text{if bottleneck} \end{cases}$$
where s_i is the depth of stack i and N is the number of stacks.
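The depth formula is simple enough to evaluate directly. The following sketch wraps it in an anonymous function and applies it to stack depths that appear on this page. The name networkDepth and the convsPerBlock parameter (2 without a bottleneck, 3 with one) are illustrative and are not part of resnetLayers.

```matlab
% Sketch: evaluate the depth formula for a given StackDepth.
% convsPerBlock is 3 for bottleneck blocks and 2 for nonbottleneck blocks.
networkDepth = @(stackDepth,convsPerBlock) 1 + convsPerBlock*sum(stackDepth) + 1;

depthDefault = networkDepth([3 4 6 3],3);        % default settings
depthNoBottleneck = networkDepth([3 4 6 3],2);   % BottleneckType="none"
depthCustom = networkDepth([3 4 23 3],3);        % custom stack depth example

fprintf("Default: ResNet-%d\n", depthDefault);
fprintf("No bottleneck: ResNet-%d\n", depthNoBottleneck);
fprintf("StackDepth=[3 4 23 3]: ResNet-%d\n", depthCustom);
```

The results match the ResNet-50, ResNet-34, and ResNet-101 names used elsewhere on this page.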
Networks with the same depth can have different network architectures. For example, you can create a ResNet-14 architecture with or without a bottleneck:
resnet14Bottleneck = resnetLayers([224 224 3],10, ...
    StackDepth=[2 2], ...
    NumFilters=[64 128]);

resnet14NoBottleneck = resnetLayers([224 224 3],10, ...
    BottleneckType="none", ...
    StackDepth=[2 2 2], ...
    NumFilters=[64 128 256]);
The relationship between bottleneck and nonbottleneck architectures also means that, for the same StackDepth value, a network with a bottleneck has a different depth than a network without a bottleneck.
resnet50Bottleneck = resnetLayers([224 224 3],10);

resnet34NoBottleneck = resnetLayers([224 224 3],10, ...
    BottleneckType="none");
References
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” Preprint, submitted December 10, 2015. https://arxiv.org/abs/1512.03385.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity Mappings in Deep Residual Networks.” Preprint, submitted July 25, 2016. https://arxiv.org/abs/1603.05027.
[3] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.” In Proceedings of the 2015 IEEE International Conference on Computer Vision, 1026–1034. Washington, DC: IEEE Computer Vision Society, 2015.
Extended Capabilities
GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
Usage notes and limitations:
You can use the residual network for code generation. First, create the network using the resnetLayers function. Then, use the trainNetwork function to train the network. After training and evaluating the network, you can generate code for the DAGNetwork object by using GPU Coder™.
Version History
Introduced in R2021b