

Create SegNet layers for semantic segmentation



lgraph = segnetLayers(imageSize,numClasses,model) returns SegNet layers, lgraph, preinitialized with layers and weights from a pretrained model.

SegNet is a convolutional neural network for semantic image segmentation. The network uses a pixelClassificationLayer to predict the categorical label for every pixel in an input image.

Use segnetLayers to create the network architecture for SegNet. You must train the network using the Deep Learning Toolbox™ function trainNetwork (Deep Learning Toolbox).

lgraph = segnetLayers(imageSize,numClasses,encoderDepth) returns uninitialized SegNet layers configured using the specified encoder depth.

lgraph = segnetLayers(imageSize,numClasses,encoderDepth,Name,Value) returns SegNet layers with additional options specified by one or more Name,Value pair arguments.



Load training images and pixel labels.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an image datastore holding the training images.

imds = imageDatastore(imageDir);

Define the class names and their associated label IDs.

classNames = ["triangle", "background"];
labelIDs   = [255 0];

Create a pixel label datastore holding the ground truth pixel labels for the training images.

pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Create SegNet layers.

imageSize = [32 32];
numClasses = 2;
lgraph = segnetLayers(imageSize,numClasses,2)
lgraph = 
  LayerGraph with properties:

         Layers: [31x1 nnet.cnn.layer.Layer]
    Connections: [34x2 table]
     InputNames: {'inputImage'}
    OutputNames: {'pixelLabels'}

Create a pixel label image datastore for training a semantic segmentation network.

pximds = pixelLabelImageDatastore(imds,pxds);

Set up training options.

options = trainingOptions('sgdm','InitialLearnRate',1e-3, ...
    'MaxEpochs',20,'VerboseFrequency',10); % values inferred from the training output below

Train the network.

net = trainNetwork(pximds,lgraph,options)
Training on single CPU.
Initializing input data normalization.
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|       1 |           1 |       00:00:03 |       42.11% |       0.7662 |          0.0010 |
|      10 |          10 |       00:00:26 |       50.77% |       0.7390 |          0.0010 |
|      20 |          20 |       00:00:53 |       66.19% |       0.6918 |          0.0010 |
net = 
  DAGNetwork with properties:

         Layers: [31x1 nnet.cnn.layer.Layer]
    Connections: [34x2 table]
     InputNames: {'inputImage'}
    OutputNames: {'pixelLabels'}

Display the network.
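One way to display the trained network is the Deep Learning Toolbox plot function applied to its layer graph (a sketch; the resulting figure is not reproduced here):

```matlab
% Plot the layer graph of the trained DAG network
figure
plot(layerGraph(net))
```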


Create SegNet layers with an encoder/decoder depth of 4.

imageSize = [480 640 3];
numClasses = 5;
encoderDepth = 4;
lgraph = segnetLayers(imageSize,numClasses,encoderDepth)
lgraph = 
  LayerGraph with properties:

         Layers: [59x1 nnet.cnn.layer.Layer]
    Connections: [66x2 table]
     InputNames: {'inputImage'}
    OutputNames: {'pixelLabels'}

Display the network.
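The layer graph can likewise be displayed with plot (a sketch; the figure is not reproduced here):

```matlab
% Plot the SegNet layer graph
figure
plot(lgraph)
```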


Input Arguments


Network input image size, specified as a:

  • 2-element vector in the format [height width].

  • 3-element vector in the format [height width depth]. depth is the number of image channels. Set depth to 3 for RGB images, to 1 for grayscale images, or to the number of channels for multispectral and hyperspectral images.

Number of classes in the semantic segmentation, specified as an integer greater than 1.

Pretrained network model, specified as 'vgg16' or 'vgg19'. These models have an encoder depth of 5 and accept RGB inputs only. You can convert a grayscale image to RGB by replicating its single channel into three channels, for example by using cat(3,I,I,I).
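A minimal sketch of creating SegNet layers from a pretrained model (the [360 480 3] input size and 11 classes are illustrative; the VGG-16 support package is assumed to be installed):

```matlab
% Create SegNet layers initialized with pretrained VGG-16 weights.
% VGG models require three-channel (RGB) input.
imageSize = [360 480 3];
numClasses = 11;
lgraph = segnetLayers(imageSize,numClasses,'vgg16');

% A grayscale image I can be made three-channel by replication:
% Irgb = cat(3,I,I,I);
```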

Encoder depth, specified as a positive integer.

SegNet is composed of an encoder subnetwork and a corresponding decoder subnetwork. The depth of these subnetworks determines the number of times the input image is downsampled or upsampled as it is processed. The encoder network downsamples the input image by a factor of 2^D, where D is the value of encoderDepth. The decoder network upsamples the encoder network output by a factor of 2^D.
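The downsampling arithmetic can be checked directly; with an encoder depth of 4, each spatial dimension shrinks by a factor of 2^4 = 16 (an illustrative calculation, not part of the original examples):

```matlab
% Factor by which the encoder downsamples the input
encoderDepth = 4;
factor = 2^encoderDepth;               % 16
inputSize = [480 640];
encoderOutputSize = inputSize/factor   % [30 40]
```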

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'NumConvolutionLayers',1

Number of convolutional layers in each encoder and decoder section, specified as a positive integer or vector of positive integers.

  • scalar: The same number of layers is used for all encoder and decoder sections.

  • vector: The kth element of NumConvolutionLayers is the number of convolution layers in the kth encoder section and corresponding decoder section. Typical values are in the range [1, 3].
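A sketch of using NumConvolutionLayers with a per-section vector (the image size, class count, and layer counts are illustrative):

```matlab
% Two convolution layers in the first two sections, three in the last two
lgraph = segnetLayers([480 640 3],5,4, ...
    'NumConvolutionLayers',[2 2 3 3]);
```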

Number of output channels for each section in the SegNet encoder network, specified as a positive integer or vector of positive integers. segnetLayers sets the number of output channels in the decoder to match the corresponding encoder section.

  • scalar: The same number of output channels is used for all encoder and decoder sections.

  • vector: The kth element of NumOutputChannels is the number of output channels of the kth encoder section and corresponding decoder section.
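A sketch of using NumOutputChannels (the channel counts are illustrative):

```matlab
% Double the channel count at each of the four encoder sections;
% the decoder sections mirror these values.
lgraph = segnetLayers([480 640 3],5,4, ...
    'NumOutputChannels',[32 64 128 256]);
```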

Convolutional layer filter size, specified as a positive odd integer or a 2-element row vector of positive odd integers. Typical values are in the range [3, 7].

  • scalar: The filter is square.

  • 2-element row vector: The filter has the size [height width].
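A sketch of using FilterSize (the sizes are illustrative; the values must be odd):

```matlab
% Square 5-by-5 filters in every convolution layer
lgraph = segnetLayers([480 640 3],5,4,'FilterSize',5);

% A rectangular filter, specified as [height width]
lgraph = segnetLayers([480 640 3],5,4,'FilterSize',[3 5]);
```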

Output Arguments


Layers that represent the SegNet network architecture, returned as a layerGraph (Deep Learning Toolbox) object.


  • The sections within the SegNet encoder and decoder subnetworks are made up of convolutional, batch normalization, and ReLU layers.

  • All convolutional layers are configured such that the bias term is fixed to zero.

  • Convolution layer weights in the encoder and decoder subnetworks are initialized using the 'MSRA' weight initialization method. For 'vgg16' or 'vgg19' models, only the decoder subnetwork is initialized using MSRA.[1]

  • Networks produced by segnetLayers support GPU code generation for deep learning once they are trained with trainNetwork (Deep Learning Toolbox). See Deep Learning Code Generation (Deep Learning Toolbox) for details and examples.


[1] He, K., X. Zhang, S. Ren, and J. Sun. "Delving Deep Into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the IEEE International Conference on Computer Vision. 2015, 1026–1034.

[2] Badrinarayanan, V., A. Kendall, and R. Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561, 2015.


Introduced in R2017b