Main Content


Datastore for semantic segmentation networks


Use pixelLabelImageDatastore to create a datastore for training a semantic segmentation network using deep learning.




pximds = pixelLabelImageDatastore(gTruth) returns a datastore for training a semantic segmentation network based on the input groundTruth object or array of groundTruth objects. Use the output pixelLabelImageDatastore object with the Deep Learning Toolbox™ function trainNetwork (Deep Learning Toolbox) to train convolutional neural networks for semantic segmentation.

pximds = pixelLabelImageDatastore(imds,pxds) returns a datastore based on the input image datastore and the pixel label datastore objects. imds is an ImageDatastore object that represents the training input to the network. pxds is a PixelLabelDatastore object that represents the required network output.

pximds = pixelLabelImageDatastore(___,Name,Value) additionally uses name-value pairs to set the DispatchInBackground and OutputSizeMode properties. For 2-D data, you can also use name-value pairs to specify the ColorPreprocessing, DataAugmentation, and OutputSize augmentation properties. You can specify multiple name-value pairs. Enclose each property name in quotes.

For example, pixelLabelImageDatastore(gTruth,'PatchesPerImage',40) creates a pixel label image datastore that randomly generates 40 patches from each ground truth object in gTruth.

Input Arguments

expand all

Ground truth data, specified as a groundTruth object or as an array of groundTruth objects. Each groundTruth object contains information about the data source, the list of label definitions, and all marked labels for a set of ground truth labels.

Collection of images, specified as an ImageDatastore object.

Collection of pixel labeled images, specified as a PixelLabelDatastore object. The object contains the pixel labeled images for each image contained in the imds input object.


expand all

This property is read-only.

Image file names used as the source for ground truth images, specified as a character vector or a cell array of character vectors.

This property is read-only.

Pixel label data file names used as the source for ground truth label images, specified as a character or a cell array of characters.

This property is read-only.

Class names, specified as a cell array of character vectors.

Color channel preprocessing for 2-D data, specified as 'none', 'gray2rgb', or 'rgb2gray'. Use this property when you need the image data created by the data source must be only color or grayscale, but the training set includes both. Suppose you need to train a network that expects color images but some of your training images are grayscale. Set ColorPreprocessing to 'gray2rgb' to replicate the color channels of the grayscale images in the input image set. Using the 'gray2rgb' option creates M-by-N-by-3 output images.

The ColorPreprocessing property is not supported for 3-D data. To perform color channel preprocessing of 3-D data, use the transform function.

Preprocessing applied to input images, specified as an imageDataAugmenter (Deep Learning Toolbox) object or 'none'. When DataAugmentation is 'none', no preprocessing is applied to input images. Training data can be augmented in real-time during training.

The DataAugmentation property is not supported for 3-D data. To preprocess 3-D data, use the transform function.

Dispatch observations in the background during training, prediction, and classification, specified as false or true. To use background dispatching, you must have Parallel Computing Toolbox™. If DispatchInBackground is true and you have Parallel Computing Toolbox, then pixelLabelImageDatastore asynchronously reads patches, adds noise, and queues patch pairs.

Number of observations that are returned in each batch. The default value is equal to the ReadSize of image datastore imds. You can change the value of MiniBatchSize only after you create the datastore. For training, prediction, or classification, the MiniBatchSize property is set to the mini-batch size defined in trainingOptions (Deep Learning Toolbox).

This property is read-only.

Total number of observations in the denoising image datastore. The number of observations is the length of one training epoch.

This property is read-only.

Size of output images, specified as a vector of two positive integers. The first element specifies the number of rows in the output images, and the second element specifies the number of columns. When you specify OutputSize, image sizes are adjusted as necessary. By default, this property is empty, which means that the images are not adjusted.

The OutputSize property is not supported for 3-D data. To set the output size of 3-D data, use the transform function.

Method used to resize output images, specified as one of the following. This property applies only when you set OutputSize to a value other than [].

  • 'resize' — Scale the image to fit the output size. For more information, see imresize.

  • 'centercrop' — Take a crop from the center of the training image. The crop has the same size as the output size.

  • 'randcrop' — Take a random crop from the training image. The random crop has the same size as the output size.

Data Types: char | string

Object Functions

combineCombine data from multiple datastores
countEachLabelCount occurrence of pixel or box labels
hasdataDetermine if data is available to read
partitionByIndexPartition pixelLabelImageDatastore according to indices
previewPreview subset of data in datastore
readRead data from a datastore
readallRead all data in datastore
readByIndexRead data specified by index from pixelLabelImageDatastore
resetReset datastore to initial state
shuffleReturn shuffled version of datastore
transformTransform datastore


collapse all

Load the training data.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an image datastore for the images.

imds = imageDatastore(imageDir);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle","background"];
labelIDs   = [255 0];
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Visualize training images and ground truth pixel labels.

I = read(imds);
C = read(pxds);

I = imresize(I,5);
L = imresize(uint8(C{1}),5);

Create a semantic segmentation network. This network uses a simple semantic segmentation network based on a downsampling and upsampling design.

numFilters = 64;
filterSize = 3;
numClasses = 2;
layers = [
    imageInputLayer([32 32 1])

Setup training options.

opts = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',100, ...

Combine the image and pixel label datastore for training.

trainingData = combine(imds,pxds);

Train the network.

net = trainNetwork(trainingData,layers,opts);
Training on single CPU.
Initializing input data normalization.
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|       1 |           1 |       00:00:00 |       58.11% |       1.3458 |          0.0010 |
|      17 |          50 |       00:00:12 |       97.30% |       0.0924 |          0.0010 |
|      34 |         100 |       00:00:24 |       98.09% |       0.0575 |          0.0010 |
|      50 |         150 |       00:00:37 |       98.56% |       0.0424 |          0.0010 |
|      67 |         200 |       00:00:49 |       98.48% |       0.0435 |          0.0010 |
|      84 |         250 |       00:01:02 |       98.66% |       0.0363 |          0.0010 |
|     100 |         300 |       00:01:14 |       98.90% |       0.0310 |          0.0010 |
Training finished: Reached final iteration.

Read and display a test image.

testImage = imread('triangleTest.jpg');

Segment the test image and display the results.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);

Configure a pixel label image datastore to augment data while training.

Load training images and pixel labels.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an imageDatastore object to hold the training images.

imds = imageDatastore(imageDir);

Define the class names and their associated label IDs.

classNames = ["triangle","background"];
labelIDs   = [255 0];

Create a pixelLabelDatastore object to hold the ground truth pixel labels for the training images.

pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);

Create an imageDataAugmenter object to randomly rotate and mirror image data.

augmenter = imageDataAugmenter('RandRotation',[-10 10],'RandXReflection',true)
augmenter = 
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 1
     RandYReflection: 0
        RandRotation: [-10 10]
           RandScale: [1 1]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [0 0]
    RandYTranslation: [0 0]

Create a pixelLabelImageDatastore object to train the network with augmented data.

plimds = pixelLabelImageDatastore(imds,pxds,'DataAugmentation',augmenter)
plimds = 
  pixelLabelImageDatastore with properties:

                  Images: {200x1 cell}
          PixelLabelData: {200x1 cell}
              ClassNames: {2x1 cell}
        DataAugmentation: [1x1 imageDataAugmenter]
      ColorPreprocessing: 'none'
              OutputSize: []
          OutputSizeMode: 'resize'
           MiniBatchSize: 1
         NumObservations: 200
    DispatchInBackground: 0

Train a semantic segmentation network using dilated convolutions.

A semantic segmentation network classifies every pixel in an image, resulting in an image that is segmented by class. Applications for semantic segmentation include road segmentation for autonomous driving and cancer cell segmentation for medical diagnosis. To learn more, see Getting Started with Semantic Segmentation Using Deep Learning.

Semantic segmentation networks like DeepLab [1] make extensive use of dilated convolutions (also known as atrous convolutions) because they can increase the receptive field of the layer (the area of the input which the layers can see) without increasing the number of parameters or computations.

Load Training Data

The example uses a simple dataset of 32-by-32 triangle images for illustration purposes. The dataset includes accompanying pixel label ground truth data. Load the training data using an imageDatastore and a pixelLabelDatastore.

dataFolder = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageFolderTrain = fullfile(dataFolder,'trainingImages');
labelFolderTrain = fullfile(dataFolder,'trainingLabels');

Create an imageDatastore for the images.

imdsTrain = imageDatastore(imageFolderTrain);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle" "background"];
labels = [255 0];
pxdsTrain = pixelLabelDatastore(labelFolderTrain,classNames,labels)
pxdsTrain = 
  PixelLabelDatastore with properties:

                       Files: {200x1 cell}
                  ClassNames: {2x1 cell}
                    ReadSize: 1
                     ReadFcn: @readDatastoreImage
    AlternateFileSystemRoots: {}

Create Semantic Segmentation Network

This example uses a simple semantic segmentation network based on dilated convolutions.

Create a data source for training data and get the pixel counts for each label.

ds = combine(imdsTrain,pxdsTrain);
tbl = countEachLabel(pxdsTrain)
tbl=2×3 table
         Name         PixelCount    ImagePixelCount
    ______________    __________    _______________

    {'triangle'  }         10326       2.048e+05   
    {'background'}    1.9447e+05       2.048e+05   

The majority of pixel labels are for background. This class imbalance biases the learning process in favor of the dominant class. To fix this, use class weighting to balance the classes. You can use several methods to compute class weights. One common method is inverse frequency weighting where the class weights are the inverse of the class frequencies. This method increases the weight given to under represented classes. Calculate the class weights using inverse frequency weighting.

numberPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / numberPixels;
classWeights = 1 ./ frequency;

Create a network for pixel classification by using an image input layer with an input size corresponding to the size of the input images. Next, specify three blocks of convolution, batch normalization, and ReLU layers. For each convolutional layer, specify 32 3-by-3 filters with increasing dilation factors and pad the inputs so they are the same size as the outputs by setting the 'Padding' option to 'same'. To classify the pixels, include a convolutional layer with K 1-by-1 convolutions, where K is the number of classes, followed by a softmax layer and a pixelClassificationLayer with the inverse class weights.

inputSize = [32 32 1];
filterSize = 3;
numFilters = 32;
numClasses = numel(classNames);

layers = [

Train Network

Specify the training options.

options = trainingOptions('sgdm', ...
    'MaxEpochs', 100, ...
    'MiniBatchSize', 64, ... 
    'InitialLearnRate', 1e-3);

Train the network using trainNetwork.

net = trainNetwork(ds,layers,options);
Training on single CPU.
Initializing input data normalization.
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|       1 |           1 |       00:00:02 |       91.62% |       1.6825 |          0.0010 |
|      17 |          50 |       00:00:24 |       88.56% |       0.2393 |          0.0010 |
|      34 |         100 |       00:00:47 |       92.08% |       0.1672 |          0.0010 |
|      50 |         150 |       00:01:07 |       93.17% |       0.1472 |          0.0010 |
|      67 |         200 |       00:01:28 |       94.15% |       0.1313 |          0.0010 |
|      84 |         250 |       00:01:51 |       94.47% |       0.1167 |          0.0010 |
|     100 |         300 |       00:02:13 |       95.04% |       0.1100 |          0.0010 |
Training finished: Max epochs completed.

Test Network

Load the test data. Create an imageDatastore for the images. Create a pixelLabelDatastore for the ground truth pixel labels.

imageFolderTest = fullfile(dataFolder,'testImages');
imdsTest = imageDatastore(imageFolderTest);
labelFolderTest = fullfile(dataFolder,'testLabels');
pxdsTest = pixelLabelDatastore(labelFolderTest,classNames,labels);

Make predictions using the test data and trained network.

pxdsPred = semanticseg(imdsTest,net,'MiniBatchSize',32,'WriteLocation',tempdir);
Running semantic segmentation network
* Processed 100 images.

Evaluate the prediction accuracy using evaluateSemanticSegmentation.

metrics = evaluateSemanticSegmentation(pxdsPred,pxdsTest);
Evaluating semantic segmentation results
* Selected metrics: global accuracy, class accuracy, IoU, weighted IoU, BF score.
* Processed 100 images.
* Finalizing... Done.
* Data set metrics:

    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

       0.95237          0.97352       0.72081      0.92889        0.46416  

For more information on evaluating semantic segmentation networks, see evaluateSemanticSegmentation.

Segment New Image

Read and display the test image triangleTest.jpg.

imgTest = imread('triangleTest.jpg');

Figure contains an axes object. The axes object contains an object of type image.

Segment the test image using semanticseg and display the results using labeloverlay.

C = semanticseg(imgTest,net);
B = labeloverlay(imgTest,C);

Figure contains an axes object. The axes object contains an object of type image.


  • The pixelLabelDatastore pxds and the imageDatastore imds store files that are located in a folder in lexicographical order. For example, if you have twelve files named 'file1.jpg', 'file2.jpg', … , 'file11.jpg', and 'file12.jpg', then the files are stored in this order:

    Files that are stored in a cell array are read in the same order as they are stored.

    If the order of files in pxds and imds are not the same, then you may encounter a mismatch when you read a ground truth image and corresponding label data using a pixelLabelImageDatastore. If this occurs, then rename the pixel label files so that they have the correct order. For example, rename 'file1.jpg', … , 'file9.jpg' to 'file01.jpg', …, 'file09.jpg'.

  • To extract semantic segmentation data from a groundTruth object generated by the Video Labeler, use the pixelLabelTrainingData function.

Introduced in R2018a