Image Processing and Computer Vision

Semantic Segmentation

3 things you need to know

What Is Semantic Segmentation?

Semantic segmentation is a deep learning technique that associates a label or category with every pixel in an image. It is used to recognize collections of pixels that form distinct categories. For example, an autonomous vehicle needs to identify vehicles, pedestrians, traffic signs, pavement, and other road features.

Semantic segmentation is used in many applications such as automated driving, medical imaging, and industrial inspection.

A simple example of semantic segmentation is separating an image into two classes. In Figure 1, an image showing a person at the beach is paired with a version showing the image's pixels segmented into two separate classes: person and background.

Semantic Segmentation - Image and Labeled Pixels

Figure 1: Image and labeled pixels.

Semantic segmentation is not limited to two categories. You can change the number of categories used to classify the content of the image. For example, this same image might be segmented into four classes: person, sky, water, and background.

How Does Semantic Segmentation Differ from Object Detection?

Semantic segmentation can be a useful alternative to object detection because it allows the object of interest to span multiple areas in the image at the pixel level. This technique cleanly detects objects that are irregularly shaped, in contrast to object detection, where objects must fit within a bounding box (Figure 2).

Semantic Segmentation - Object Detection

Figure 2: Object detection, showing bounding boxes to identify objects.

How Is Semantic Segmentation Used?

Because semantic segmentation labels pixels in an image, it is more precise than other forms of object detection. This makes semantic segmentation useful for applications in a variety of industries that require precise image maps, such as:

  • Autonomous driving—for identifying a drivable path for cars by separating the road from obstacles like pedestrians, sidewalks, poles, and other cars
  • Industrial inspection—for detecting defects in materials, such as wafer inspection
  • Satellite imagery—for identifying mountains, rivers, deserts, and other terrain
  • Medical imaging—for analyzing and detecting cancerous anomalies in cells
  • Robotic vision—for identifying and navigating objects and terrain

Semantic Segmentation - Multispectral Satellite Image

Figure 3: Semantic segmentation of a multispectral satellite image.

How Semantic Segmentation Works

The process of training a semantic segmentation network to classify images follows these steps:

  1. Analyze a collection of pixel-labeled images.
  2. Create a semantic segmentation network.
  3. Train the network to classify images into pixel categories.
  4. Assess the accuracy of the network.

Example: Automated Driving Application

The sequence in Figure 4 shows a real-world example of semantic segmentation used for automated driving. The road surface is automatically segmented from the other vehicles and surroundings. The next section shows how these networks are created.

Semantic Segmentation for an automated driving application

Figure 4: Semantic segmentation for an automated driving application.

Understanding the Architecture

One common approach to semantic segmentation is to create a SegNet, which is based on a convolutional neural network (CNN) architecture. A typical CNN architecture is shown in Figure 5.

This CNN classifies the entire image into one of many predefined categories.

Semantic Segmentation - typical structure of a CNN

Figure 5: Typical structure of a CNN.

To classify at the pixel level instead of the entire image, you can append a reverse implementation of a CNN. The upsampling process is performed the same number of times as the downsampling process to ensure the final image is the same size as the input image. Finally, a pixel classification output layer is used, which maps each pixel to a certain class. This forms an encoder-decoder architecture, which enables semantic segmentation.
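As a rough sketch of this encoder-decoder idea, the layers can be assembled by hand in MATLAB with Deep Learning Toolbox and Computer Vision Toolbox. This is a minimal illustration, not the SegNet architecture itself; the input size, filter counts, and class count below are placeholders:

```matlab
% Minimal illustrative encoder-decoder layer stack (sizes are placeholders).
numClasses = 2;
layers = [
    imageInputLayer([64 64 3])
    % Encoder: convolution + pooling downsamples the image
    convolution2dLayer(3, 16, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    % Decoder: transposed convolution upsamples back to the input size
    transposedConv2dLayer(4, 16, 'Stride', 2, 'Cropping', 'same')
    reluLayer
    % Map each pixel to a class score, then classify per pixel
    convolution2dLayer(1, numClasses)
    softmaxLayer
    pixelClassificationLayer];
```

A full SegNet repeats the downsampling and upsampling stages several times, as shown in Figure 6.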

Semantic Segmentation - CNN performing image-related functions

Figure 6: CNN performing image-related functions at each layer and then downsampling the image using a pooling layer (green). This process is repeated several times for the first half of the network. The output from the first half of this diagram is followed by an equal number of unpooling layers (orange).

Using MATLAB for Semantic Segmentation


In MATLAB, the workflow for performing semantic segmentation follows these five steps:

  1. Label data or obtain labeled data.
  2. Create a datastore for original images and labeled images.
  3. Partition the datastores.
  4. Import a CNN and modify it to be a SegNet.
  5. Train and evaluate the network.

STEP 1: Label data or obtain labeled data.

Deep learning models are built on lots of data, and semantic segmentation is no exception. One option is to find labeled data on the Internet. If you have your own dataset, you can use the Image Labeler app in MATLAB. You can use this dataset to train a SegNet.
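The app can also be launched from the command line, optionally pointed at a folder of images. A minimal sketch, where the folder path is a placeholder for your own data:

```matlab
% Open the Image Labeler app on a folder of images (path is illustrative).
imageLabeler(fullfile('data','trainingImages'));
```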

Semantic Segmentation - Image Labeler App

Figure 7: MATLAB Image Labeler app to label images for semantic segmentation. 


STEP 2: Create a datastore for original images and labeled images.

When working with lots of data, it is often not possible to load all the information into memory. To manage large datasets, you can use a datastore. A datastore contains the location of the files you want to access, and it lets you read them into memory only when you need to operate on the files.

To create a SegNet, you need two datastores:

  1. ImageDatastore, which contains the original images
  2. PixelLabelDatastore, which contains the labeled images
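Creating the two datastores might look like the following sketch, where the folder paths, class names, and label IDs are placeholders for your own data:

```matlab
% Datastore of original images (path is illustrative).
imds = imageDatastore(fullfile('data','images'));

% Class names and their pixel label IDs in the label images (illustrative).
classNames = ["person" "background"];
labelIDs   = [1 0];

% Datastore of pixel-labeled images (path is illustrative).
pxds = pixelLabelDatastore(fullfile('data','labels'), classNames, labelIDs);
```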

STEP 3: Partition the datastores.

When creating a SegNet, you have to partition the datastore into two parts:

  1. The training set, used to train the SegNet
  2. The test set, used to evaluate the accuracy of a network

Semantic Segmentation - labeling highway scene

Figure 8: Highway scene showing color image (left) and corresponding labeled pixels (right).
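One way to split the data, assuming variables imds and pxds hold the image and pixel label datastores from Step 2 (variable names and the 60/40 ratio are illustrative), is to shuffle the file indices and take matching subsets of both datastores:

```matlab
% Random 60/40 train/test split of the files (ratio is illustrative).
rng(0);                                   % reproducible shuffle
numFiles = numel(imds.Files);
idx      = randperm(numFiles);
numTrain = round(0.6 * numFiles);

imdsTrain = subset(imds, idx(1:numTrain));
imdsTest  = subset(imds, idx(numTrain+1:end));
pxdsTrain = subset(pxds, idx(1:numTrain));
pxdsTest  = subset(pxds, idx(numTrain+1:end));
```

Using the same shuffled indices for both datastores keeps each image paired with its labeled counterpart.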

STEP 4: Import a CNN and modify it to be a SegNet.

Loading a pretrained network, such as VGG-16, and passing it to the segnetLayers function creates the encoder-decoder architecture necessary for pixel-level labeling.

Semantic Segmentation - code to create the SegNet architecture

Figure 9: Creating the SegNet architecture with one line of code in MATLAB.
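That one line of code is a call to segnetLayers. A hedged sketch, where the input size and class count are placeholders for your own data:

```matlab
% Create SegNet encoder-decoder layers from a pretrained VGG-16 encoder.
imageSize  = [360 480 3];   % network input size (illustrative)
numClasses = 11;            % number of pixel classes (illustrative)
lgraph = segnetLayers(imageSize, numClasses, 'vgg16');
```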

STEP 5: Train and evaluate the network.

In the final step, you set hyperparameters for the network, train it on the training set, and evaluate its accuracy on the test set.
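Assuming lgraph holds the SegNet layers from Step 4 and the partitioned datastores from Step 3 (variable names and hyperparameter values below are illustrative), training and evaluation might be sketched as:

```matlab
% Pair training images with their pixel labels.
pximds = pixelLabelImageDatastore(imdsTrain, pxdsTrain);

% Hyperparameters (values are illustrative; tune for your data).
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 30, ...
    'MiniBatchSize', 4);

net = trainNetwork(pximds, lgraph, options);

% Segment the held-out test images and score them against the ground truth.
pxdsResults = semanticseg(imdsTest, net, 'WriteLocation', tempdir);
metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTest);
```

The metrics object reports measures such as global accuracy and mean intersection over union for the test set.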

How to Learn More About Semantic Segmentation

Products that support using semantic segmentation for image analysis include MATLAB, Computer Vision Toolbox for pixel labeling, and Deep Learning Toolbox for creating and training the network.

Training and prediction are supported on a CUDA®-enabled GPU with compute capability 3.0 or higher. Use of a GPU is recommended and requires Parallel Computing Toolbox.
