Semantic segmentation is a deep learning algorithm that associates a label or category with every pixel in an image. It is used to recognize a collection of pixels that form distinct categories. For example, an autonomous vehicle needs to identify vehicles, pedestrians, traffic signs, pavement, and other road features.
Semantic segmentation is used in many applications such as automated driving, medical imaging, and industrial inspection.
A simple example of semantic segmentation is separating an image into two classes. For example, Figure 1 pairs an image of a person at the beach with a version in which the image's pixels are segmented into two classes: person and background.
Semantic segmentation is not limited to two categories. You can change the number of categories used to classify the content of the image. For example, the same image might be segmented into four classes: person, sky, water, and background.
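The idea of one label per pixel can be sketched in a few lines of Python. This is a minimal illustration, not output from a real network: the tiny 4-by-6 label map, the class IDs, and the helper name pixels_per_class are all made up for the example.

```python
from collections import Counter

# Hypothetical class IDs for a two-class segmentation (assumed for illustration).
CLASS_NAMES = {0: "background", 1: "person"}

# A made-up 4x6 "image": each entry is the class ID assigned to one pixel.
label_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

def pixels_per_class(labels, class_names):
    """Count how many pixels were assigned to each class name."""
    counts = Counter(value for row in labels for value in row)
    return {name: counts.get(class_id, 0) for class_id, name in class_names.items()}

counts = pixels_per_class(label_map, CLASS_NAMES)
print(counts)
```

Moving to four classes only changes the set of IDs that may appear in the label map; the map itself keeps the same height and width as the image.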
How Does Semantic Segmentation Differ from Object Detection?
Semantic segmentation can be a useful alternative to object detection because it allows the object of interest to span multiple areas in the image at the pixel level. This technique cleanly detects objects that are irregularly shaped, in contrast to object detection, where objects must fit within a bounding box (Figure 2).
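To see why a bounding box is a coarse fit for an irregular shape, the following sketch compares a pixel mask with the smallest box that encloses it. The L-shaped mask and the helper name bounding_box are hypothetical and chosen only for illustration.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the smallest box enclosing all 1-pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]

# A made-up L-shaped object: 1 marks object pixels, 0 marks background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

top, left, bottom, right = bounding_box(mask)
box_area = (bottom - top + 1) * (right - left + 1)
object_pixels = sum(sum(row) for row in mask)

# Pixels inside the box that do not belong to the object.
extra_pixels = box_area - object_pixels
print(box_area, object_pixels, extra_pixels)
```

In this toy case the box covers nine pixels but only five belong to the object, so a box-based detector would attribute four background pixels to it, while the mask marks exactly the object.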
How Is Semantic Segmentation Used?
Because semantic segmentation labels every pixel in an image, it localizes objects more precisely than bounding-box object detection. This makes semantic segmentation useful for applications in a variety of industries that require precise image maps, such as:
- Autonomous driving—for identifying a drivable path for cars by separating the road from obstacles like pedestrians, sidewalks, poles, and other cars
- Industrial inspection—for detecting defects in materials, such as wafer inspection
- Satellite imagery—for identifying mountains, rivers, deserts, and other terrain
- Medical imaging—for analyzing and detecting cancerous anomalies in cells
- Robotic vision—for identifying and navigating objects and terrain
The process of training a semantic segmentation network to classify images follows these steps:
- Analyze a collection of pixel-labeled images.
- Create a semantic segmentation network.
- Train the network to classify images into pixel categories.
- Assess the accuracy of the network.
Example: Automated Driving Application
The sequence in Figure 4 shows a real-world example of semantic segmentation used for automated driving. In each image, the road is automatically segmented from the other vehicles. The next section shows how these networks are created.
Understanding the Architecture
One common approach to semantic segmentation is to create a SegNet, which is based on a convolutional neural network (CNN) architecture. A typical CNN architecture is shown in Figure 5.
This CNN classifies the entire image into one of many predefined categories.
To classify at the pixel level instead of the entire image, you can append a reverse implementation of a CNN, which upsamples what the original CNN downsampled. The upsampling process is performed the same number of times as the downsampling process to ensure that the final image is the same size as the input image. Finally, a pixel classification output layer maps each pixel to a class. Together, these stages form an encoder-decoder architecture, which enables semantic segmentation.
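The size bookkeeping of an encoder-decoder can be sketched without any deep learning library. The example below is a simplified stand-in, not SegNet itself: it uses plain 2-by-2 max pooling for downsampling and nearest-neighbor repetition for upsampling (a real network has learned layers and may upsample differently), and the function names are assumptions made for this illustration.

```python
def max_pool_2x2(image):
    """Downsample by taking the maximum of each non-overlapping 2x2 block.

    Assumes the height and width are both even.
    """
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

def upsample_2x(image):
    """Upsample by repeating every pixel in a 2x2 block (nearest neighbor)."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def classify_pixels(score_maps):
    """For each pixel, pick the index of the class with the highest score."""
    height, width = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(len(score_maps)), key=lambda k: score_maps[k][r][c])
         for c in range(width)]
        for r in range(height)
    ]

# A made-up 8x8 single-channel image.
image = [[(r * 8 + c) % 7 for c in range(8)] for r in range(8)]
depth = 2  # number of downsampling steps, matched by the same number of upsampling steps

encoded = image
for _ in range(depth):
    encoded = max_pool_2x2(encoded)

decoded = encoded
for _ in range(depth):
    decoded = upsample_2x(decoded)

size_in = (len(image), len(image[0]))
size_mid = (len(encoded), len(encoded[0]))
size_out = (len(decoded), len(decoded[0]))
print(size_in, size_mid, size_out)

# Made-up per-class score maps for a 2x2 output (class 0 first, then class 1).
scores = [
    [[0.9, 0.2], [0.4, 0.1]],
    [[0.1, 0.8], [0.6, 0.9]],
]
labels = classify_pixels(scores)
print(labels)
```

Because the two downsampling steps are matched by two upsampling steps, the output returns to the 8-by-8 input size, and the final step turns per-class scores into one label per pixel.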
In MATLAB, the workflow for performing semantic segmentation follows these five steps:
- Label data or obtain labeled data.
- Create a datastore for original images and labeled images.
- Partition the datastores.
- Import a CNN and modify it to be a SegNet.
- Train and evaluate the network.
STEP 1: Label data or obtain labeled data.
Deep learning models are built on lots of data, and semantic segmentation is no exception. One option is to find labeled data on the Internet. If you have your own dataset, you can label it with the Image Labeler app in MATLAB. You can then use this dataset to train a SegNet.
STEP 2: Create a datastore for original images and labeled images.
When working with lots of data, it is often not possible to load all the information into memory. To manage large datasets, you can use a datastore. A datastore contains the location of the files you want to access, and it lets you read them into memory only when you need to operate on the files.
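The datastore idea, keeping file locations and reading data only on demand, can be illustrated with a small Python class. This is a conceptual sketch only and not the MATLAB datastore API: the class name LazyFileStore, its read method, and the temporary .bin files are all hypothetical.

```python
import tempfile
from pathlib import Path

class LazyFileStore:
    """Hypothetical stand-in for a datastore: holds file paths, reads on demand."""

    def __init__(self, paths):
        # Only the file locations are stored here; no file contents are loaded.
        self.paths = sorted(Path(p) for p in paths)
        self.reads = 0

    def __len__(self):
        return len(self.paths)

    def read(self, index):
        """Load a single file into memory at the moment it is needed."""
        self.reads += 1
        return self.paths[index].read_bytes()

with tempfile.TemporaryDirectory() as folder:
    # Create three small placeholder files that stand in for images.
    for i in range(3):
        Path(folder, f"image_{i}.bin").write_bytes(bytes([i]) * 4)

    store = LazyFileStore(Path(folder).glob("*.bin"))
    reads_before = store.reads   # nothing has been read yet
    second = store.read(1)       # only now is one file loaded

print(len(store), reads_before, store.reads)
```

Creating the store touches no file contents, so its memory footprint depends on the number of files rather than their total size.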
To create a SegNet, you need two datastores:
- ImageDatastore, which contains the original images
- PixelLabelDatastore, which contains the labeled images
STEP 3: Partition the datastores.
When creating a SegNet, you have to partition both datastores into two parts:
- The training set, used to train the SegNet
- The test set, used to evaluate the accuracy of the network
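One way to picture the partitioning step is to shuffle the file indices once and apply the same split to both the images and their labels, so that every image stays paired with its label. The sketch below makes several assumptions not stated above: the 60/40 split, the fixed random seed, the file names, and the helper name partition_indices.

```python
import random

def partition_indices(count, train_fraction=0.6, seed=0):
    """Shuffle 0..count-1 reproducibly and split into training and test indices."""
    indices = list(range(count))
    random.Random(seed).shuffle(indices)
    cut = int(round(count * train_fraction))
    return indices[:cut], indices[cut:]

# Hypothetical file names; image i is labeled by label i.
image_files = [f"image_{i}.png" for i in range(10)]
label_files = [f"label_{i}.png" for i in range(10)]

train_idx, test_idx = partition_indices(len(image_files))

# Use the same indices for both lists so the pairs stay aligned.
train_pairs = [(image_files[i], label_files[i]) for i in train_idx]
test_pairs = [(image_files[i], label_files[i]) for i in test_idx]

print(len(train_pairs), len(test_pairs))
```

Splitting by index rather than shuffling the two lists separately is what keeps each image matched to its labeled counterpart.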
STEP 4: Import a CNN and modify it to be a SegNet.
You can load a pretrained network, such as VGG16, and use the segnetLayers command to create the encoder-decoder architecture necessary for pixel-level labeling.
STEP 5: Train and evaluate the network.
In the final step, you set hyperparameters for the network and train the network.
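After training, the network's predictions on the test set are compared with the ground-truth labels. The sketch below computes two commonly used measures, overall pixel accuracy and per-class intersection over union (IoU). The choice of these two metrics, the function names, and the tiny label maps are assumptions made for illustration.

```python
def pixel_accuracy(predicted, truth):
    """Fraction of pixels whose predicted class matches the ground truth."""
    total = correct = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            total += 1
            correct += (p == t)
    return correct / total

def class_iou(predicted, truth, class_id):
    """Intersection over union of the predicted and true regions for one class."""
    intersection = union = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            in_pred, in_truth = p == class_id, t == class_id
            intersection += (in_pred and in_truth)
            union += (in_pred or in_truth)
    # If the class appears in neither map, the IoU is undefined.
    return intersection / union if union else float("nan")

# Made-up 2x4 label maps: 0 = background, 1 = object.
truth = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
predicted = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

accuracy = pixel_accuracy(predicted, truth)
iou_background = class_iou(predicted, truth, 0)
iou_object = class_iou(predicted, truth, 1)
print(accuracy, iou_background, iou_object)
```

Per-class IoU is often reported alongside pixel accuracy because accuracy alone can look high when one class, such as background, dominates the image.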
Training and prediction are supported on a CUDA® capable GPU with a compute capability of 3.0 or higher. Use of a GPU is recommended and requires Parallel Computing Toolbox.
How to Learn More About Semantic Segmentation
- Semantic Segmentation Overview (7:56)
- Demystifying Deep Learning: Semantic Segmentation and Deployment (47:09) – Webinar