Main Content

Code Generation for Semantic Segmentation Application on Intel CPUs That Uses U-Net

This example demonstrates code generation for an image segmentation application that uses deep learning. It uses the codegen command to generate a MEX function that performs prediction by using the deep learning network U-Net for image segmentation.

For a similar example that demonstrates segmentation of images by using U-Net but does not use the codegen command, see Semantic Segmentation of Multispectral Images Using Deep Learning (Image Processing Toolbox).

Third-Party Prerequisites

  • Xeon processor with support for Intel Advanced Vector Extensions 2 (Intel AVX2) instructions

This example is supported on Linux®, Windows®, and macOS platforms.

This example uses the Intel MKL-DNN library that ships with MATLAB and generates a MEX function for semantic segmentation.

This example is not supported in MATLAB Online.

Overview of U-Net

U-Net [1] is a type of convolutional neural network (CNN)that is designed for semantic image segmentation. In U-Net, the initial series of convolutional layers are interspersed with max pooling layers, successively decreasing the resolution of the input image. These layers are followed by a series of convolutional layers interspersed with upsampling operators, successively increasing the resolution of the input image. The combination of these two series paths forms a U-shaped graph. The network was originally trained to perform prediction for biomedical image segmentation applications. This example demonstrates the ability of the network to track changes in forest cover over time. Environmental agencies track deforestation to assess and qualify the environmental and ecological health of a region.

Deep-learning-based semantic segmentation can yield a precise measurement of vegetation cover from high-resolution aerial photographs. One of the challenges is differentiating classes that have similar visual characteristics, such as trying to classify a green pixel as grass, shrubbery, or tree. To increase classification accuracy, some data sets contain multispectral images that provide additional information about each pixel. For example, the Hamlin Beach State Park data set supplements the color images with near-infrared channels that provide a clearer separation of the classes.

This example uses the Hamlin Beach State Park Data [2] along with a pretrained U-Net network in order to correctly classify each pixel.

The U-Net this example uses is trained to segment pixels belonging to 18 classes which includes:

0. Other Class/Image Border      7. Picnic Table         14. Grass
1. Road Markings                 8. Black Wood Panel     15. Sand
2. Tree                          9. White Wood Panel     16. Water (Lake)
3. Building                     10. Orange Landing Pad   17. Water (Pond)
4. Vehicle (Car, Truck, or Bus) 11. Water Buoy           18. Asphalt (Parking Lot/Walkway)
5. Person                       12. Rocks
6. Lifeguard Chair              13. Other Vegetation

Get Pretrained U-Net DAG Network Object

trainedUnet_url = '';
Downloading Pre-trained U-net for Hamlin Beach dataset...
This will take several minutes to download...

trainedUnetFile = "trainedUnet/multispectralUnet.mat";
ld = load("trainedUnet/multispectralUnet.mat");
net =;

The DAG network contains 58 layers including convolution, max pooling, depth concatenation, and pixel classification output layers. To display an interactive visualization of the deep learning network architecture, use the analyzeNetwork (Deep Learning Toolbox) function.

%   analyzeNetwork(net);

The segmentImageUnet Entry-Point Function

The segmentImageUnet entry-point function performs semantic segmentation on the input image for each patch of a fixed size by using the multispectralUnet network contained in the multispectralUnet.mat file. This function loads the network object from the multispectralUnet.mat file into a persistent variable mynet. The function reuses this persistent variable in subsequent prediction calls.

function out = segmentImageUnet(im,patchSize,trainedNet)  
%  OUT = segmentImageUnet(IM,patchSize,trainedNet) returns a semantically
%  segmented image, segmented using the multi-spectral Unet specified in
%  trainedNet. The segmentation is performed over each patch of size
%  patchSize.
% Copyright 2019-2022 The MathWorks, Inc.

persistent mynet;

if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork(trainedNet);

[height, width, nChannel] = size(im);
patch = coder.nullcopy(zeros([patchSize, nChannel-1]));

% Pad image to have dimensions as multiples of patchSize
padSize = zeros(1,2);
padSize(1) = patchSize(1) - mod(height, patchSize(1));
padSize(2) = patchSize(2) - mod(width, patchSize(2));

im_pad = padarray (im, padSize, 0, 'post');
[height_pad, width_pad, ~] = size(im_pad);

out = zeros([size(im_pad,1), size(im_pad,2)], 'uint8');

for i = 1:patchSize(1):height_pad    
    for j =1:patchSize(2):width_pad        
        for p = 1:nChannel-1              
            patch(:,:,p) = squeeze( im_pad( i:i+patchSize(1)-1,...
        % Pass in input
        segmentedLabels = activations(mynet, patch, 'Segmentation-Layer');
        % Takes the max of each channel (6 total at this point)
        [~,L] = max(segmentedLabels,[],3);
        patch_seg = uint8(L);
        % Populate section of output
        out(i:i+patchSize(1)-1, j:j+patchSize(2)-1) = patch_seg;

% Remove the padding
out = out(1:height, 1:width);

Prepare Data

Download the Hamlin Beach State Park data.

if ~exist(fullfile(pwd,'data'),'dir')
    url = '';
Downloading Hamlin Beach dataset...
This will take several minutes to download...

Load and examine the data in MATLAB.


% Examine data
whos test_data
  Name           Size                         Bytes  Class     Attributes

  test_data      7x12446x7654            1333663576  uint16              

The image has seven channels. The RGB color channels are the fourth, fifth, and sixth image channels. The first three channels correspond to the near-infrared bands and highlight different components of the image based on their heat signatures. Channel 7 is a mask that indicates the valid segmentation region.

The multispectral image data is arranged as numChannels-by-width-by-height arrays. In MATLAB, multichannel images are arranged as width-by-height-by-numChannels arrays. To reshape the data so that the channels are in the third dimension, use the helper function, switchChannelsToThirdPlane.

test_data  = switchChannelsToThirdPlane(test_data);

Confirm data has the correct structure (channels last).

whos test_data
  Name               Size                     Bytes  Class     Attributes

  test_data      12446x7654x7            1333663576  uint16              

This example uses a cropped version of the full Hamlin Beach State Park dataset that the test_data variable contains. Crop the height and width of test_data to create the variable input_data that this example uses.

test_datacropRGB = imcrop(test_data(:,:,1:3),[2600, 3000, 2000, 2000]);
test_datacropInfrared = imcrop(test_data(:,:,4:6),[2600, 3000, 2000, 2000]);
test_datacropMask = imcrop(test_data(:,:,7),[2600, 3000, 2000, 2000]);
input_data(:,:,1:3) = test_datacropRGB;
input_data(:,:,4:6) = test_datacropInfrared;
input_data(:,:,7) = test_datacropMask;

Examine the input_data variable.

  Name               Size                   Bytes  Class     Attributes

  input_data      2001x2001x7            56056014  uint16              

Generate MEX

To generate a MEX function for the segmentImageUnet.m entry-point function, create a code configuration object cfg for MEX code generation. Set the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create an MKL-DNN deep learning configuration object and assign it to the DeepLearningConfig property of cfg. Run the codegen command specifying an input size of [12446,7654,7] and a patch size of [1024,1024]. These values correspond to the size of the entire input_data variable. The smaller patch sizes speed up inference. To see how the patches are calculated, see the segmentImageUnet entry-point function.

cfg = coder.config('mex');
cfg.ConstantInputs = 'Remove';
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
codegen -config cfg segmentImageUnet -args {ones(size(input_data),'uint16'),coder.Constant([1024 1024]),coder.Constant(trainedUnetFile)} -report
Code generation successful: To view the report, open('codegen\mex\segmentImageUnet\html\report.mldatx')

Run Generated MEX to Predict Results for input_data

The segmentImageUnet function accepts input_data and a vector containing the dimensions of the patch size as inputs. The function divides the image into patches, predicts the pixels in a particular patch, and finally combines all the patches. Because of the large size of input_data (12446x7654x7), it is easier to process the image in patches.

segmentedImage = segmentImageUnet_mex(input_data);

To extract only the valid portion of the segmentation, multiply the segmented image by the mask channel of the test data.

segmentedImage = uint8(input_data(:,:,7)~=0) .* segmentedImage;

Remove the noise and stray pixels by using the medfilt2 function.

segmentedImage = medfilt2(segmentedImage,[5,5]);

Display U-Net Segmented input_data

This line of code creates a vector of the class names:

classNames = net.Layers(end).Classes;

Overlay the labels on the segmented RGB test image and add a color bar to the segmentation image.

% Display input data

title('Input Image');
cmap = jet(numel(classNames));
segmentedImageOut = labeloverlay(imadjust(input_data(:,:,4:6),[0 0.6],[0.1 0.9],0.55),segmentedImage,'Transparency',0,'Colormap',cmap);

% Display segmented data

title('Segmented Image Output');
N = numel(classNames);
ticks = 1/(N*2):1/N:1;
title('Segmented Image using Mkldnn');
segmentedImageOverlay = labeloverlay(imadjust(input_data(:,:,4:6),[0 0.6],[0.1 0.9],0.55),segmentedImage,'Transparency',0.7,'Colormap',cmap);
title('Segmented Overlay Image');


[1] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." arXiv preprint arXiv:1505.04597, 2015.

[2] Kemker, R., C. Salvaggio, and C. Kanan. "High-Resolution Multispectral Dataset for Semantic Segmentation." CoRR, abs/1703.01918, 2017.

[3] Kemker, Ronald, Carl Salvaggio, and Christopher Kanan. "Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery Using Deep Learning." ISPRS Journal of Photogrammetry and Remote Sensing, Deep Learning RS Data, 145 (November 1, 2018): 60-77.

See Also

| | (Deep Learning Toolbox)

Related Topics