
bisenetv2

Create BiSeNet v2 convolutional neural network for semantic segmentation

Since R2025a

Description

Use the bisenetv2 function to semantically segment images using the BiSeNet v2 convolutional neural network. Using the pretrained network, trained on 171 image classes of the COCO-Stuff data set [2], you can perform inference on test images that contain these classes.

To perform semantic segmentation on a custom data set, you must train the network on your data set using the trainnet (Deep Learning Toolbox) function.
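This workflow can be sketched as follows. This is a minimal illustration, not a complete recipe: the datastore locations, class names, pixel label IDs, and training options are placeholders for your own data.

```matlab
% Configure the pretrained network for two custom classes.
imageSize = [720 960 3];
classNames = ["road" "background"];
net = bisenetv2(imageSize,numel(classNames));

% Combine images and pixel labels into a training datastore
% (folder names and label IDs are placeholders).
imds = imageDatastore("trainImages");
pxds = pixelLabelDatastore("trainLabels",classNames,[1 2]);
ds = combine(imds,pxds);

% Train with cross-entropy loss; tune the options for your data set.
options = trainingOptions("adam",MaxEpochs=10,MiniBatchSize=8);
net = trainnet(ds,net,"crossentropy",options);
```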

Note

This functionality requires Deep Learning Toolbox™ and the Computer Vision Toolbox™ Model for BiSeNet v2 Semantic Segmentation Network. You can install the Computer Vision Toolbox Model for BiSeNet v2 Semantic Segmentation Network from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

[net,classes] = bisenetv2 returns a pretrained BiSeNet v2 network and the names of the 171 classes that it was trained on.


net = bisenetv2(imageSize,numClasses) returns a BiSeNet v2 network pretrained on 171 image classes of the COCO-Stuff data set and configured for transfer learning using input images of the specified size imageSize and the specified number of classes numClasses. Train this network on a custom data set using the trainnet (Deep Learning Toolbox) function.

net = bisenetv2(imageSize,numClasses,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments from the previous syntax. When you specify one or more network options using a name-value argument, you create a custom BiSeNet v2 network with uninitialized weights.

For example, bisenetv2(ChannelRatio=16) specifies the channel ratio between the semantic branch and detail branch as 16.

Examples


Create a BiSeNet v2 convolutional neural network for semantic segmentation.

[net,classes] = bisenetv2;

Read a test image into the workspace.

I = imread("kobi.png");

Resize the image to the input size of the network.

inputSize = net.Layers(1).InputSize(1:2);
img = imresize(I,inputSize);

Create a segmentation map of the test image using the semanticseg function. The segmentation map is a categorical array that relates a label to each pixel in the input image.

segMap = semanticseg(img,net,Classes=classes);

Display the segmentation map overlaid on the image, ordered by the smallest object masks on top, using the labeloverlay function.

segmentedImage = labeloverlay(img,segMap,Transparency=0.4);
imshow(segmentedImage)


Input Arguments


Network input image size, specified as one of these options:

  • Two-element vector of the form [height width].

  • Three-element vector of the form [height width depth]. depth is the number of image channels. Specify depth as 3 for RGB images, as 1 for grayscale images, or as the number of channels for multispectral and hyperspectral images.

Number of classes to segment, specified as an integer greater than 1.
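For example, either form of imageSize works together with a class count of your choosing (the sizes and class counts below are illustrative, not values from this page):

```matlab
% Three-element size: RGB images with 3 channels, 20 classes.
netRGB = bisenetv2([720 960 3],20);

% Two-element size of the form [height width].
netHW = bisenetv2([512 512],11);

% Multispectral input: depth set to the number of channels.
netMS = bisenetv2([256 256 6],4);
```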

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: bisenetv2(ChannelRatio=16) specifies the channel ratio between the semantic branch and the detail branch as 16.

Channel ratio between the semantic branch and detail branch, specified as a power of 2 in the range [2, 64].

Tip

Decrease this value to enable the detail branch to learn finer features. Increase the channel ratio to enable the semantic branch to capture more global context and improve the overall performance of the network, at the expense of finer feature segmentation and slower computation. Use empirical analysis to determine the optimal channel ratio for your application.

Channel multiplier, specified as a positive scalar. This value scales the number of channels in all convolutional layers other than the inputs to the semantic, detail, and output branches.

Tip

Increase this value to improve segmentation accuracy at the cost of larger model size and longer training time.

Depending on the ChannelRatio value you specify, the ChannelMultiplier argument supports these values:

  ChannelRatio Value    Supported ChannelMultiplier Values
  2, 4, 8               1, 1.25, 1.5, 1.75, 2
  16                    1, 1.5, 2
  32                    1, 2
  64                    2

Depth multiplier, specified as a positive scalar. This value scales the number of layers of the semantic and detail branches. Increase the depth multiplier to improve learning capacity and accuracy at the cost of longer processing time. For most applications, specify a depth multiplier value between 1 and 4.
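Putting the network options together, a custom network with uninitialized weights might be configured as follows. This is a sketch: the DepthMultiplier argument name is inferred from the description above, and the ChannelMultiplier value is chosen to be valid for ChannelRatio=16.

```matlab
% Custom BiSeNet v2 with uninitialized weights. Specifying any
% network option creates a custom architecture rather than the
% pretrained network.
net = bisenetv2([1024 1024 3],19, ...
    ChannelRatio=16, ...       % channel ratio between the two branches
    ChannelMultiplier=1.5, ... % widen the convolutional layers
    DepthMultiplier=2);        % deepen the semantic and detail branches
```

Train the resulting network from scratch on your own data set before using it for inference.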

Output Arguments


BiSeNet v2 network layers, returned as a dlnetwork (Deep Learning Toolbox) object.

Names of the classes that the BiSeNet v2 network has been pretrained to segment, returned as a categorical array.


References

[1] Yu, Changqian, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, and Nong Sang. “BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation.” International Journal of Computer Vision 129, no. 11 (November 2021): 3051–68. https://doi.org/10.1007/s11263-021-01515-2.

[2] Caesar, Holger, Jasper Uijlings, and Vittorio Ferrari. “COCO-Stuff: Thing and Stuff Classes in Context.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1209–18. Salt Lake City, UT, USA: IEEE, 2018. https://doi.org/10.1109/CVPR.2018.00132.

Version History

Introduced in R2025a