SSD multibox object detection network
lgraph = ssdLayers(imageSize,numClasses,baseNetwork)
creates a single shot detector (SSD) multibox object detection network based on the
baseNetwork, input image size, and the number of classes the network
should be configured to classify. The network is returned as a
LayerGraph (Deep Learning Toolbox) object.
The SSD is a convolutional neural network-based object detector that predicts bounding box coordinates, classification scores, and corresponding class labels.
lgraph = ssdLayers(___,anchorBoxes,predictorLayerNames)
returns an SSD that contains custom anchor boxes, specified by
anchorBoxes, connected to the network layers specified by
predictorLayerNames. Specify these arguments in addition
to the input arguments from the previous syntax.
Specify the base network.
baseNetwork = 'vgg16';
Specify the image size.
imageSize = [300 300 3];
Specify the number of classes to detect.
numClasses = 2;
Create the SSD object detection network.
lgraph = ssdLayers(imageSize,numClasses,baseNetwork);
Visualize the network using the network analyzer.
analyzeNetwork(lgraph);
imageSize— Size of input image
Size of input image, specified as one of these values.
Two-element vector of the form [H W] for a grayscale image of size H-by-W
Three-element vector of the form [H W 3] for an RGB color image of size H-by-W
When you set the baseNetwork input to …, the imageSize input must be of the form [H …].
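The two accepted forms can be checked with a small helper. This is a minimal sketch: isValidImageSize is a hypothetical anonymous function, not part of any toolbox, and it only checks the [H W] and [H W 3] shapes described above. It does not check the extra constraint that applies when baseNetwork is a named network.

```matlab
% Hypothetical helper: true when sz has the form [H W] (grayscale) or
% [H W 3] (RGB), with positive integer H and W.
isValidImageSize = @(sz) isnumeric(sz) && isrow(sz) ...
    && all(sz > 0) && all(sz == round(sz)) ...
    && (numel(sz) == 2 || (numel(sz) == 3 && sz(3) == 3));

isValidImageSize([300 300 3])   % true: RGB image
isValidImageSize([224 224])     % true: grayscale image
isValidImageSize([300 300 4])   % false: third element must be 3
```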
numClasses— Number of classes for network to classify
Number of classes for the network to classify, specified as a positive scalar.
baseNetwork— Pretrained convolutional neural network
Pretrained convolutional neural network, specified as a
LayerGraph (Deep Learning Toolbox),
DAGNetwork (Deep Learning Toolbox), or
SeriesNetwork (Deep Learning Toolbox)
object or as one of these network names. To specify one of these names, you must
download and install the network support packages for the corresponding valid network names.
The pretrained convolutional neural network is used as the base for the SSD multibox object detection network. For details on pretrained networks in MATLAB®, see Pretrained Deep Neural Networks (Deep Learning Toolbox).
anchorBoxes— Anchor boxes
Anchor boxes, specified as a 1-by-M cell array for M number of predictor layers in the SSD network. Each predictor layer contains a K-by-2 matrix that defines K anchor boxes of the form [height width]. The number of anchor boxes in each element can vary.
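For illustration, the following sketch builds an anchorBoxes value for M = 2 predictor layers, with a different number of anchors in each layer, and then checks the structure described above together with the requirement that each anchor fit inside the input image. The box sizes are made-up values, not recommendations.

```matlab
imageSize = [300 300 3];

% Hypothetical anchor boxes: layer 1 has K = 3 anchors, layer 2 has K = 2.
% Each row is [height width].
anchorBoxes = {[30 30; 42 21; 21 42], [60 60; 85 42]};

M = numel(anchorBoxes);

% Every cell must hold a K-by-2 matrix.
isKby2 = cellfun(@(a) size(a, 2) == 2, anchorBoxes);

% Every anchor must be no larger than the input image.
fitsImage = cellfun(@(a) all(a(:,1) <= imageSize(1) & a(:,2) <= imageSize(2)), anchorBoxes);

assert(isrow(anchorBoxes) && all(isKby2) && all(fitsImage))
numAnchorsPerLayer = cellfun(@(a) size(a, 1), anchorBoxes)
```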
The size of each anchor box is determined by the scale and aspect ratio of the object classes present in the training data. The size of each anchor box must be smaller than or equal to the size of the input image. You can estimate anchor boxes from the training data by using a clustering approach. For more information, see Estimate Anchor Boxes From Training Data.
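The clustering idea can be sketched in plain MATLAB. This is an illustration under stated assumptions, not the algorithm the toolbox uses: it runs a k-means variant on hypothetical [height width] box sizes, assigns each box to the centroid with the highest intersection over union (IoU), and takes the mean size of each cluster as an anchor. Computer Vision Toolbox also provides an estimateAnchorBoxes function for this task. The sketch relies on implicit expansion (R2016b or later).

```matlab
% Hypothetical training box sizes, one [height width] row per labeled object.
boxSizes = [20 22; 24 18; 22 20; 60 30; 64 34; 58 28; 120 118; 110 125; 115 120];
K = 3;   % number of anchor boxes to estimate

% IoU between every box (rows of b) and every centroid (rows of c), with both
% aligned at a common corner so that only their sizes matter. Result is N-by-K.
interFn = @(b, c) min(b(:,1), c(:,1).') .* min(b(:,2), c(:,2).');
iouFn = @(b, c) interFn(b, c) ./ (prod(b, 2) + prod(c, 2).' - interFn(b, c));

% Deterministic initialization: K boxes evenly spaced in order of area.
[~, order] = sort(prod(boxSizes, 2));
centroids = boxSizes(order(round(linspace(1, size(boxSizes, 1), K))), :);

for iter = 1:100
    % Assign each box to the centroid it overlaps most (distance = 1 - IoU).
    [~, assign] = max(iouFn(boxSizes, centroids), [], 2);
    newCentroids = centroids;
    for k = 1:K
        members = boxSizes(assign == k, :);
        if ~isempty(members)
            newCentroids(k, :) = mean(members, 1);
        end
    end
    if isequal(newCentroids, centroids)
        break   % assignments stopped changing
    end
    centroids = newCentroids;
end

anchorBoxes = round(centroids)
meanIoU = mean(max(iouFn(boxSizes, centroids), [], 2))
```

Aligning boxes at a common corner makes the IoU depend only on box shape and scale, so boxes are grouped by size rather than by position in the image.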
predictorLayerNames— Names of layers in input
Names of layers in the input base network, specified as an M-element vector of strings or a 1-by-M cell array of character vectors. The SSD detection subnetworks are attached to the predictor layers specified by this input.
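A short sketch of the two accepted forms of predictorLayerNames and of their pairing with anchorBoxes. The layer names relu4_3 and relu5_3 and the anchor sizes are hypothetical placeholders; use names of layers that exist in your base network. The sketch also assumes that the i-th anchor box matrix pairs with the i-th layer name, which follows from both inputs having M elements.

```matlab
% Hypothetical predictor layer names for M = 2 predictor layers.
namesAsStrings = ["relu4_3" "relu5_3"];    % M-element string vector
namesAsCells   = {'relu4_3', 'relu5_3'};   % 1-by-M cell array of character vectors

% Hypothetical anchor boxes: one K-by-2 [height width] matrix per predictor layer.
anchorBoxes = {[30 30; 42 21], [60 60; 85 42; 42 85]};

% Both forms name the same layers.
assert(isequal(cellstr(namesAsStrings), namesAsCells))

% Assumed positional pairing: anchorBoxes{i} belongs to layer i,
% so both inputs must have the same number of elements M.
assert(numel(namesAsStrings) == numel(anchorBoxes))
```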
lgraph— SSD multibox object detection network
SSD multibox object detection network, returned as a LayerGraph (Deep Learning Toolbox) object.
The ssdLayers function creates an SSD network and returns
lgraph, an object that represents the network architecture for an SSD
object detector. Use lgraph as input to the trainSSDObjectDetector function,
which trains and returns an SSD object detector. Then use the detect function of the
trained detector to detect objects:
bbox = detect(detector,I)
The ssdLayers function uses a pretrained neural network as the base
network, to which it adds the detection subnetwork required to create an SSD object detection
network. Given a base network,
ssdLayers removes all the layers
after the feature layer in the base network and adds the detection subnetwork. The
detection subnetwork is composed of groups of serially connected convolution, rectified
linear unit (ReLU), and batch normalization layers. The SSD merge layer, a box regression
layer, and a focal loss classification layer are then added to the detection subnetwork.
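To make "groups of serially connected convolution, ReLU, and batch normalization layers" concrete, here is a sketch of one such group built with Deep Learning Toolbox layer functions, in the order the text lists them. The filter size, number of filters, and layer names are hypothetical; ssdLayers chooses its own values.

```matlab
% One hypothetical group of the detection subnetwork:
% convolution, then ReLU, then batch normalization, connected in series.
group = [
    convolution2dLayer(3, 256, 'Padding', 'same', 'Name', 'det_conv1')
    reluLayer('Name', 'det_relu1')
    batchNormalizationLayer('Name', 'det_bn1')
    ];
```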