Detect objects using YOLO v2 object detector
bboxes = detect(detector,I)
detects objects within a single image or an array of images, I, using a
you only look once version 2 (YOLO v2) object detector. The input size of the image must
be greater than or equal to the network input size of the pretrained detector. The
locations of detected objects are returned as a set of bounding boxes.
When using this function, use of a CUDA®-enabled NVIDIA® GPU with a compute capability of 3.0 or higher is highly recommended. The GPU reduces computation time significantly. Usage of the GPU requires Parallel Computing Toolbox™.
[___] = detect(___,roi)
detects objects within the rectangular search region specified by
roi. Use output arguments from any of the previous syntaxes. Specify
input arguments from any of the previous syntaxes.
[___] = detect(___,Name,Value)
specifies options using one or more
Name,Value pair arguments in
addition to the input arguments in any of the preceding syntaxes.
Load a YOLO v2 object detector pretrained to detect vehicles.
vehicleDetector = load('yolov2VehicleDetector.mat','detector');
detector = vehicleDetector.detector;
Read a test image into the workspace.
I = imread('highway.png');
Display the input test image.
Run the pretrained YOLO v2 object detector on the test image. Inspect the results for vehicle detection. The labels are derived from the
ClassNames property of the detector.
[bboxes,scores,labels] = detect(detector,I)
bboxes = 1×4
    78    81    64    63
scores = single
    0.6224
labels = categorical
    vehicle
Annotate the image with the bounding boxes for the detections.
if ~isempty(bboxes)
    detectedI = insertObjectAnnotation(I,'rectangle',bboxes,cellstr(labels));
else
    detectedI = I; % No detections, so display the unannotated image.
end
figure
imshow(detectedI)
I— Input image
Input image, specified as an H-by-W-by-C-by-B numeric array of images. Images must be real, nonsparse, grayscale or RGB images.
H: The height of each image.
W: The width of each image.
C: The channel size in each image must be equal to the network's input channel size. For example, for grayscale images, C must be equal to 1. For RGB color images, it must be equal to 3.
B: The number of images in the array.
The detector is sensitive to the range of the input image. Therefore, ensure that the input
image range is similar to the range of the images used to train the detector. For
example, if the detector was trained on
uint8 images, rescale
this input image to the range [0, 255] by using the
rescale function. The size of this input image should be comparable
to the sizes of the images used in training. If these sizes are very different, the
detector has difficulty detecting objects because the scale of the objects in the
input image differs from the scale of the objects the detector was trained to
identify. Consider whether the size of the training images was modified during training.
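As a minimal sketch of the range adjustment described above, the following rescales a test image to [0, 255]. The synthetic image and the variable names are assumptions made for illustration only; substitute your own image.

```matlab
% Hypothetical test image: double-precision values in [0, 1].
% A synthetic image keeps the sketch independent of any image file.
I = rand(224,224,3);

% Stretch the pixel values so that the minimum maps to 0 and the maximum
% maps to 255. rescale uses the actual minimum and maximum of the data.
Iscaled = rescale(I,0,255);

% Optionally cast to uint8 to match the class of the training images.
Iuint8 = uint8(Iscaled);
```

Because rescale stretches the data to fill the target range, it also changes the contrast of an image whose values do not already span the full input range.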
Datastore, specified as a datastore object containing a collection of images. Each image must be a grayscale, RGB, or multichannel image. The function processes only the first column of the datastore, which must contain images and must be cell arrays or tables with multiple columns.
roi— Search region of interest
Search region of interest, specified as an [x y width height] vector. The vector specifies the upper left corner and size of a region in pixels.
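The following sketch shows one way to restrict detection to a search region. It assumes the pretrained detector file and test image from the example above are available; the region itself (everything below the top quarter of the image) is a hypothetical choice.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Hypothetical search region in [x y width height] format:
% skip the top quarter of the image and keep the full width.
[imgHeight,imgWidth,~] = size(I);
skipRows = floor(imgHeight/4);
roi = [1, skipRows + 1, imgWidth, imgHeight - skipRows];

% Detect objects only within the search region.
[bboxes,scores,labels] = detect(detector,I,roi);
```

The region must lie fully inside the image, which is why the height is computed from the number of rows skipped.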
Specify optional comma-separated pairs of Name,Value arguments. Name is
the argument name and
Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'Threshold'— Detection threshold
0.5 (default) | scalar in the range [0, 1]
Detection threshold, specified as a comma-separated pair consisting of
'Threshold' and a scalar in the range [0, 1]. Detections that
have scores less than this threshold value are removed. To reduce false positives,
increase this value.
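A short sketch of raising the threshold, assuming the detector and image from the example above. The value 0.7 is illustrative, not a recommendation.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Default threshold (0.5).
[bboxesDefault,scoresDefault] = detect(detector,I);

% Stricter, illustrative threshold: detections scoring below 0.7 are removed.
[bboxesStrict,scoresStrict] = detect(detector,I,'Threshold',0.7);

fprintf('Default: %d detection(s), strict: %d detection(s)\n', ...
    size(bboxesDefault,1),size(bboxesStrict,1));
```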
'SelectStrongest'— Select strongest bounding box
Select the strongest bounding box for each detected object, specified as the
comma-separated pair consisting of
'SelectStrongest' and either true or false.
true — Return the strongest bounding box per object. The
method calls the
selectStrongestBboxMulticlass function, which uses nonmaximal
suppression to eliminate overlapping bounding boxes based on their confidence
scores.
By default, the
selectStrongestBboxMulticlass function is called as
selectStrongestBboxMulticlass(bbox,scores,...
    'RatioType','Union',...
    'OverlapThreshold',0.5);
false — Return all the detected bounding boxes. You can
then write your own custom method to eliminate overlapping bounding boxes.
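As a sketch of the false option, the following returns all detections and then applies suppression with different settings. It assumes the detector and image from the example above; the 'Min' ratio type and the 0.3 overlap threshold are illustrative choices.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Return every detection above the threshold, including overlapping boxes.
[bboxes,scores,labels] = detect(detector,I,'SelectStrongest',false);

% Apply suppression manually with illustrative, stricter settings.
selBboxes = bboxes;
selScores = scores;
selLabels = labels;
if ~isempty(bboxes)
    [selBboxes,selScores,selLabels] = selectStrongestBboxMulticlass( ...
        bboxes,scores,labels, ...
        'RatioType','Min', ...
        'OverlapThreshold',0.3);
end
```

Handling the empty case separately avoids passing empty inputs to the suppression function.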
'MinSize'— Minimum region size
[1 1] (default) | vector of the form [height width]
Minimum region size, specified as the comma-separated pair consisting of
'MinSize' and a vector of the form [height
width]. Units are in pixels. The minimum region size defines the
size of the smallest region containing the object.
By default, MinSize is 1-by-1.
'MaxSize'— Maximum region size
size(I) (default) | vector of the form [height width]
Maximum region size, specified as the comma-separated pair consisting of
'MaxSize' and a vector of the form [height
width]. Units are in pixels. The maximum region size defines the
size of the largest region containing the object.
By default, 'MaxSize' is set to the height and width of the
input image, I. To reduce computation time, set this value to the
known maximum region size for the objects that can be detected in the input test
image.
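The size limits can be combined in one call, as in this sketch. It assumes the detector and image from the example above; the limits of 32 and 128 pixels are hypothetical and should reflect the object sizes you expect.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Illustrative limits in [height width] format, in pixels.
minSize = [32 32];
maxSize = [128 128];

[bboxes,scores] = detect(detector,I, ...
    'MinSize',minSize, ...
    'MaxSize',maxSize);
```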
'MiniBatchSize'— Mini-batch size
128 (default) | scalar
Mini-batch size, specified as the comma-separated pair consisting of
'MiniBatchSize' and a scalar value. Use the
MiniBatchSize to process a large collection of images. Images
are grouped into mini-batches and processed as a batch to improve computation
efficiency. Increase the mini-batch size to decrease processing time. Decrease the size
to use less memory.
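A minimal sketch of batch processing, assuming the detector and image from the example above. The batch here is simply four copies of the same image, and the mini-batch size of 2 is illustrative.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Hypothetical batch: four copies of the image stacked along dimension 4.
batch = cat(4,I,I,I,I);

% Process the batch two images at a time.
[bboxes,scores,labels] = detect(detector,batch,'MiniBatchSize',2);

% For an array of images, each output is a B-by-1 cell array.
for k = 1:numel(bboxes)
    fprintf('Image %d: %d detection(s)\n',k,size(bboxes{k},1));
end
```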
'ExecutionEnvironment'— Hardware resource
Hardware resource on which to run the detector, specified as the comma-separated
pair consisting of 'ExecutionEnvironment' and 'auto', 'gpu', or 'cpu'.
'auto' — Use a GPU if it is available. Otherwise, use the
CPU.
'gpu' — Use the GPU. To use a GPU, you must have Parallel
Computing Toolbox and a CUDA-enabled NVIDIA GPU with a compute capability of 3.0 or higher. If a suitable GPU
is not available, the function returns an error.
'cpu' — Use the CPU.
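The 'auto' option already falls back to the CPU, but the choice can also be made explicit, for example to log which device is used. This sketch assumes the detector and image from the example above and a MATLAB release that provides the canUseGPU function.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Choose the hardware resource explicitly instead of relying on 'auto'.
if canUseGPU
    env = 'gpu';
else
    env = 'cpu';
end
fprintf('Running detection on: %s\n',env);

bboxes = detect(detector,I,'ExecutionEnvironment',env);
```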
'Acceleration'— Performance optimization
Performance optimization, specified as the comma-separated pair consisting of
'Acceleration' and one of the following:
'auto' — Automatically apply a number of optimizations
suitable for the input network and hardware resource.
'mex' — Compile and execute a MEX function. This option
is available when using a GPU only. Using a GPU requires Parallel
Computing Toolbox and a CUDA-enabled NVIDIA GPU with compute capability 3.0 or higher. If Parallel
Computing Toolbox or a suitable GPU is not available, then the function returns an
error.
'none' — Disable all acceleration.
The default option is 'auto'. If 'auto' is
specified, MATLAB® applies a number of compatible optimizations. If you use the
'auto' option, MATLAB never generates a MEX function.
The 'mex' option can offer performance benefits, but at the expense of an
increased initial run time. Subsequent calls with compatible parameters are faster.
Use performance optimization when you plan to call the function multiple times using
new input data.
The 'mex' option generates and executes a MEX function based on
the network and parameters used in the function call. You can have several MEX
functions associated with a single network at one time. Clearing the network variable
also clears any MEX functions associated with that network.
The 'mex' option is only available for input data specified as
a numeric array, cell array of numeric arrays, table, or image datastore. No other
types of datastore support the 'mex' option.
The 'mex' option is only available when you are using a GPU.
You must also have a C/C++ compiler installed. For setup instructions, see MEX Setup (GPU Coder).
'mex' acceleration does not support all layers. For a list of
supported layers, see Supported Layers (GPU Coder).
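To judge whether acceleration helps for a given workload, repeated calls can be timed. This is a rough sketch, assuming the detector and image from the example above; timings vary by hardware, and a single tic/toc measurement is only indicative.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Time a call with all acceleration disabled.
tic
detect(detector,I,'Acceleration','none');
tNone = toc;

% The first accelerated call can include one-time setup cost, so run it
% once before timing a subsequent call.
detect(detector,I,'Acceleration','auto');
tic
detect(detector,I,'Acceleration','auto');
tAuto = toc;

fprintf('none: %.3f s, auto (second call): %.3f s\n',tNone,tAuto);
```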
bboxes— Location of objects detected
Location of objects detected within the input image or images, returned as an M-by-4 matrix or a B-by-1 cell array. M is the number of bounding boxes in an image, and B is the number of M-by-4 matrices when the input contains an array of images.
Each row of
bboxes contains a four-element vector of the
form [x y width height]. This vector specifies the upper-left corner and size
of the corresponding bounding box in pixels.
scores— Detection scores
Detection confidence scores, returned as an M-by-1 vector or a B-by-1 cell array. M is the number of bounding boxes in an image, and B is the number of M-by-1 vectors when the input contains an array of images. A higher score indicates higher confidence in the detection.
labels— Labels for bounding boxes
Labels for bounding boxes, returned as an M-by-1 categorical array or a
B-by-1 cell array. M is the number of
labels in an image, and B is the number of
M-by-1 categorical arrays when the input contains an
array of images. You define the class names used to label the objects when you
train the input detector.
detectionResults— Detection results
Detection results, returned as a 3-column table with variable names, Boxes, Scores, and Labels. The Boxes column contains M-by-4 matrices, of M bounding boxes for the objects found in the image. Each row contains a bounding box as a 4-element vector in the format [x,y,width,height]. The format specifies the upper-left corner location and size in pixels of the bounding box in the corresponding image.
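The table output can be sketched with a one-image datastore. This assumes the detector and image from the example above, and that the datastore is passed as the second input in place of the image; the use of which to locate the image file is an illustrative convenience.

```matlab
% Load the pretrained detector used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;

% Hypothetical datastore containing a single image. which resolves the
% full path of the test image on the MATLAB path.
imds = imageDatastore(which('highway.png'));

% For datastore input, the results are returned as a table.
detectionResults = detect(detector,imds);

% Each table row corresponds to one image; the entries are cell arrays.
boxesFirstImage = detectionResults.Boxes{1};
scoresFirstImage = detectionResults.Scores{1};
labelsFirstImage = detectionResults.Labels{1};
```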
By default, the
detect function preprocesses
the test image for object detection by:
Resizing it to the nearest possible image size used for training the YOLO v2
network. The function determines the nearest possible image size from the
TrainingImageSize property of the detector.
Normalizing its pixel values to lie in the same range as that of the images used to
train the YOLO v2 object detector. For example, if the detector was trained on
uint8 images, the test image must also have pixel values in the
range [0, 255]. Otherwise, use the rescale
function to rescale the pixel values in the test image.
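To see which training size a test image is closest to, the TrainingImageSize property can be inspected directly. This sketch assumes the detector and image from the example above. The Euclidean distance used here is an assumption for illustration; the exact rule the function applies is not described in this page.

```matlab
% Load the pretrained detector and test image used in the example above.
data = load('yolov2VehicleDetector.mat','detector');
detector = data.detector;
I = imread('highway.png');

% Training sizes as rows of [height width].
trainingSizes = double(detector.TrainingImageSize);
imgSize = [size(I,1) size(I,2)];

% Illustrative nearest-size rule (an assumption): smallest Euclidean
% distance between the image size and each training size.
[~,idx] = min(sum((trainingSizes - imgSize).^2,2));
nearestSize = trainingSizes(idx,:);

fprintf('Image size: [%d %d], nearest training size: [%d %d]\n', ...
    imgSize(1),imgSize(2),nearestSize(1),nearestSize(2));
fprintf('Image class: %s\n',class(I));
```

Printing the image class alongside the sizes gives a quick check that the pixel range matches what the detector was trained on.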