classify

Classify image using CLIP network

Since R2026a

    Description

    Add-On Required: This feature requires the Computer Vision Toolbox Model for OpenAI CLIP Network add-on.

    classes = classify(clip,I,classNames) assigns each image in I to one of the suggested classes classNames using a Contrastive Language-Image Pre-Training (CLIP) network.

    Note

    This functionality requires Deep Learning Toolbox™.

    [classes,scores] = classify(clip,I,classNames) additionally returns the CLIP network prediction scores corresponding to the predicted classes classes.

    [___] = classify(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of arguments from previous syntaxes. For example, MiniBatchSize=32 limits the batch size to 32 images.

    Examples

    Create a pretrained CLIP network.

    clip = clipNetwork("vit-b-16");

    Create a datastore of test images.

    imageFiles = ["kobi.png","baby.jpg","flamingos.jpg","saturn.png"];
    imds = imageDatastore(imageFiles);

    Define the list of class suggestions for the test images.

    classNames = ["baby","dog","flamingo","planet"];

    Obtain the predicted classes for each image in the datastore.

    classes = classify(clip,imds,classNames);

    Display the images along with their predicted classes.

    figure
    tiledlayout(2,2)
    
    for i = 1:numel(imageFiles)
        nexttile
        imshow(read(imds))
        title(classes(i))
    end

    The figure shows the four images in a 2-by-2 tiled layout, titled dog, baby, flamingo, and planet.

    Create a pretrained CLIP network with a ResNet-50 backbone.

    clip = clipNetwork("resnet50")
    clip = 
      clipNetwork with properties:
    
                           ModelName: "resnet50"
                 ImageEncoderNetwork: [1×1 dlnetwork]
                  TextEncoderNetwork: [1×1 dlnetwork]
        ImageNormalizationStatistics: [1×1 struct]
    
    

    Load into the workspace an image that contains the object to classify, and display the image.

    I = imread("kobi.png");
    imshow(I)

    Define the list of potential classes for the image.

    classNames = ["aardvark","bee","cat","dog"];

    Obtain the predicted class and prediction scores from the image.

    [classes,scores] = classify(clip,I,classNames)
    classes = categorical
         dog 
    
    
    scores = 1×4 single row vector
    
        0.5309    0.5131    0.5337    0.7217
    
    

    Create a pretrained CLIP network.

    clip = clipNetwork("vit-l-14");

    Load a satellite photo of the town of Concord, Massachusetts into the workspace, and display the image.

    I = imread("concordaerial.png");
    imshow(I)

    Define the list of class suggestions for the image. These classes are town or city names.

    classNames = ["Boston","Concord","Plymouth","Falmouth"];

    Define class descriptions that provide more context to the CLIP model for more accurate classification.

    classDescriptions = [ ...
        "A satellite photo of Boston, a city in Massachusetts."
        "A satellite photo of Concord, a suburb in Massachusetts."
        "A satellite photo of Plymouth, a town on the coast of Massachusetts."
        "A satellite photo of Falmouth, a town on Cape Cod in Massachusetts."
        ];

    Specify the suggested class names for each of the towns, as well as the more detailed class descriptions, to predict the town shown in the image using the CLIP network.

    classes = classify(clip,I,classNames,ClassDescriptions=classDescriptions)
    classes = categorical
         Concord 
    
    

    Input Arguments

    clip

    CLIP network, specified as a clipNetwork object.

    I

    Image data, specified in one of these formats:

    • H-by-W-by-3-by-B numeric array representing a batch of B truecolor images.

    • H-by-W-by-1-by-B numeric array representing a batch of B grayscale images.

    • Datastore that reads and returns truecolor images.

    • Formatted dlarray (Deep Learning Toolbox) object with two spatial dimensions of the format "SSCB". You can specify multiple test images by including a batch dimension.
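    As a sketch of the dlarray format, you can batch multiple images of the same size along the fourth ("B") dimension. The choice of a common 224-by-224 size here is an assumption for illustration, not a requirement stated by this page.

    ```matlab
    % Batch two truecolor images of the same size as a formatted dlarray.
    clip = clipNetwork("vit-b-16");
    I1 = imresize(imread("kobi.png"),[224 224]);
    I2 = imresize(imread("peppers.png"),[224 224]);
    X = dlarray(single(cat(4,I1,I2)),"SSCB");   % spatial, spatial, channel, batch
    classes = classify(clip,X,["dog","vegetables"]);
    ```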

    classNames

    Names of class suggestions, specified as a vector of strings or a categorical vector. You must specify class names in English using ASCII characters. The function automatically pads or truncates each text input so that it contains exactly 77 tokens.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: classify(clip,I,classNames,MiniBatchSize=32) limits the batch size to 32 images.

    MiniBatchSize

    Size of batches for processing large collections of images, specified as a positive integer. Larger batch sizes reduce processing time, but require more memory.
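    For example, a hedged sketch of classifying a large image collection with a reduced batch size (the image folder name here is hypothetical):

    ```matlab
    % Classify a large collection in batches of 64 to limit memory use.
    clip = clipNetwork("vit-b-16");
    imds = imageDatastore("largeImageFolder");   % hypothetical folder of images
    classNames = ["indoor scene","outdoor scene"];
    classes = classify(clip,imds,classNames,MiniBatchSize=64);
    ```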

    ClassDescriptions

    Class descriptions used for classification by the CLIP network, specified as a C-element string array, where C is the number of classes in classNames. By default, the CLIP model generates class descriptions from the labels specified by the classNames input argument.

    Use the ClassDescriptions name-value argument to create custom class descriptions. The classify function pads or truncates each description so that it contains exactly 77 tokens.

    ExecutionEnvironment

    Hardware resource on which to run the network, specified as one of these values:

    • "auto": Use a GPU if one is available. Otherwise, use the CPU.

    • "gpu": Use the GPU. To use a GPU, you must have Parallel Computing Toolbox™ and a CUDA® enabled NVIDIA® GPU. If a suitable GPU is not available, the function returns an error. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).

    • "cpu": Use the CPU.

    Output Arguments

    classes

    Predicted classes, returned as a B-element categorical vector, where B is the number of images in the batch.

    scores

    Prediction scores, returned as a B-by-C numeric matrix, where B is the number of images in the batch and C is the number of suggested classes specified using the classNames input argument.

    The classify function computes the scores using the CLIPScore algorithm. For an input image I and associated text T, the algorithm computes the score using the equation

    CLIPScore(I,T) = 2.5 · max(cos(I,T), 0),

    where cos(I,T) is the cosine similarity between the CLIP image embedding of I and the CLIP text embedding of T.
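    Assuming the companion encodeImage and encodeText object functions of clipNetwork, the score computation can be sketched as follows; the explicit normalization is a precaution in case the returned embeddings are not already unit length.

    ```matlab
    % Hypothetical sketch of the CLIPScore computation from embeddings.
    imgEmb = encodeImage(clip,I);            % 1-by-D image embedding
    txtEmb = encodeText(clip,classNames);    % C-by-D text embeddings
    imgEmb = imgEmb ./ vecnorm(imgEmb,2,2);  % ensure unit length
    txtEmb = txtEmb ./ vecnorm(txtEmb,2,2);
    cosSim = imgEmb * txtEmb.';              % cosine similarity per class
    scores = 2.5 * max(cosSim,0);            % CLIPScore for each class
    [~,idx] = max(scores);
    predictedClass = classNames(idx);        % highest-scoring class
    ```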

    Version History

    Introduced in R2026a

    See Also