Image Category Classification
Image category classification tools in Computer Vision Toolbox™ enable you to classify images into predefined categories using deep learning models, such as vision transformers, or traditional bag-of-visual-words techniques. Image category classification is essential for applications such as scene recognition, content filtering, and automated tagging. To create labeled data sets, use the Image Labeler and Video Labeler apps, which support interactive and AI-assisted annotation of scene-level labels for images and video frames, respectively. These labels serve as ground truth for training and evaluating image classification models.
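As a minimal sketch, assume you have exported scene labels from the Image Labeler app to the workspace as a groundTruth object named gTruth (a hypothetical variable name). You can then convert those labels into a labeled image datastore for training. This sketch assumes your release provides the sceneLabelTrainingData function, so check the documentation for your release before relying on it.

```matlab
% gTruth: groundTruth object exported from the Image Labeler app
% (hypothetical variable name). It must contain scene label definitions.
imds = sceneLabelTrainingData(gTruth);

% Inspect how many images carry each scene label.
tbl = countEachLabel(imds);
disp(tbl)
```

The resulting datastore carries its labels in the Labels property, so you can pass it to the training functions described below.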
For deep learning-based classification, the toolbox provides access to pretrained vision transformer (ViT) models through the visionTransformer function. These models use self-attention mechanisms to capture global image context and can be fine-tuned on custom data sets. Supporting layers such as patchEmbeddingLayer enable you to design and extend ViT architectures. Additionally, the toolbox includes support for CLIP networks, which combine vision and language understanding to perform image classification. Use the clipNetwork object and the classify object function to classify images by matching visual content against textual descriptions of the candidate categories, enabling multimodal applications.
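The following is a minimal fine-tuning sketch. It is not the code from the linked example. It assumes a hypothetical folder named myImages with one subfolder per category, a recent Deep Learning Toolbox release (assumed R2024a or later, for trainnet, minibatchpredict, and scores2label), and a pretrained network that ends in a fully connected classification head followed by a softmax layer. Verify the layer structure with analyzeNetwork before training.

```matlab
% Labeled images, one subfolder per category (hypothetical folder name).
imds = imageDatastore("myImages", IncludeSubfolders=true, LabelSource="foldernames");
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.8, "randomized");
classNames = categories(imdsTrain.Labels);
numClasses = numel(classNames);

% Load the default pretrained ViT and read its expected input size.
net = visionTransformer;
inputSize = net.Layers(1).InputSize;

% Replace the last fully connected layer (assumed to be the classification
% head) with a new one sized for the custom categories. trainnet initializes
% the new, unset learnable parameters.
isFC = arrayfun(@(L) isa(L, "nnet.cnn.layer.FullyConnectedLayer"), net.Layers);
headName = net.Layers(find(isFC, 1, "last")).Name;
net = replaceLayer(net, headName, fullyConnectedLayer(numClasses, Name=headName));

% Resize images to the network input size; convert grayscale to RGB.
augTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain, ColorPreprocessing="gray2rgb");
augVal = augmentedImageDatastore(inputSize(1:2), imdsVal, ColorPreprocessing="gray2rgb");

% Small learning rate, because only the head starts from scratch.
options = trainingOptions("adam", ...
    InitialLearnRate=1e-4, ...
    MaxEpochs=5, ...
    MiniBatchSize=12, ...
    ValidationData=augVal, ...
    Metrics="accuracy", ...
    Plots="training-progress");

% Cross-entropy loss assumes the network output is already softmax-normalized.
net = trainnet(augTrain, net, "crossentropy", options);

% Classify the validation images and measure accuracy.
scores = minibatchpredict(net, augVal);
YPred = scores2label(scores, classNames);
accuracy = mean(YPred == imdsVal.Labels);
```

Replacing only the head keeps the pretrained self-attention features. The small learning rate limits how far the pretrained weights drift during fine-tuning.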
For traditional approaches, the toolbox supports the bag-of-features (BoF) framework, which represents images as histograms of visual word occurrences. You can use the bagOfFeatures object to extract features and build a visual vocabulary. Then, train a classifier using the trainImageCategoryClassifier function, and make predictions using the predict object function of the resulting imageCategoryClassifier object. This approach is particularly useful for lightweight applications or when interpretability is a priority. For more information, see Image Classification with Bag of Visual Words.
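The following is a minimal sketch of the bag-of-features workflow. It assumes the small sample image set that Computer Vision Toolbox installs under visiondata/imageSets, which has one subfolder per category. If that location differs in your installation, substitute your own folder of labeled images. The VocabularySize and StrongestFeatures values are illustrative, not recommendations.

```matlab
% Load labeled images; folder names become the category labels.
setDir = fullfile(toolboxdir("vision"), "visiondata", "imageSets");
imds = imageDatastore(setDir, IncludeSubfolders=true, LabelSource="foldernames");

% Hold out 30% of each category for evaluation (fixed seed for repeatability).
rng(0)
[trainingSet, testSet] = splitEachLabel(imds, 0.7, "randomized");

% Extract features with the default extractor and cluster them into a
% visual vocabulary of 250 words, keeping the strongest 80% of features.
bag = bagOfFeatures(trainingSet, VocabularySize=250, StrongestFeatures=0.8);

% Each image is encoded as a histogram of visual-word occurrences.
featureVector = encode(bag, readimage(trainingSet, 1));

% Train a multiclass classifier on the encoded training images.
categoryClassifier = trainImageCategoryClassifier(trainingSet, bag);

% Evaluate on held-out images; confMat is a confusion matrix.
confMat = evaluate(categoryClassifier, testSet);

% Classify a single image with the predict object function.
img = readimage(testSet, 1);
[labelIdx, scores] = predict(categoryClassifier, img);
predictedLabel = categoryClassifier.Labels{labelIdx};
```

Each image is reduced to a fixed-length histogram, one bin per visual word. The classifier is therefore small and fast to train, which is one reason this approach suits lightweight applications.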
Apps
| Image Labeler | Label images for computer vision applications |
| Video Labeler | Label video for computer vision applications |
Topics
Create Ground Truth for Image Classification
- Get Started with the Image Labeler
  Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification.
- Get Started with the Video Labeler
  Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification in a video or image sequence.
Classify Images Using Deep Learning Models
- Train Vision Transformer Network for Image Classification
  This example shows how to fine-tune a pretrained vision transformer (ViT) neural network to perform classification on a new collection of images.
- Create Simple Image Classification Network (Deep Learning Toolbox)
  This example shows how to create and train a simple convolutional neural network for deep learning classification.
- Get Started with Image Classification (Deep Learning Toolbox)
  This example shows how to create a simple convolutional neural network for deep learning classification using the Deep Network Designer app.
Classify Images Using Bag of Features Approach
- Create a Custom Feature Extractor
  You can use the bag-of-features (BoF) framework with many different types of image features.
- Image Classification with Bag of Visual Words
  Use the Computer Vision Toolbox functions for image category classification by creating a bag of visual words.
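To illustrate the custom feature extractor idea, here is a minimal sketch that passes a hypothetical extractor, gridSURFExtractor, to bagOfFeatures through the CustomExtractor name-value argument. The extractor returns one feature per row plus one metric per feature, which bagOfFeatures uses to keep the strongest features. Sampling upright SURF descriptors on a fixed grid is only one possible choice. As before, the sample image folder is assumed to exist in your installation.

```matlab
% Build a vocabulary with a custom extractor instead of the default one.
setDir = fullfile(toolboxdir("vision"), "visiondata", "imageSets");
imds = imageDatastore(setDir, IncludeSubfolders=true, LabelSource="foldernames");
bag = bagOfFeatures(imds, CustomExtractor=@gridSURFExtractor);

function [features, featureMetrics] = gridSURFExtractor(I)
% Hypothetical custom extractor: upright SURF descriptors sampled on a
% regular grid instead of at detected interest points.
if size(I, 3) == 3
    I = im2gray(I);
end

% Place a point every gridStep pixels, away from the image border.
gridStep = 8;
[nRows, nCols] = size(I);
[x, y] = meshgrid(gridStep:gridStep:nCols - gridStep, ...
                  gridStep:gridStep:nRows - gridStep);
points = SURFPoints([x(:), y(:)], Scale=1.6);

% extractFeatures returns one 64-element SURF descriptor per valid point.
features = extractFeatures(I, points, Upright=true);

% One metric per feature (M-by-1). Descriptor variance is a simple proxy
% for feature strength.
featureMetrics = var(features, [], 2);
end
```

Grid sampling yields a predictable number of features per image, regardless of how many interest points a detector would find.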


