
Get Started with Segment Anything Model for Image Segmentation

Perform image segmentation using the Image Processing Toolbox™ Model for Segment Anything Model support package. Using the support package, you can perform image segmentation on a data set by using the pretrained Segment Anything Model (SAM), or perform downstream tasks such as instance segmentation by passing the output of an object detector network as an input to the SAM. To learn more about the model and the training data, see the SA-1B Dataset page on the Meta website.

The SAM is a zero-shot image segmentation model that uses deep learning neural networks to accurately segment objects within images without requiring additional training. The SAM enables you to actively guide and refine the segmentation by providing feedback through visual prompts, such as points, boxes, and mask logits.

SAM uses visual prompts, such as points, boxes, and masks, to interactively produce accurate segmentations.

The SAM architecture consists of separate image encoder, visual prompt encoder, and mask decoder networks. Because the image encoder runs only once per image, you can reuse the same image embeddings with different visual prompts. For a given image embedding, the prompt encoder and mask decoder use the visual prompt to predict a mask. Because the SAM can predict multiple masks for a single prompt, you can use the SAM to segment ambiguous entities, such as both a person and the shirt they wear.
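For example, this minimal sketch (using the peppers.png image and an illustrative point prompt; the full workflow appears in the Apply Pretrained Segment Anything Model section) encodes the image once, then reuses the embeddings with a prompt and requests multiple candidate masks for a potentially ambiguous object:

    model = segmentAnythingModel;
    I = imread("peppers.png");
    embeddings = extractEmbeddings(model,I);   % image encoded only once
    % Reuse the same embeddings with any prompt. ReturnMultiMask requests
    % several candidate masks, each with its own confidence score.
    [masks,scores] = segmentObjectsFromEmbeddings(model,embeddings,size(I), ...
        ForegroundPoints=[453 283],ReturnMultiMask=true);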

Install Support Package

You can install the Image Processing Toolbox Model for Segment Anything Model from the Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. The support package also requires Deep Learning Toolbox™. Processing image data on a GPU requires a supported GPU device and Parallel Computing Toolbox™.

Apply Pretrained Segment Anything Model

Use this process to segment a test image using a pretrained SAM.

  1. Load an image to segment into the workspace, and return the image size. The SAM supports only RGB images.

    I = imread("peppers.png");
    imageSize = size(I);
    

    For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale the range of pixel values using the rescale function:

    I = 255.*rescale(I);
  2. Create a segmentAnythingModel object to configure a pretrained SAM.

    model = segmentAnythingModel;
  3. Extract the feature embeddings of your image by using the extractEmbeddings object function.

    embeddings = extractEmbeddings(model,I);
  4. Specify the visual prompts as foreground and background point coordinates.

    pointPrompt = [453 283; 496 288; 504 300];
    backgroundPoints = [308 272; 348 176];
  5. Segment the object defined by the foreground and background points in the image by using the segmentObjectsFromEmbeddings object function.

    masks = segmentObjectsFromEmbeddings(model,embeddings,imageSize, ...
        ForegroundPoints=pointPrompt,BackgroundPoints=backgroundPoints);
  6. Overlay the detected object mask on the input image, and display the image with its object mask.

    imMask = insertObjectMask(I,masks);
    imshow(imMask)

    Segmentation of an object in an image using SAM

You can use this approach to segment multiple objects in an image, one at a time, through an interactive user interface, as sketched below. For a detailed example, see Interactively Segment Image Using Segment Anything Model.
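A minimal sketch of one interactive iteration, reusing the model, embeddings, and image from the previous steps (drawpoint is an Image Processing Toolbox ROI function; the clicked position becomes the foreground point prompt):

    figure
    imshow(I)
    roi = drawpoint;   % click a foreground point on the object to segment
    mask = segmentObjectsFromEmbeddings(model,embeddings,imageSize, ...
        ForegroundPoints=roi.Position);
    imshow(insertObjectMask(I,mask))

Because the image embeddings do not change, you can repeat the prompt-and-segment step for each new object without rerunning the image encoder.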

Refine Segmentation Results

To refine segmentation results, use the segmentObjectsFromEmbeddings function on the same image, but provide the mask logits of the object mask from the previous segmentation as an additional visual prompt by using the MaskLogits name-value argument. The mask logits returned in the maskLogits output argument of the segmentObjectsFromEmbeddings function are unthresholded logits, not binary masks. If you specify the ReturnMultiMask name-value argument so that the model returns multiple masks, the model returns the mask logits corresponding to only the mask with the highest confidence score. This refinement process enables you to iteratively tune your image segmentation.
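For example, this sketch performs one refinement iteration, continuing the workflow from the previous section with the same prompts and embeddings:

    % First pass: also request the unthresholded mask logits.
    [masks,scores,maskLogits] = segmentObjectsFromEmbeddings(model, ...
        embeddings,imageSize,ForegroundPoints=pointPrompt, ...
        BackgroundPoints=backgroundPoints);
    % Second pass: feed the logits back as an additional visual prompt.
    refinedMasks = segmentObjectsFromEmbeddings(model,embeddings,imageSize, ...
        ForegroundPoints=pointPrompt,BackgroundPoints=backgroundPoints, ...
        MaskLogits=maskLogits);
    imshow(insertObjectMask(I,refinedMasks))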

This image shows segmentation masks predicted by SAM at two stages, before and after refinement. In both stages, the same foreground and background points serve as the visual prompts. The second-stage mask has been refined by using the mask logits returned in the first stage as an additional prompt.

Refining segmentation masks produced using SAM by passing mask logits as visual prompts in a subsequent iteration of the segmentation

Perform Downstream Tasks Using SAM

The pretrained model has a zero-shot response to any prompt at inference time, enabling you to solve downstream tasks by feeding suitable prompts to the model. You can apply this approach to perform edge detection, segment everything (object proposal generation), and segment detected objects (instance segmentation).

For example, you can employ the SAM in conjunction with an object detector to perform instance segmentation. To do this, use the bounding box output of an object detector, such as a yoloxObjectDetector (Computer Vision Toolbox) object, as the bounding box prompt input to the segmentObjectsFromEmbeddings object function, as sketched below.
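A sketch of this workflow, assuming the pretrained YOLOX detector support package is installed and that segmentObjectsFromEmbeddings accepts one box per call through a BoundingBox name-value argument (treat the detector name and the prompt argument as illustrative assumptions):

    detector = yoloxObjectDetector("small-coco");   % pretrained detector (assumed available)
    [bboxes,~,labels] = detect(detector,I);         % M-by-4 [x y w h] boxes
    embeddings = extractEmbeddings(model,I);
    masks = false([imageSize(1:2) size(bboxes,1)]);
    for k = 1:size(bboxes,1)
        % One detector box becomes one bounding box prompt (assumed argument name).
        masks(:,:,k) = segmentObjectsFromEmbeddings(model,embeddings, ...
            imageSize,BoundingBox=bboxes(k,:));
    end
    imshow(insertObjectMask(I,masks))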

References

[1] Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything," April 5, 2023. https://doi.org/10.48550/arXiv.2304.02643.
