
Get Started with Segment Anything Model for Image Segmentation

Perform image segmentation using the Image Processing Toolbox™ Model for Segment Anything Model support package. Using the support package, you can perform image segmentation on a data set by using the pretrained Segment Anything Model (SAM), or perform downstream tasks such as instance segmentation by passing the output of an object detector network as an input to the SAM. To learn more about the model and the training data, see the SA-1B Dataset page on the Meta website.

The SAM is a zero-shot image segmentation model that uses deep learning neural networks to accurately segment objects within images without requiring additional training. The SAM enables you to actively guide and refine the segmentation by providing feedback through visual prompts, such as points, boxes, and mask logits.

SAM uses visual prompts, such as points, boxes, and masks, to interactively produce accurate segmentations.

The SAM architecture consists of separate image encoder, visual prompt encoder, and mask decoder networks. Because the image encoder runs only once per image, you can reuse the same image embeddings with different visual prompts. For a given image embedding, the prompt encoder and mask decoder use the visual prompt to predict a mask. Because the SAM can predict multiple masks for a single prompt, you can use the SAM to segment ambiguous entities, such as both a person and the shirt they wear.
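For example, this minimal sketch (using the peppers.png image and an illustrative point prompt; the full workflow appears in the Apply Pretrained Segment Anything Model section) encodes the image once, then reuses the embeddings with a prompt and requests multiple candidate masks for a potentially ambiguous object:

    model = segmentAnythingModel;
    I = imread("peppers.png");
    embeddings = extractEmbeddings(model,I);   % image encoded only once
    % Reuse the same embeddings with any prompt. ReturnMultiMask requests
    % several candidate masks, each with its own confidence score.
    [masks,scores] = segmentObjectsFromEmbeddings(model,embeddings,size(I), ...
        ForegroundPoints=[453 283],ReturnMultiMask=true);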

Install Support Package

You can install the Image Processing Toolbox Model for Segment Anything Model from the Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. The support package also requires Deep Learning Toolbox™. Processing image data on a GPU requires a supported GPU device and Parallel Computing Toolbox™.

Apply Pretrained Segment Anything Model

Use this process to segment a test image using a pretrained SAM.

  1. Load an image to segment into the workspace, and return the image size. The SAM supports only RGB images.

    I = imread("peppers.png");
    imageSize = size(I);
    

    For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale the range of pixel values using the rescale function:

    I = 255.*rescale(I);
  2. Create a segmentAnythingModel object to configure a pretrained SAM.

    model = segmentAnythingModel;
  3. Extract the feature embeddings of your image by using the extractEmbeddings object function.

    embeddings = extractEmbeddings(model,I);
  4. Specify the visual prompts as foreground and background point coordinates.

    pointPrompt = [453 283; 496 288; 504 300];
    backgroundPoints = [308 272; 348 176];
  5. Segment the object defined by the foreground and background points in the image by using the segmentObjectsFromEmbeddings object function.

    masks = segmentObjectsFromEmbeddings(model,embeddings,imageSize, ...
        ForegroundPoints=pointPrompt,BackgroundPoints=backgroundPoints);
  6. Overlay the detected object mask on the input image, and display the image with its object mask.

    imMask = insertObjectMask(I,masks);
    imshow(imMask)

    Segmentation of an object in an image using SAM

You can use this approach to segment multiple objects in an image, one at a time, through an interactive user interface, as sketched below. For a detailed example, see Interactively Segment Image Using Segment Anything Model.
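A minimal sketch of one interactive iteration, reusing the model, embeddings, and image from the previous steps (drawpoint is an Image Processing Toolbox ROI function; the clicked position becomes the foreground point prompt):

    figure
    imshow(I)
    roi = drawpoint;   % click a foreground point on the object to segment
    mask = segmentObjectsFromEmbeddings(model,embeddings,imageSize, ...
        ForegroundPoints=roi.Position);
    imshow(insertObjectMask(I,mask))

Because the image embeddings do not change, you can repeat the prompt-and-segment step for each new object without rerunning the image encoder.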

Refine Segmentation Results

To refine segmentation results, use the segmentObjectsFromEmbeddings function on the same image, but provide the mask logits of the object mask from the previous segmentation as an additional visual prompt by using the MaskLogits name-value argument. The mask logits returned in the maskLogits output argument of the segmentObjectsFromEmbeddings function are unthresholded logits, not binary masks. If you specify the ReturnMultiMask name-value argument so that the model returns multiple masks, the model returns the mask logits corresponding to only the mask with the highest confidence score. This refinement process enables you to iteratively tune your image segmentation.
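For example, this sketch performs one refinement iteration, continuing the workflow from the previous section with the same prompts and embeddings:

    % First pass: also request the unthresholded mask logits.
    [masks,scores,maskLogits] = segmentObjectsFromEmbeddings(model, ...
        embeddings,imageSize,ForegroundPoints=pointPrompt, ...
        BackgroundPoints=backgroundPoints);
    % Second pass: feed the logits back as an additional visual prompt.
    refinedMasks = segmentObjectsFromEmbeddings(model,embeddings,imageSize, ...
        ForegroundPoints=pointPrompt,BackgroundPoints=backgroundPoints, ...
        MaskLogits=maskLogits);
    imshow(insertObjectMask(I,refinedMasks))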

This image shows segmentation masks predicted by SAM at two stages, before and after refinement. In both stages, the same foreground and background points serve as the visual prompts. The second-stage mask has been refined by using the mask logits returned in the first stage as an additional prompt.

Refining segmentation masks produced using SAM by passing mask logits as visual prompts in a subsequent iteration of the segmentation

Perform Downstream Tasks Using SAM

The pretrained model has a zero-shot response to any prompt at inference time, enabling you to solve downstream tasks by feeding suitable prompts to the model. You can apply this approach to perform edge detection, segment everything (object proposal generation), and segment detected objects (instance segmentation).

For example, you can employ the SAM in conjunction with an object detector to perform instance segmentation. To do this, use the bounding box output of an object detector, such as a yoloxObjectDetector (Computer Vision Toolbox) object, as the bounding box prompt input to the segmentObjectsFromEmbeddings object function, as sketched below.
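A sketch of this workflow, assuming the pretrained YOLOX detector support package is installed and that segmentObjectsFromEmbeddings accepts one box per call through a BoundingBox name-value argument (treat the detector name and the prompt argument as illustrative assumptions):

    detector = yoloxObjectDetector("small-coco");   % pretrained detector (assumed available)
    [bboxes,~,labels] = detect(detector,I);         % M-by-4 [x y w h] boxes
    embeddings = extractEmbeddings(model,I);
    masks = false([imageSize(1:2) size(bboxes,1)]);
    for k = 1:size(bboxes,1)
        % One detector box becomes one bounding box prompt (assumed argument name).
        masks(:,:,k) = segmentObjectsFromEmbeddings(model,embeddings, ...
            imageSize,BoundingBox=bboxes(k,:));
    end
    imshow(insertObjectMask(I,masks))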

References

[1] Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything," April 5, 2023. https://doi.org/10.48550/arXiv.2304.02643.
