Extract Training Samples from Big Image

This example shows how to create training samples from labelled big images by using a bigimageDatastore.

This example creates datastores that extract tumor, normal, and background patches from labeled images. These datastores can be used to train, validate, or test a neural network. The individual blocks extracted from the datastores can be written to disk for use later.

Create a bigimage. This examples uses a modified version of image "tumor_091.tif" from the CAMELYON16 data set. The original image is a training image of a lymph node containing tumor tissue. The original image has eight resolution levels, and the finest level has resolution 53760-by-61440. The modified image has only three coarse resolution levels. The spatial referencing of the modified image has been adjusted to enforce a consistent aspect ratio and to register features at each level.

bim = bigimage('tumor_091R.tif');

ROI Label Data

Label data for an image can be specified in different ways. The Camelyon16 data set provides labels of tumor and normal regions as a set of coordinates specifying manually annotated region boundaries. The coordinates are specified with respect to the finest resolution level image. When pixels exist within the boundaries of both a normal region and a tumor region, the correct label for those pixels is normal tissue.

Load label data for the big image. This examples uses a modified version of labels of the "tumor_091.tif" image from the CAMELYON16 data set. The original labels are stored in .XML format. The modified labels are resampled and saved as .MAT files.

roiPoints = load('labelledROIs.mat')
roiPoints = 

  struct with fields:

       cancerRegions: {1×6 cell}
    nonCancerRegions: {[46×2 double]}

Create ROI objects from the coordinates of the boundaries.

tumorPolys = cellfun(@(position) images.roi.Polygon('Position',position),...
normalPolys = cellfun(@(position) images.roi.Polygon('Position',position),...

Specify Label Data Using ROI Objects

Display the image overlaid with the annotated ROIs. The coordinates of the ROIs are in the same coordinate system as the image, so changing the resolution levels of the displayed image still renders the ROIs accurately.

h = bigimageshow(bim);

title(['Resolution Level:' num2str(h.ResolutionLevel)]);

Zoom in to one ROI.

xlim([3940, 4290])
ylim([2680 3010])
title(['Resolution Level:' num2str(h.ResolutionLevel)]);

Specify Label Data as Masks

Another way to specify labels is by using masks. Create a mask at the coarsest resolution level for the stained region, which includes both tumor and normal tissue. The mask is true for pixels whose grayscale value is less than 130. Clean the mask with a call to bwmorph

tissueMask = apply(bim,bim.CoarsestResolutionLevel,@(im)bwmorph(rgb2gray(im)<130, 'close'));

Combine the information in the ROI objects and mask representations to generate individual low resolution masks for three classes: background, normal tissue, and tumor tissue.

Get the spatial referencing and pixel extent of the big image at the coarsest resolution level.

ref = bim.SpatialReferencing(bim.CoarsestResolutionLevel);
pixelExtent = [ref.PixelExtentInWorldX,ref.PixelExtentInWorldY];

Create a writeable bigimage to store each mask. Specify a block size that is large enough to use memory efficiently.

blockSize = [512 512];
bmBackground = bigimage(ref,1,'logical','BlockSize',blockSize);
bmNormal = bigimage(ref,1,'logical','BlockSize',blockSize);
bmTumor = bigimage(ref,1,'logical','BlockSize',blockSize);

Loop through each output bigimage, one block at a time. Determine the label of each block, then set the pixel data of the block accordingly.

If you have Parallel Computing Toolbox™, then you can replace the for statement with a parfor statement to run the loop in parallel.

for cStart = 1:bmBackground.BlockSize(2):ref.ImageSize(2)
    for rStart = 1:bmBackground.BlockSize(1):ref.ImageSize(1)
        % Center of top left pixel of this block in world units
        xyStart = [cStart, rStart].* pixelExtent;

        % Clamp block end edge to end of image
        cEnd = min(cStart + bmBackground.BlockSize(2)-1, ref.ImageSize(2));
        rEnd = min(rStart + bmBackground.BlockSize(1)-1, ref.ImageSize(1));

        % Center of bottom right pixel of this block in world units
        xyEnd = [cEnd, rEnd].* pixelExtent;

        bsize = [rEnd, cEnd] - [rStart, cStart] + 1;

        % Meshgrid in world units for all pixels in this block.
        [xgrid, ygrid] = meshgrid(xyStart(1):ref.PixelExtentInWorldX:xyEnd(1),...

        % Start with the tissue mask:
        % 0 is background, 1 is foreground
        tmask = getRegion(tissueMask,1,xyStart,xyEnd);

        % Create the tumor mask
        cmask = false(bsize);
        for ind = 1:numel(roiPoints.cancerRegions)
            vertices = roiPoints.cancerRegions{ind};
            isTumor = inpolygon(xgrid, ygrid,...
                vertices(:,1), vertices(:,2));
            cmask = cmask|isTumor;

        % Carve out the healthy part from this mask
        for ind = 1:numel(roiPoints.nonCancerRegions)
            vertices = roiPoints.nonCancerRegions{ind};
            isHealthy = inpolygon(xgrid,ygrid,...
            cmask = cmask & ~isHealthy;

        % Healthy tissue is 'tissue' and 'not tumor'
        hmask = tmask==1 & cmask==0;
        % Write the blocks out

        % Write out the background mask
        bmask = tmask==0;

        % Healthy and tumor tissue are not part of background
        bmask = bmask & ~cmask & ~hmask;
        setBlock(bmBackground, 1, xyStart, bmask);

Display the masks.

title('Full Image')
title('Normal Tissue');
title('Tumor Tissue')

Zoom in to a region of interest.


Save Masks for Use Later

For most datasets, these masks can be created once upfront and then reused for multiple training sessions. These masks are currently backed by temporary files that do not exist across MATLAB® sessions.

Write them to a persistent location.

imageDir = 'Camelyon16_maskImage91';
if exist(imageDir,'dir')
    rmdir Camelyon16_maskImage91 s;

backgroundDir = fullfile(imageDir,'background');
tumorDir = fullfile(imageDir,'tumor');
normalDir = fullfile(imageDir,'normal');


In a fresh session of MATLAB, you can reload the masks by creating new |bigimage|s.

bmBackground = bigimage(backgroundDir);
bmTumor = bigimage(tumorDir);
bmNormal = bigimage(normalDir);

Visualize Potential Candidate Blocks

You can use masks with bigimageDatastore to extract image blocks only where the mask is true. If a mask block is true, then the datastore reads the entire image block. However, declaring a mask block as true is ambiguous because each block of the mask can contain a mixture of true (ROI) and false (background) pixels. The InclusionThreshold property of bigimageDatastore removes the ambiguity by specifying the minimum percentage of true pixels for the mask block to be considered true.

Display the mask of tumor tissue over the image. Highlight blocks within the ROI with the color green. By default, the inclusion threshold is 0.5, so a mask block is considered true only if it has a minimum of 50% true pixels.

blockSize = [256 256];

h = bigimageshow(bim);

To increase number of included blocks, decrease the inclusion threshold.


Create bigimageDatastore

Create a bigimageDatastore that extracts patches from tumor regions at the finest resolution level. Tumor regions are sparse so use a small inclusion threshold. In this example, the datastore returns blocks that contain at least 10% tumor pixels.

You can also generate more patches by oversampling tumor regions. To oversample, specify the BlockOffsets property as a [numrows numcols] vector smaller than the block size.

bimdsTumor = bigimageDatastore(bim,1,'BlockSize',blockSize, ...

Create a bigimageDatastore that extracts patches from normal regions at the finest resolution level. Use showmask to determine an appropriate inclusion threshold for normal tissue.

h = bigimageshow(bim);

Normal tissue is more prevalent than tumor tissue. The default inclusion threshold will yield sufficient samples of normal tissue. Oversampling is not necessary so keep the default value of BlockOffsets.

bimdsNormal = bigimageDatastore(bim,1,'BlockSize',blockSize, ...

Create a bigimageDatastore that extracts patches from the background at the finest resolution level. The background cover a large portion of the image, so select only patches that consist entirely of background pixels.

bimdsBackground = bigimageDatastore(bim,1,'BlockSize',blockSize, ...

Read Training Patches from bigimageDatastore

By default, a bigimageDatastore reads one patch at a time. You can increase the number of patches returned in each call to the read function by specifying the ReadSize property.

bimdsBackground.ReadSize = 6;
bimdsNormal.ReadSize = 6;
bimdsTumor.ReadSize = 6;

Visualize one batch of data read from each of the datastores. Notice that patches of normal tissue appear to come from adjacent blocks, as expected from the default value of the BlockOffsets property. Similarly, the patches of tumor tissue have some overlap of image content, as expected because the block offset is smaller than the block size.

montage(read(bimdsBackground),'Size',[1 6],'BorderSize',[5 5])

montage(read(bimdsNormal),'Size',[1 6],'BorderSize',[5 5])

montage(read(bimdsTumor),'Size',[1 6],'BorderSize', [5 5])

See Also

| |

Related Topics