How do I use polygon labeling for an instance segmentation neural network?

I am trying to perform instance segmentation using a network trained with the trainFasterRCNNObjectDetector function from Computer Vision Toolbox. The end goal is a network that can find the location and size of defects in my images. My data source is 1200x2200 greyscale X-ray images of material samples. I have 200 images labelled with the Image Labeler app, both as pixel labels (from a previous semantic segmentation attempt) and now as polygons.
Since my images are greyscale, I opted not to use a pre-trained network, rather than replicating the data across three RGB channels and trying anyway.
In order to increase the amount of training data, I augment it liberally.
patch = [480 480 1];
%% Make training data
augmenter = imageDataAugmenter( ...
    'RandXReflection', true, ...
    'RandYReflection', true, ...
    'RandRotation', [0 360], ...
    'RandScale', [0.3 1.5], ...
    'RandXShear', [-30 30], ...
    'RandYShear', [-30 30], ...
    'RandXTranslation', [-5 5], ...
    'RandYTranslation', [-5 5]); % Transformations for more training data
batchsize = 16; % Number of patches extracted per image
patchds = randomPatchExtractionDatastore(imds, pxds, patch(1), ...
    'PatchesPerImage', batchsize, 'DataAugmentation', augmenter);
%% Options
options = trainingOptions("adam", ...
    MaxEpochs=5, ...
    ExecutionEnvironment='parallel', ...
    InitialLearnRate=0.0001, ...
    L2Regularization=0.0001, ...
    LearnRateSchedule='piecewise', ...
    LearnRateDropPeriod=10, ...
    LearnRateDropFactor=0.1, ...
    MiniBatchSize=512, ...
    Plots="training-progress", ...
    Shuffle="every-epoch", ...
    ValidationData=valdata, ...
    ValidationFrequency=50, ...
    ValidationPatience=inf);
The network I'm using is a modified ResNet-18, created using the Deep Network Designer app.
THE PROBLEMS:
options and patchds worked when using trainNetwork with a U-Net-based layer structure (with its own issues), but when I run this:
>> trainFasterRCNNObjectDetector(patchds,lgraph,options);
Error using vision.internal.inputValidation.checkGroundTruthDatastore
The read method of the training input datastore must return an M-by-3 cell or table.
Error in trainFasterRCNNObjectDetector>iParseInputs (line 1094)
params.ClassNames = vision.internal.inputValidation.checkGroundTruthDatastore(trainingDs);
Error in trainFasterRCNNObjectDetector (line 442)
[trainingData, options, params] = iParseInputs(...
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
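For context on that error: trainFasterRCNNObjectDetector expects each read() of the training datastore to return a cell array of the form {image, boxes, labels}, where boxes is an M-by-4 matrix of [x y w h] rectangles and labels is an M-by-1 categorical. A minimal sketch of satisfying that contract, assuming axis-aligned boxes are already available somewhere; makeDetectionSample and lookupBoxes are illustrative names, not toolbox functions:

```matlab
% Sketch: wrap a patch datastore so each read() yields {image, boxes, labels}.
% lookupBoxes is a hypothetical helper that returns M-by-4 [x y w h] boxes
% for a given file; replace it with however your boxes are actually stored.
imds = imageDatastore(patchFolder);
ds   = transform(imds, @makeDetectionSample, 'IncludeInfo', true);

function [out, info] = makeDetectionSample(img, info)
    bboxes = lookupBoxes(info.Filename);                       % M-by-4 boxes
    labels = repmat(categorical("defect"), size(bboxes,1), 1); % M-by-1 labels
    out = {img, bboxes, labels};                               % 1-by-3 cell
end
```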
I went online to find where my issue could be:
This answer seemed to indicate that polygons just... would not work.
Staff stated that this now works, however:
I followed the lead from there, which looked very promising, and landed on:
"Postprocess Exported Labels for Instance or Semantic Segmentation Networks
You can use the exported, labeled ground truth for training an instance segmentation network or a semantic segmentation network.
Follow these steps to process the polygon data for either semantic segmentation or instance segmentation." Huzzah!
So I followed these steps.
out = gatherLabelData(gTruth, [labelType.Polygon], 'GroupLabelData', 'LabelType');
imageSize = [480 480];
for q = 1:10
    polygons = out{1}.PolygonData{q}(:,1);
    numPolygons = size(polygons,1);
    maskStack = false([imageSize(1:2) numPolygons]);
    for i = 1:numPolygons
        maskStack(:,:,i) = poly2mask(polygons{i}(:,1), ...
            polygons{i}(:,2), imageSize(1), imageSize(2));
    end
    filename = sprintf('%s_%d', 'C:\...\PolygonLabelData\mask', q);
    save(filename, "maskStack")
end
And I get a bunch of files, each with depth equal to the number of masks. Then... what? Where can I use this output? Every guide I can find just says to get TrainingData as an M-by-2 table, with the first column being images and the second being rectangles of interest. combine() isn't useful since my maskStack isn't a datastore.
Error using objectDetectorTrainingData
The input groundTruth object/s do not contain any valid object detector training data.
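For what it's worth, the Mask R-CNN route wants a datastore rather than a groundTruth object: per the trainMaskRCNN documentation, each read() should return a 1-by-4 cell {image, boxes, labels, masks}, where masks is exactly the H-by-W-by-M logical stack built above. A hedged sketch of stitching the saved .mat files back into such a datastore (readMaskSample and the "defect" class name are my own; adapt the folders to your data):

```matlab
% Sketch: combine patch images with the saved maskStack files into a
% datastore whose read() returns {image, boxes, labels, masks}.
imds   = imageDatastore(patchFolder);                       % 480x480 patches
maskds = fileDatastore(maskFolder, ...
    'ReadFcn', @(f) getfield(load(f), 'maskStack'), ...     % H-by-W-by-M logical
    'FileExtensions', '.mat');
trainds = transform(combine(imds, maskds), @readMaskSample);

function out = readMaskSample(data)
    img       = data{1};
    maskStack = data{2};
    numObj    = size(maskStack, 3);
    bboxes    = zeros(numObj, 4);
    for k = 1:numObj
        % Derive each instance's axis-aligned box from its mask.
        stats = regionprops(maskStack(:,:,k), 'BoundingBox');
        bboxes(k,:) = stats(1).BoundingBox;                 % [x y w h]
    end
    labels = repmat(categorical("defect"), numObj, 1);
    out = {img, bboxes, labels, maskStack};
end
```

One caveat: both datastores sort by filename, so make sure mask_1.mat pairs with the right patch; zero-padding the index in sprintf (e.g. '%s_%03d') keeps the ordering consistent.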
So my questions:
1. Is there a better way; a better method or network, for example?
2. What should I do with the maskStacks in order to teach my net?
3. (Bonus) Since my idea is to find dark-grey boxes on light-grey backgrounds, it sounds... hard. But how do I balance training? Any time I try, it trends to either 0% or 100% REAL fast. I've tried 2-, 3-, and 4-layer U-Nets, and the ResNet-18, but I am still not fully grasping it.
Thankful for any input, as I'd rather not have to hand-label 200 images again...

Accepted Answer

Birju Patel on 17 Nov 2022
For instance segmentation, you should first try Mask R-CNN via the trainMaskRCNN function.
Faster R-CNN is designed for 2-D object detection, not instance segmentation.
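If it helps, a minimal sketch of that route, assuming a datastore trainds whose read() returns {image, boxes, labels, masks} as the trainMaskRCNN documentation describes. Note the pretrained "resnet50-coco" weights expect 3-channel input, so greyscale patches would need replicating across channels:

```matlab
% Sketch: configure and train a Mask R-CNN network for one "defect" class
% (R2022a Computer Vision Toolbox with the Mask R-CNN support package).
net = maskrcnn("resnet50-coco", "defect", InputSize = [480 480 3]);
options = trainingOptions("adam", ...
    InitialLearnRate = 1e-4, ...
    MaxEpochs = 5, ...
    MiniBatchSize = 2, ...          % Mask R-CNN batches are memory-heavy
    Shuffle = "every-epoch");
trainedNet = trainMaskRCNN(trainds, net, options);
```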
Alex on 25 Nov 2022
Thank you!
I was already using a ResNet-18 structure, which is supposed to work as a backbone for Mask R-CNN. So this might indeed be the issue.
I'll see if this solves it.

Release: R2022a
