Creating an RCNN with two image inputs and a regression output

Question

George Lovell on 10 Dec 2019


https://se.mathworks.com/matlabcentral/answers/495982-creating-an-rcnn-with-two-image-inputs-and-a-regression-output

Answered: Kenta on 31 Mar 2020

I'm trying to create an RCNN that compares two image samples (each 40x40x3, in the example below) and then gives a numerical response.

I think the two branches of the model need to stay separate until fairly late in the network and then be joined using a concatenation layer.

The problem is that I can't find a way to make the model feed two images into the two separate branches of the model.

Mahmoud Afifi's code was a useful introduction to my problem: https://uk.mathworks.com/matlabcentral/answers/369328-how-to-use-multiple-input-layers-in-dag-net-as-shown-in-the-figure#answer_398533

However, this relies on an input format that doesn't seem to work with a regression output layer: it uses an imageDatastore, which in turn specifies a custom image reader, "inRead": < imds = imageDatastore(trainImages.imageFilename, 'LabelSource', 'foldernames','IncludeSubfolders',true,'ReadFcn',@inRead); >

I've tried adapting Mahmoud's code to take a single image, cut it in half, and send the left and right halves of the image to the two branches of the network. The splitting itself seems to work, but the "Analyze" button in Deep Network Designer reports an error, and I also get an error when I try to train the network (see below).

I get the following error:

Error using trainNetwork (line 170)
Invalid network.
Caused by:
    Layer 'leftConv': Input size mismatch. Size of input to this layer is different from the expected input size.
    Inputs to this layer:
        from layer 'leftSplitter' (40×40 output)
    Layer 'rightConv': Input size mismatch. Size of input to this layer is different from the expected input size.
    Inputs to this layer:
        from layer 'rightSplitter' (40×40 output)

The input layer image is 40x80x3 and the output of each splitter layer is 40x40*, so MATLAB seems to 'know' what the input size of the convolution layers should be, yet it still doesn't work. Does anyone have any suggestions?

The demo code below creates a small model that should learn to return the difference in mean value between the left and right images. I have no idea whether the architecture is right for this kind of problem; I just put it together to reproduce the error I'm getting.

Thanks,

George

* I'm not sure what happened to the colour planes. Should it be 40x40x3? Is that the problem?
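The regression target in the demo is mean(left half) - mean(right half). The hypothetical check below (uniform test values, not part of the original demo) computes that target for one side-by-side sample and compares it with the target implied by a saved image. It uses min(x,1) only as a stand-in for the saturation that happens when a double image with values above 1 (which rand(40,40,3)+rand can produce) is written to an 8-bit PNG; it does not call imwrite.

```matlab
% Hypothetical check of the demo's regression target for one side-by-side sample.
% min(x,1) only emulates the saturation of double values above 1 when an image
% is saved as an 8-bit PNG; imwrite itself is not called here.
leftImg  = 1.5  * ones(40, 40, 3);   % brighter than 1, as rand(...)+rand can be
rightImg = 0.25 * ones(40, 40, 3);
combined = [leftImg rightImg];       % 40x80x3, the layout written to disk in the demo
halfW = size(combined, 2) / 2;

% Target computed from the unclipped arrays (what imDiffs would store)
targetRaw = mean(combined(:, 1:halfW, :), 'all') - mean(combined(:, halfW+1:end, :), 'all');

% Target implied by the saved (saturated) image the network actually sees
saved = min(combined, 1);
targetSaved = mean(saved(:, 1:halfW, :), 'all') - mean(saved(:, halfW+1:end, :), 'all');

fprintf('raw target %.2f, saved-image target %.2f\n', targetRaw, targetSaved);
```

If the two numbers differ, the labels no longer describe the images on disk, so keeping generated pixel values inside [0,1] avoids that mismatch.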

% Some simple code to demonstrate the problem
imageSize = [40 80 3]
input = imageInputLayer(imageSize,'Name','InputLayer','Normalization','zerocenter');
layerL=splittingLayerLeftRight('leftSplitter','left')
layerR=splittingLayerLeftRight('rightSplitter','right')
RtwodConvLayer = convolution2dLayer(5, 32, 'Padding', 2, ...
    'BiasLearnRateFactor', 2, 'Weights', ones([5 5 1 32]),'Name','rightConv','numChannels',1);
LtwodConvLayer = convolution2dLayer(5, 32, 'Padding', 2, ...
    'BiasLearnRateFactor', 2, 'Weights', ones([5 5 1 32]),'Name','leftConv','numChannels',1);
leftCol = [input; layerL;LtwodConvLayer]
rightCol= [layerR;RtwodConvLayer]
% outputlayers
% cat convs
numInputs   = 2;
cat_dim     = 3; %third dimension
cat_Layer   = concatenationLayer(cat_dim,numInputs,'Name','Cat-Layer');
fc1         = fullyConnectedLayer(1024,'Name', 'FC-1');
relu1       = reluLayer('Name','ReLu-FC-1');
dropout1    = dropoutLayer('Name','dropOut-FC-1');
fcWidth = 1;   % one output per image: the regression target is a single number
fc          = fullyConnectedLayer(fcWidth,'Name', 'FC-out');
%softmxLayer = softmaxLayer('Name','Softmaxx');
%endLayer    = classificationLayer('Name','outLayer');
rLayer      = regressionLayer("Name","regressionoutput");
outputLayers = [    cat_Layer
    fc1
    relu1
    dropout1
    fc
	rLayer];
%fullModel = [leftCol; outputLayers; rightCol]
layers= layerGraph([leftCol; outputLayers]);
layers= addLayers(layers,rightCol);
% % layers = connectLayers(layers,sprintf('rightConv',...
% %     length(layerDepths),length(layerDepths)),'Cat-Layer/in2');
layers = connectLayers(layers,'rightConv','Cat-Layer/in2');
% connect input to column 2 
layers = connectLayers(layers,'InputLayer','rightSplitter');
tempoutputfolder = tempname;
mkdir(tempoutputfolder)
numImages        = 100;
filenames = {};
imDiffs   = [];
for i=1:numImages   
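    % Keep pixel values in [0,1): imwrite saturates double images outside that
    % range, so the saved images would no longer match imDiffs
    leftImg  = 0.5*rand(40,40,3)+0.5*rand;
    rightImg = 0.5*rand(40,40,3)+0.5*rand;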
    filenames{i} = [tempoutputfolder filesep 'inputImg_' num2str(i) '.png'];
    imwrite([leftImg rightImg],filenames{i});
    imDiffs(i) = mean(leftImg(:))-mean(rightImg(:));    
end
trainData = table(filenames',imDiffs');
MiniBatchSize= 128;
InitialLearnRate= 1e-3;
LearnRateSchedule= 'piecewise';
LearnRateDropFactor= 0.1;
MaxEpochs = 3000;
LearnRateDropPeriod= 1000;
ValidationFrequency=100;
Verbose=false;
Plots='training-progress';
options = trainingOptions('sgdm', ...
    'MiniBatchSize', MiniBatchSize, ...
    'InitialLearnRate', InitialLearnRate, ...
    'LearnRateSchedule', LearnRateSchedule, ...
    'LearnRateDropFactor', LearnRateDropFactor, ...
    'LearnRateDropPeriod', LearnRateDropPeriod, ...
    'MaxEpochs', MaxEpochs, ...
    'Verbose', Verbose, ...
    'Plots',Plots);     
rcnn=trainNetwork(trainData,layers,options)
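% splittingLayerLeftRight.m (save this class in its own file with this name;
% a classdef cannot be placed inside the script above)
% Modified by George Lovell from original code written by Mahmoud Afifi -- mafifi@eecs.yorku.ca | m.3afifi@gmail.com
% Split an image into left/right halves for output to a network with
% multiple image inputs.
%
% Requires Matlab 2019b or higher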
classdef splittingLayerLeftRight < nnet.layer.Layer
    
    properties
        target 
    end
    
    properties (Learnable)
    end
    
    methods
        function layer = splittingLayerLeftRight(name,target)
            layer.Name = name;
            layer.Description = "splittingLayerLeftRight";
            layer.target = target;
        end
        function Z = predict(layer, X)
            imWidth = size(X,2);
            if rem(imWidth,2)~=0
                error('To split an image into two left/right halves it needs to have an even width');
            else
                imHalf=imWidth/2;
            end
            
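            % X is H-by-W-by-C-by-N: keep all channels and every observation in the batch
            switch layer.target
                case 'left'
                    Z = X(:,1:imHalf,:,:);
                case 'right'
                    Z = X(:,imHalf+1:end,:,:);
                otherwise
                    error('target must be ''left'' or ''right''');
            end
            %figure;imagesc(Z);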
        end
    end
end
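In a custom layer, predict receives a whole mini-batch, typically as an H-by-W-by-C-by-N array. The stand-alone check below (random data; the batch size of 8 is an arbitrary choice) shows how MATLAB indexing treats trailing dimensions when fewer subscripts than dimensions are given, which is worth keeping in mind when slicing X inside predict.

```matlab
% A mini-batch shaped the way a custom layer's predict receives it: H-by-W-by-C-by-N
X = rand(40, 80, 3, 8);
imHalf = size(X, 2) / 2;

a = X(:, 1:imHalf, 1:3);    % 3 subscripts: dims 3 and 4 are merged (length 24),
                            % so 1:3 keeps only the channels of observation 1
b = X(:, 1:imHalf, :);      % 3 subscripts: channels and observations folded
                            % together into one 3rd dimension of length 24
c = X(:, 1:imHalf, :, :);   % 4 subscripts: all channels and all observations kept

disp(size(a)); disp(size(b)); disp(size(c));
```

Only the four-subscript form preserves both the colour planes and the batch dimension.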

2 Comments

George Lovell on 12 Dec 2019

Some progress:

It seems that the size of the input layer [40 80 3] is propagated forward, so the convolution layer expects this size, while the splitting layer makes the image [40 40 3]; that's why it fails. If I modify the splitting layer so that each half of the input image is copied into the centre of a zero array of the original size (with padding on either side), the model seems to run without error; see below.

So my question ultimately seems to be how I 'tell' the convolution layer that its input should be [40 40 3] and not [40 80 3].

switch layer.target
    case 'left'
        Z = X;      % first copy
        Z(:) = 0;   % make it a big zero
        Z(:,(imHalf/2):(imHalf+(imHalf/2))-1,:) = X(:,1:imHalf,:);      % now copy what I want into the middle
    case 'right'
        Z = X;      % first copy
        Z(:) = 0;   % make it a big zero
        Z(:,(imHalf/2):(imHalf+(imHalf/2))-1,:) = X(:,imHalf+1:end,:);  % now copy what I want into the middle
end

George Lovell on 12 Dec 2019

I think I've solved it; I'm posting here in case someone else has a similar problem.

Instead of splitting the input with a new layer type, I crop the input, sending the left side down one stream and the right side down the other. To achieve this, my input layer is [128 256 3] and it feeds into a crop2dLayer on each side. A crop2dLayer needs a reference input to know what size to crop to. For this I created a new type of layer that simply cuts its input in half: the halfCropLayer takes the output of the input layer and returns the left half of it. What this output contains doesn't really matter; it is only used as a reference for the array size, and it feeds into the reference inputs of both crop layers.

The crop locations of the two crop2dLayers are set manually to [1 1] (left half) and [129 1] (right half).
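A minimal sketch of the cropping approach described above, assuming a 128x256x3 input. To keep it self-contained, it replaces the custom halfCropLayer with a built-in averagePooling2dLayer (pool size and stride [1 2]) whose only job is to produce a 128-by-128 reference for the crops; that substitution is my assumption, not George's code. The layer names ('halfRef', 'cropLeft', ...) and branch sizes are illustrative.

```matlab
% Sketch of the crop-based split. Assumptions: 128x256x3 input; 'halfRef' is a
% built-in stand-in for the custom halfCropLayer; branch sizes are illustrative.
inputSize = [128 256 3];

% Input, plus a layer whose 128x128 output serves only as a size reference
lgraph = layerGraph([
    imageInputLayer(inputSize, 'Name', 'input')
    averagePooling2dLayer([1 2], 'Stride', [1 2], 'Name', 'halfRef')]);

% Crop windows: Location is [X Y] = [column row] of the top-left corner
lgraph = addLayers(lgraph, crop2dLayer([1 1],   'Name', 'cropLeft'));
lgraph = addLayers(lgraph, crop2dLayer([129 1], 'Name', 'cropRight'));
lgraph = connectLayers(lgraph, 'input',   'cropLeft/in');
lgraph = connectLayers(lgraph, 'halfRef', 'cropLeft/ref');
lgraph = connectLayers(lgraph, 'input',   'cropRight/in');
lgraph = connectLayers(lgraph, 'halfRef', 'cropRight/ref');

% One small branch per half, joined along the channel (3rd) dimension
lgraph = addLayers(lgraph, [
    convolution2dLayer(5, 16, 'Padding', 'same', 'Name', 'leftConv')
    reluLayer('Name', 'leftRelu')]);
lgraph = addLayers(lgraph, [
    convolution2dLayer(5, 16, 'Padding', 'same', 'Name', 'rightConv')
    reluLayer('Name', 'rightRelu')]);
lgraph = addLayers(lgraph, [
    concatenationLayer(3, 2, 'Name', 'cat')
    fullyConnectedLayer(1, 'Name', 'fcOut')
    regressionLayer('Name', 'regressionoutput')]);

lgraph = connectLayers(lgraph, 'cropLeft',  'leftConv');
lgraph = connectLayers(lgraph, 'cropRight', 'rightConv');
lgraph = connectLayers(lgraph, 'leftRelu',  'cat/in1');
lgraph = connectLayers(lgraph, 'rightRelu', 'cat/in2');
```

Because the split happens inside crop2dLayer, every layer downstream of the crops sees a 128-by-128 input, which is the size information the convolution layers were missing in the original question. You can inspect the resulting sizes with analyzeNetwork(lgraph).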

Then the halfCropLayer looks like this:

% HalfCropLayer (save as halfcropLayer.m)
% Requires Matlab 2019b or higher
classdef halfcropLayer < nnet.layer.Layer

    properties
        target
    end

    properties (Learnable)
    end

    methods
        function layer = halfcropLayer(name,target)
            layer.Name = name;
            layer.Description = "halfcropLayer";
            if nargin > 1
                layer.target = target;
            end
        end

        function Z = predict(layer, X)
            imWidth = size(X,2);
            if rem(imWidth,4)~=0
                error('To split an image into two left/right halves it needs to have a width that is a multiple of 4');
            else
                imHalf=imWidth/2;
            end
            % Output the left half (all channels, whole batch); only its size matters,
            % because it is used as the reference input of the crop2dLayers
            Z = X(:,1:imHalf,:,:);
        end
    end
end



Answer 1

Kenta on 31 Mar 2020


https://se.mathworks.com/matlabcentral/answers/495982-creating-an-rcnn-with-two-image-inputs-and-a-regression-output#answer_423065

As of R2019b, you can write a "custom training loop", which lets you implement a CNN with multiple inputs.

For an example, see the demo below. With the approach you are trying, you have to split the input images into two streams after the input layer yourself, which is somewhat complicated to implement. I think the demo below will give you some useful tips.

https://jp.mathworks.com/matlabcentral/fileexchange/74760-image-classification-using-cnn-with-multi-input
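The linked demo is not reproduced here. As a separate, minimal sketch of the same idea, the code below builds a dlnetwork with two image inputs and trains it with a custom training loop on synthetic data. Assumptions: synthetic 40x40x3 pairs with target mean(left) - mean(right), tiny layers, Adam with a fixed learning rate of 1e-3, 50 full-batch iterations, and that dlnet.InputNames lists 'inLeft' before 'inRight' (worth checking). It needs R2019b or later for dlnetwork, dlfeval, dlgradient, mse and adamupdate.

```matlab
% Minimal two-input regression CNN trained with a custom training loop (R2019b+).
% Assumptions: synthetic data, target = mean(left) - mean(right), tiny layers,
% Adam with a fixed learning rate, full-batch updates.
rng(0);
numObs = 64;
XL = 0.5*rand(40,40,3,numObs,'single') + 0.5*rand(1,1,1,numObs,'single');
XR = 0.5*rand(40,40,3,numObs,'single') + 0.5*rand(1,1,1,numObs,'single');
T  = reshape(mean(XL,[1 2 3]) - mean(XR,[1 2 3]), 1, numObs);   % 1-by-N targets

% Left branch flows into the concatenation layer (reluLeft -> cat/in1) and the head
lgraph = layerGraph([
    imageInputLayer([40 40 3], 'Name', 'inLeft', 'Normalization', 'none')
    convolution2dLayer(3, 8, 'Padding', 'same', 'Name', 'convLeft')
    reluLayer('Name', 'reluLeft')
    concatenationLayer(3, 2, 'Name', 'cat')
    fullyConnectedLayer(1, 'Name', 'fc')]);
% Right branch feeds the second input of the concatenation layer
lgraph = addLayers(lgraph, [
    imageInputLayer([40 40 3], 'Name', 'inRight', 'Normalization', 'none')
    convolution2dLayer(3, 8, 'Padding', 'same', 'Name', 'convRight')
    reluLayer('Name', 'reluRight')]);
lgraph = connectLayers(lgraph, 'reluRight', 'cat/in2');
dlnet = dlnetwork(lgraph);   % no output layer: the loss is computed in modelGradients

dlXL = dlarray(XL, 'SSCB');
dlXR = dlarray(XR, 'SSCB');
dlT  = dlarray(T, 'CB');
avgG = []; avgSqG = [];
learnRate = 1e-3;
for iteration = 1:50
    [gradients, loss] = dlfeval(@modelGradients, dlnet, dlXL, dlXR, dlT);
    [dlnet, avgG, avgSqG] = adamupdate(dlnet, gradients, avgG, avgSqG, iteration, learnRate);
end
lastLoss = double(extractdata(loss));
fprintf('loss at last iteration: %.4g\n', lastLoss);

function [gradients, loss] = modelGradients(dlnet, dlXL, dlXR, dlT)
    % Inputs are passed in the order of dlnet.InputNames (assumed {'inLeft','inRight'})
    dlY = forward(dlnet, dlXL, dlXR);
    loss = mse(dlY, dlT);   % half mean squared error
    gradients = dlgradient(loss, dlnet.Learnables);
end
```

Compared with splitting one wide image inside the network, giving each half its own imageInputLayer means every branch is declared with its true 40x40x3 size, so no size-reference layer is needed.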