Train VAE for RGB image generation

Question

debojit sharma on 17 Jun 2023

https://se.mathworks.com/matlabcentral/answers/1984654-train-vae-for-rgb-image-generation

Commented: Ben on 26 Jun 2023

I am trying to adapt the code for training a VAE for image generation, given in the following link, to my own dataset of 200-by-200 RGB images. https://in.mathworks.com/help/deeplearning/ug/train-a-variational-autoencoder-vae-to-generate-images.html

I am getting the following errors in the Train model part:

The VAE code in the above link uses MNIST images as input to the encoder, and the example states that the decoder outputs an image of size 28-by-28-by-1. My input images are 200-by-200 RGB images, and I want to train this VAE model to generate RGB images of the same size. I am getting the above-mentioned error in the Train model part and have not been able to resolve it. Could somebody please advise what changes I need to make to this code so that I can train the VAE model to generate 200-by-200 RGB images? I will be thankful to you.

Answer 1

Ben on 23 Jun 2023

https://se.mathworks.com/matlabcentral/answers/1984654-train-vae-for-rgb-image-generation#answer_1261399

The error states that the VAE output Y and the training images T have different sizes when you try to compute the mean squared error (mse) loss between them.

Note that the VAE output size is determined by both the input image sizes and the layers in the network. I think there are a few things to check first:

1. Make sure the output of the VAE has the same number of channels as the target images: 1 for the MNIST example, 3 for RGB images.
2. Make sure the VAE output has the same height and width as the target images, 200x200. The VAE in the example downsamples the spatial sizes by using Stride=2 in the two convolution layers of the encoder (for your images, 200 to 100 to 50), then upsamples again using Stride=2 in the two transposed convolution layers of the decoder. You have to be careful to ensure the decoder upsamples back to the original image size.
3. Ensure the custom projectAndReshapeLayer is configured for your encoder latent size. In the example the projectionSize is [7,7,64], but for the same network on 200x200 images I would expect this needs to be [50,50,64].
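The size bookkeeping in the points above can be sketched as a few lines of arithmetic. This is only an illustration, assuming Padding="same" in the encoder convolutions (output size ceil(input/stride)) and Cropping="same" in the decoder transposed convolutions (output size input*stride); the variable names are made up for this sketch and are not part of the example.

```matlab
% Sketch of the spatial-size arithmetic (names here are illustrative only).
inputSize = [200 200];      % height and width of the training images
stride = 2;                 % Stride used in the example's conv layers
numStridedLayers = 2;       % two strided convolutions in the encoder
numFilters = 64;            % filters in the last encoder convolution

% Encoder: each "same"-padded strided convolution gives ceil(size/stride).
encoderSize = inputSize;
for k = 1:numStridedLayers
    encoderSize = ceil(encoderSize/stride);
end

% The decoder's projection should start from the encoder's spatial size.
projectionSize = [encoderSize numFilters];

% Decoder: each "same"-cropped strided transposed convolution multiplies by stride.
decoderSize = encoderSize*stride^numStridedLayers;

disp(projectionSize)   % 50 50 64
disp(decoderSize)      % 200 200
```

Setting inputSize = [28 28] reproduces the [7 7 64] projection size used in the MNIST example. If an image size is not divisible by 4, the ceil in the encoder means decoderSize overshoots inputSize, so the output would no longer match the target.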

If you can't get this working, could you let us know whether you have modified the encoder or decoder layers at all? If not, can you confirm that all the images input to the VAE have the same size?

Hope that helps,

Ben

3 Comments
debojit sharma on 24 Jun 2023

@Ben sir, I tried making the changes you suggested in the code given in this link: https://in.mathworks.com/help/deeplearning/ug/train-a-variational-autoencoder-vae-to-generate-images.html

But still I am getting the following errors in the train model part.

All the images input to the VAE model have the same size, i.e. 200*200*3. I am not able to resolve these errors. So, please kindly advise what changes I need to make in this code so that I can train the VAE model to generate RGB images of size 200*200. I will be thankful to you.

I have kept input images of different classes in a folder. Then I am preparing my input image dataset for VAE using the following code:

digitDatasetPath = fullfile(matlabdrive,'Training_sample');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
labelCount = countEachLabel(imds)
img = readimage(imds,1);
size(img)
numTrainFiles = .75;
[imdsTrain,imdsValidation] = splitEachLabel(imds,numTrainFiles,'randomize');
[XTrain,tTrain] = imds2cell(imds );
XTrain = cell2mat(reshape(XTrain,1,1,1,[]));
XTrain = XTrain./255;
XTrain = reshape(XTrain,200,200,3,[]);

After that, I made the above-mentioned changes in the encoder and decoder, but I am still getting the same error. Please suggest a solution for this, @Ben.
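A possible pitfall in the data preparation above, separate from the size error: if the images are read as uint8 (an assumption, since it depends on the files), dividing by 255 before converting to floating point uses integer arithmetic and rounds every pixel to 0 or 1. The sketch below stacks a cell array of images into a 200-by-200-by-3-by-N single array and converts before scaling. imagesToArray is a hypothetical helper written for this illustration, and the random images are a stand-in for real data.

```matlab
% Synthetic stand-in for a cell array of images read from a datastore.
imgs = {randi([0 255],200,200,3,"uint8"); randi([0 255],200,200,3,"uint8")};

XTrain = imagesToArray(imgs);
disp(size(XTrain))    % 200 200 3 2
disp(class(XTrain))   % single

function X = imagesToArray(imgs)
% Hypothetical helper (not part of the MathWorks example).
% Concatenate H-by-W-by-C images along the 4th (batch) dimension.
X = cat(4,imgs{:});
% Convert to single BEFORE dividing so the result is in [0,1]
% rather than rounded to 0 or 1 by uint8 arithmetic.
X = single(X)/255;
end
```

With an imageDatastore, the cell array could come from imgs = readall(imdsTrain). Note also that the snippet in the comment above passes imds, not imdsTrain, to imds2cell, so the train/validation split is not actually used there.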

Aniketh on 25 Jun 2023

Have you tried printing the dimensions of the arguments being passed to the loss calculation in dlfeval()? The upsampling, downsampling, and projection corrections pointed out by Ben should solve your issue, but the exact difference between the output dimensions of layersE and layersD should point you in the right direction.
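One way to inspect the dimensions is to push a dummy batch through the layers and print the size of the result. The sketch below does this for the convolutional part of an encoder only, using built-in layers with the same filter sizes and strides as the example. It is an illustration under those assumptions, not the example's full encoder, which also has a fully connected layer and a custom sampling layer.

```matlab
% Convolutional part of an encoder, built-in layers only (illustrative).
layers = [
    imageInputLayer([200 200 3],Normalization="none")
    convolution2dLayer(3,32,Padding="same",Stride=2)
    reluLayer
    convolution2dLayer(3,64,Padding="same",Stride=2)
    reluLayer];
net = dlnetwork(layers);

% Dummy batch of 4 images, labeled spatial-spatial-channel-batch.
X = dlarray(rand(200,200,3,4,"single"),"SSCB");
Y = forward(net,X);

disp(size(Y))   % 50 50 64 4
disp(dims(Y))   % SSCB
```

The same pattern (call forward, then size and dims) can be applied to the full encoder and decoder networks, and the decoder output size compared against the input batch size with isequal.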

Ben on 26 Jun 2023

@debojit sharma - I've written some code showing how this could work for 200x200x3 images. The main issue I ran into was that numInputChannels in the example is computed incorrectly, so perhaps that is the issue you are having. I fixed that in the code below:

numLatentChannels = 16;
imageSize = [200 200 3]; % updated for 200x200x3 images

layersE = [
    imageInputLayer(imageSize,Normalization="none")
    convolution2dLayer(3,32,Padding="same",Stride=2)
    reluLayer
    convolution2dLayer(3,64,Padding="same",Stride=2)
    reluLayer
    fullyConnectedLayer(2*numLatentChannels)
    samplingLayer];

projectionSize = [50 50 64]; % recomputed manually
numInputChannels = imageSize(3); % fixed from the example.

layersD = [
    featureInputLayer(numLatentChannels)
    projectAndReshapeLayer(projectionSize)
    transposedConv2dLayer(3,64,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,32,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,numInputChannels,Cropping="same")
    sigmoidLayer];

netE = dlnetwork(layersE);
netD = dlnetwork(layersD);

% Test forward
batchSize = 5;
imageBatch = dlarray(randn([imageSize,batchSize]),"SSCB");
latentBatch = forward(netE,imageBatch);
size(latentBatch)
generatedBatch = forward(netD,latentBatch);
size(generatedBatch)

% Test loss and gradients
if canUseGPU
    netE = dlupdate(@gpuArray,netE);
    netD = dlupdate(@gpuArray,netD);
    imageBatch = gpuArray(imageBatch);
end

[loss,gradE,gradD] = dlfeval(@modelLoss,netE,netD,imageBatch);

function [loss,gradientsE,gradientsD] = modelLoss(netE,netD,X)
    % Forward through encoder.
    [Z,mu,logSigmaSq] = forward(netE,X);

    % Forward through decoder.
    Y = forward(netD,Z);

    % Calculate loss and gradients.
    loss = elboLoss(Y,X,mu,logSigmaSq);
    [gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);
end

function loss = elboLoss(Y,T,mu,logSigmaSq)
    % Reconstruction loss.
    reconstructionLoss = mse(Y,T);

    % KL divergence.
    KL = -0.5 * sum(1 + logSigmaSq - mu.^2 - exp(logSigmaSq),1);
    KL = mean(KL);

    % Combined loss.
    loss = reconstructionLoss + KL;
end

Hope that helps.
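For reference, the loss computed by elboLoss can be written out as below. The symbols are introduced here for illustration: B is the batch size, d is the number of latent channels (numLatentChannels), and mu and log sigma^2 are the encoder outputs mu and logSigmaSq. The KL term is the standard closed form for the divergence between a diagonal Gaussian and a standard normal. The reconstruction term is left as mse(Y,T) because its normalization is whatever the mse function applies.

```latex
\mathcal{L}
  = \operatorname{mse}(Y, T)
  + \frac{1}{B} \sum_{n=1}^{B} \mathrm{KL}_n ,
\qquad
\mathrm{KL}_n
  = -\frac{1}{2} \sum_{j=1}^{d}
    \left( 1 + \log \sigma_{n,j}^{2} - \mu_{n,j}^{2} - \exp\!\left(\log \sigma_{n,j}^{2}\right) \right)
```

In the code, the sum over j is taken along the first dimension of mu and logSigmaSq, and the mean over n is taken over the remaining batch dimension.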
