How to use trainNetwork function for input as a video?

I have written code to recognize characters 'A' & 'B'. But during training , i got following error.
Error using trainNetwork
Invalid training data. Predictors must be a numeric array, a datastore, or a table. For networks with sequence input, predictors can also be
a cell array of sequences.
Error in trialjune25_2022 (line 77)
[netLSTM,info] = trainNetwork(imdsTrainData',labelsTrain,layers,options);
____
size(imdsTrainData')=> 37 x 1 cell array
size(labelsTrain)=> 37 x 1 cell array.
Please help to resolve the error.
clear all;
close all;
clc;
idx = 1;
files={'trainsp1_A1.avi';'trainsp1_A2.avi';'trainsp1_A3.avi';'trainsp1_A4.avi';'trainsp1_A5.avi';'trainsp1_A6.avi';'trainsp1_A7.avi'; ...
'trainsp2_A1.avi';'trainsp2_A2.avi';'trainsp2_A3.avi';'trainsp2_A4.avi';'trainsp2_A5.avi';'trainsp2_A6.avi';'trainsp2_A7.avi'; ...
'trainsp3_A1.avi';'trainsp3_A2.avi';'trainsp3_A3.avi';'trainsp3_A4.avi';'trainsp3_A5.avi';'trainsp3_A6.avi';'trainsp3_A7.avi'; ...
'trainsp1_B1.avi';'trainsp1_B2.avi';'trainsp1_B3.avi';'trainsp1_B4.avi';'trainsp1_B5.avi';'trainsp1_B6.avi';'trainsp1_B7.avi'; ...
'trainsp2_B1.avi';'trainsp2_B2.avi';'trainsp2_B3.avi';'trainsp2_B4.avi';'trainsp2_B5.avi';'trainsp2_B6.avi';'trainsp2_B7.avi'; ...
'trainsp3_B1.avi';'trainsp3_B2.avi';'trainsp3_B3.avi';'trainsp3_B4.avi';'trainsp3_B5.avi';'trainsp3_B6.avi';'trainsp3_B7.avi'; ...
};
% labels1=categorical([ones(1,21) 2*ones(1,21)]);% 2*ones(1,8) 3*ones(1,8) 4*ones(1,8) 5*ones(1,7) 6*ones(1,3) 7*ones(1,8) 8*ones(1,6) 9*ones(1,6)]);
% labels=labels1';
labels={1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,...
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2};
numFiles = numel(files)
% sequences = cell(numFiles,1);
for i = 1:numFiles
fprintf("Reading file %d of %d...\n", i, numFiles)
video{i} = readVideo(files{i});
end
numObservations = numel(video);
idx = randperm(numObservations);
N = floor(0.9 * numObservations);
idxTrain = idx(1:N);
imdsTrainData = video(idxTrain);
%imdsTrainData=imdsTrainData1';
labelsTrain1 = labels(idxTrain);
labelsTrain=labelsTrain1';
%imdsTrain={imdsTrainData' labelsTrain};
idxValidation = idx(N+1:end);
imdsValidationData = video(idxValidation);
%imdsValidationData=imdsValidationData1';
labelsValidation1 = labels(idxValidation);
labelsValidation=labelsValidation1';
%imdsValidation={imdsValidationData' labelsValidation};
% %_______________
%imdsTrain=
% imdsValidation
layers = [
imageInputLayer([38 62 3])
convolution2dLayer(3,8,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,16,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,32,'Padding','same')
batchNormalizationLayer
reluLayer
fullyConnectedLayer(2)
softmaxLayer
classificationLayer];
options = trainingOptions('adam', ...
'InitialLearnRate',1e-4, ...
'GradientThreshold',2, ...
'Shuffle','every-epoch', ...
'ValidationData',{imdsValidationData,labelsValidation}, ...
'ValidationFrequency',5, ...
'Plots','training-progress', ...
'Verbose',false);
[netLSTM,info] = trainNetwork(imdsTrainData',labelsTrain,layers,options);

3 Comments

Hello Shilpa
Can you please share the implementation of the 'readVideo' function as it does not seem to be an inbuilt MATLAB function?
Sir,
It is available in online Matlab editor-2022.
unction video = readVideo(filename)
vr = VideoReader(filename);
H = vr.Height;
W = vr.Width;
C = 3;
% Preallocate video array
numFrames = floor(vr.Duration * vr.FrameRate);
video = zeros(H,W,C,numFrames);
% Read frames
i = 0;
while hasFrame(vr)
i = i + 1;
video(:,:,:,i) = readFrame(vr);
end
% Remove unallocated frames
if size(video,4) > i
video(:,:,:,i+1:end) = [];
end
end

Sign in to comment.

Answers (1)

Hello Shilpa
It is my understanding that you want to build a classifier that takes a video input. You are using the 'trainNetwork' function and in trying to do so it is throwing an error.
As the error suggests, trainNetwork only accepts image, sequence or feature data in the form of datastore objects, cell array of numerical arrays or numerical array. The input that you are passing to the function is imdsTrainData that stores frame data and is defined as:
imdsTrainData = video(idxTrain);
As mentioned earlier, imdsTrainData stores instances from the video array that store data from the frames of the different files. To use trainNetwork, you'd need to pass one of the datatypes mentioned above. You can do so by converting your videos to sequences of feature vectors, This can be done by extracting the output of the activations function on the last pooling layer of the GoogLeNet network ("pool5-7x7_s1").
inputSize = netCNN.Layers(1).InputSize(1:2);
layerName = "pool5-7x7_s1";
tempFile = fullfile(tempdir,"hmdb51_org.mat");
if exist(tempFile,'file')
load(tempFile,"sequences")
else
numFiles = numel(files);
sequences = cell(numFiles,1);
for i = 1:numFiles
fprintf("Reading file %d of %d...\n", i, numFiles)
video = readVideo(files(i));
video = centerCrop(video,inputSize);
sequences{i,1} = activations(netCNN,video,layerName,'OutputAs','columns');
end
save(tempFile,"sequences","-v7.3");
end
You can refer the following tutorial to read more about video classification tasks using deep learning in MATLAB: https://www.mathworks.com/help/deeplearning/ug/classify-videos-using-deep-learning.html

2 Comments

Sir/Madam,
I don't want to use pretrain network like GoogLeNet network.
Is there other way to convert video frames to sequences?
Thank you.
Sir/Mam,
I used "activations" function in my program. The error is resolved.
if there is other way to convert video frames to sequences, let me know.
Thank you.

Sign in to comment.

Asked:

on 28 Jun 2022

Commented:

on 2 Jul 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!