
Combining Feature and Sequence Data in Datastores

13 views (last 30 days)
Hello, I am trying to use deep learning to classify the output of an image processing algorithm my lab uses. I do not necessarily want to use an image-based network (I'm a novice at DL, and the image sequences we receive are movies of 512 pixels x 512 pixels x 3 channels x 302 time points, containing hundreds to thousands of small events I want to analyze). So I opted to apply deep learning to the output of the processing algorithm instead.
My data comes back as a single value through time for each channel (3x302), and that is loaded as the sequence input. I want to use an LSTM to analyze these in tandem. To relate these to the feature data that the image processing algorithm outputs (holistic measurements corresponding to the event studied in all 3 channels), I get a 1x13 vector used as the feature input. To reduce dimensionality so the two can be related, I used the 'last' output mode of the LSTM so that the concatenation works, and I set the concatenation dimension to 1. MATLAB's Deep Learning Network Analyzer detects no issues in the network (see below).
To load multiple inputs into the network, I know I will need to use datastores. Since the data is output by the image processing algorithm and is already in the workspace after some pre-processing, I opted to use arrayDatastore objects. I have provided example datasets with random numbers below.
When I try to train my network using the trainNetwork function, I get the following error:
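Here is a minimal stand-in with random numbers (the sizes match my real data; variable names and the number of observations are just illustrative):

```matlab
% Stand-in data: N events, each with a 3x302 sequence and a 1x13 feature vector
N = 100;
sequence_data = arrayfun(@(~) rand(3, 302), (1:N)', 'UniformOutput', false); % N-by-1 cell of 3x302
feature_data  = rand(N, 13);                  % N-by-13 numeric
label_data    = categorical(randi(3, N, 1));  % N-by-1 responses

% One observation per read; 'OutputType','same' keeps each sequence in a cell
datastore_sequence = arrayDatastore(sequence_data, 'OutputType', 'same');
datastore_feature  = arrayDatastore(feature_data);
datastore_label    = arrayDatastore(label_data);
combined_ds = combine(datastore_sequence, datastore_feature, datastore_label);
```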
"Error using trainNetwork
Error during read from datastore.
Caused by:
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
data = horzcat(data{:});"
I can run a network on the sequence data and on the feature data individually, which makes me think the issue is with the datastores. I thought the problem might be with how the sequence data loads, and I read that padding might be required, so a transformation on the sequence datastore might be needed. However, if I call "readall" after performing the following transformation, I get this error:
transformed_datastore_sequence = transform(datastore_sequence,@(x) padsequences(x,2,'Direction', 'both', 'length', 20, 'PaddingValue', 'symmetric', 'UniformOutput', false));
readall(transformed_datastore_sequence)
Invalid transform function defined on datastore.
The cause of the error was:
Error using padsequences
Input sequences must be numeric or categorical arrays.
Error in @(x)padsequences(x,2,'Direction','both','length',20,'PaddingValue','symmetric','UniformOutput',false)
Error in matlab.io.datastore.TransformedDatastore/applyTransforms (line 723)
data = ds.Transforms{ii}(data);
Error in matlab.io.datastore.TransformedDatastore/read (line 235)
[data, info] = ds.applyTransforms(data, info);
Error in matlab.io.datastore.TransformedDatastore/readall (line 300)
data{end+1} = read(copyds); %#ok<AGROW>
Any guidance would be appreciated. I think it's close; I just may not fully understand what preprocessing is required before combining the data into a datastore.
Below are my layer definitions and the options used for the trainNetwork function. For reference, I am running MATLAB R2023b:
lgraph = layerGraph();  % lgraph must exist before the first addLayers call
tempLayers = [
    sequenceInputLayer(3,"Name","input")
    lstmLayer(256,"Name","lstm","OutputMode","last")
    reluLayer("Name","relu")
    fullyConnectedLayer(180,"Name","fc_2")
    fullyConnectedLayer(13,"Name","fc")
    flattenLayer("Name","flatten")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
    featureInputLayer(13,"Name","featureinput")
    fullyConnectedLayer(13,"Name","fc_3")];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = [
    concatenationLayer(1,2,"Name","concat")
    fullyConnectedLayer(3,"Name","fc_1")
    softmaxLayer("Name","softmax")
    classificationLayer("Name","classification")];
lgraph = addLayers(lgraph,tempLayers);
clear tempLayers;
lgraph = connectLayers(lgraph,"flatten","concat/in1");
lgraph = connectLayers(lgraph,"fc_3","concat/in2");
my_options = trainingOptions('adam', ...
    'MaxEpochs', 12, ...
    'MiniBatchSize', 300, ...
    'SequencePaddingValue', 5, ...
    'ExecutionEnvironment', 'gpu', ...
    'Shuffle', 'every-epoch', ...
    'InitialLearnRate', 0.01, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.2, ...
    'LearnRateDropPeriod', 5, ...
    'Verbose', true, ...
    'Plots', 'training-progress');

Accepted Answer

Avadhoot
Avadhoot on 19 Mar 2024
From your description I gather that there are a few issues with the preprocessing and the concatenation step in your code. Let us look into them step by step.
Part 1: Data preparation and datastore issues:
The first error message indicates that the concatenation step fails because of a dimension mismatch between the arrays being concatenated. To solve that, you tried to implement padding using "padsequences". But "padsequences" requires its input to be a cell vector of numeric or categorical arrays; the second error you see is the result of this constraint. To solve it, please ensure the input data is in that format. If it is not, you will need to transform the data before passing it to the "padsequences" function.
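For instance, "padsequences" works directly on a cell vector of numeric sequences (the sizes here are illustrative):

```matlab
% Cell vector of 3-by-T numeric sequences with different lengths
seqs = {rand(3, 250); rand(3, 302)};

% Pad along dimension 2 (time) so every sequence has 302 time points
padded = padsequences(seqs, 2, 'Length', 302, ...
    'Direction', 'both', 'PaddingValue', 'symmetric');
```

If the datastore read wraps each sequence in an extra layer of cells, the transform function receives a cell of cells rather than a cell of numeric arrays, which triggers exactly the "Input sequences must be numeric or categorical arrays" error; unwrapping the contents inside the transform before calling "padsequences" avoids this.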
Part 2: Applying transforms:
When you apply a transform to the datastore, ensure every element read from the datastore is in the correct format. The correct way to apply the transformation would be as follows:
transformed_datastore_sequence = transform(datastore_sequence, @(x) padsequences(x, 2, 'Direction', 'both', 'Length', 302, 'PaddingValue', 'symmetric'));
Ensure that the "Length" parameter corresponds to the expected length of the sequences after padding.
Also, when combining the datastores, ensure that each one produces batches of compatible sizes. This might require a custom transformation to match the sequence data batch structure to the feature data structure.
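As a concrete check (using your variable names, with an assumed label datastore), reading one observation from the combined datastore should return one cell row whose columns match the network inputs and the response:

```matlab
% Combine the transformed sequence datastore with the feature and label datastores
combined_ds = combine(transformed_datastore_sequence, datastore_feature, datastore_label);

% One read should yield a 1-by-3 cell row,
% e.g. {3x302 double, 1x13 double, categorical}
sample = read(combined_ds);
disp(sample)

reset(combined_ds)  % rewind before passing the datastore to trainNetwork
```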
Part 3: Training the network:
The network structure is correct, as confirmed by the Deep Learning Network Analyzer. Just ensure that the datastore outputs the data in the correct format for each batch, i.e. {sequenceBatchData, featureBatchData}.
After each preprocessing step, use "read" or "readall" to check that the data is in the correct format. For more information on these functions, refer to the documentation below:
  1. "padsequences" function: https://www.mathworks.com/help/deeplearning/ref/padsequences.html
  2. "read" function: https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.read.html
  3. "readall" function: https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.readall.html
I hope this proves helpful.
  1 Comment
Grant Ashby
Grant Ashby on 19 Mar 2024
This was definitely useful. Once I got padsequences working correctly the network loads correctly, and now I can fine tune it. Thank you for your help!


More Answers (0)
