Projection of LSTM layer vs GRU layer

Silvia on 28 May 2024
Commented: Silvia on 10 Jun 2024
I am training two RNNs, one with an LSTM layer and the other with a GRU layer. The two architectures are as follows:
numFeatures = 1;
numHiddenUnits = 32;
layersLSTM = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];
layersGRU = [
    sequenceInputLayer(numFeatures)
    gruLayer(numHiddenUnits, OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];
Using the GRU architecture and training the projected model, the validation RMSE and loss do not track the training RMSE and loss, as shown in the attached training-progress plot.
This is the first time this has happened. I never had this problem with the LSTM network (neither with the lstmLayer architecture nor with the lstmProjectedLayer one), and I also did not have it when training the GRU model without projection; in those cases the validation metrics tracked the training metrics properly. What could be causing this?
I also have a second question:
Following the two examples in MATLAB, I set the OutputProjectorSize and InputProjectorSize parameters to:
  • 75% of the number of hidden units and 25% of the input size, respectively, for the LSTM
  • 25% of the number of hidden units and 75% of the input size, respectively, for the GRU
So for the GRU it is the opposite. Is there a reason behind this choice?
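For reference, the projected variants I am comparing look roughly like this (a sketch; the projector sizes are just my reading of the percentages above, rounded to integers):

```matlab
% Sketch of the projected architectures; projector sizes follow the
% percentages described above (note the swap between LSTM and GRU).
numFeatures = 1;
numHiddenUnits = 32;
layersLSTMProj = [
    sequenceInputLayer(numFeatures)
    lstmProjectedLayer(numHiddenUnits, ...
        round(0.75*numHiddenUnits), ...        % OutputProjectorSize = 24
        max(1, round(0.25*numFeatures)), ...   % InputProjectorSize = 1
        OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];
layersGRUProj = [
    sequenceInputLayer(numFeatures)
    gruProjectedLayer(numHiddenUnits, ...
        round(0.25*numHiddenUnits), ...        % OutputProjectorSize = 8
        max(1, round(0.75*numFeatures)), ...   % InputProjectorSize = 1
        OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];
```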
Thank you in advance!

Answers (1)

Maksym Tymchenko on 3 Jun 2024
I am glad to see that you are using our new projection features.
I'll start by answering the second question.
From what I see, both examples are using the exact same definition for OutputProjectorSize and InputProjectorSize in the section "Compare Network Projection Sizes":
  • An output projector size of 25% of the number of hidden units.
  • An input projector size of 75% of the input size.
These are reasonable choices because they result in the lstmProjectedLayer having fewer learnable parameters than an lstmLayer with the same number of hidden units. Note that it is possible to choose values that make the projected layer larger than the original layer. To avoid this, use the function compressNetworkUsingProjection, which determines these projector sizes automatically based on the compression goal you specify.
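As a sketch of that automatic workflow (assuming you already have a trained dlnetwork `net` and a numeric sequence array `XTrain`; the exact data preparation depends on your setup):

```matlab
% Compress a trained dlnetwork using representative training data.
% compressNetworkUsingProjection analyzes the neuron activations and
% picks the projector sizes automatically for the requested reduction.
X = dlarray(XTrain, "CTB");   % numFeatures-by-numTimeSteps-by-numObservations
netProjected = compressNetworkUsingProjection(net, X, ...
    LearnablesReductionGoal=0.5);   % aim to remove ~50% of the learnables
```

With this approach you never have to work out the projector-size thresholds by hand; the function guarantees the projected network is smaller.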
Alternatively, if you want to create the projected layers from scratch, follow the Tips in the descriptions of the OutputProjectorSize and InputProjectorSize properties. These say that, to ensure that the projected layer requires fewer learnable parameters than the corresponding non-projected layer:
  1. For an lstmProjectedLayer: set the OutputProjectorSize property to a value less than 4*NumHiddenUnits/5, and set the InputProjectorSize property to a value less than 4*NumHiddenUnits*inputSize/(4*NumHiddenUnits+inputSize)
  2. For a gruProjectedLayer: set the OutputProjectorSize property to a value less than 3*NumHiddenUnits/4, and set the InputProjectorSize property to a value less than 3*NumHiddenUnits*inputSize/(3*NumHiddenUnits+inputSize)
These formulas can be derived by expressing the total number of learnable parameters as a function of the number of hidden units and the input size. For more information, see the Algorithms sections of the lstmProjectedLayer and gruProjectedLayer pages.
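Plugging in the sizes from your architecture (numHiddenUnits = 32, inputSize = 1) makes these tips concrete:

```matlab
% Upper bounds on the projector sizes for numHiddenUnits = 32, inputSize = 1,
% using the formulas from the Tips above.
numHiddenUnits = 32;
inputSize = 1;
% LSTM thresholds:
outMaxLSTM = 4*numHiddenUnits/5;                                       % 25.6
inMaxLSTM  = 4*numHiddenUnits*inputSize/(4*numHiddenUnits+inputSize);  % ~0.99
% GRU thresholds:
outMaxGRU = 3*numHiddenUnits/4;                                        % 24
inMaxGRU  = 3*numHiddenUnits*inputSize/(3*numHiddenUnits+inputSize);   % ~0.99
```

So an OutputProjectorSize of up to 25 (LSTM) or up to 23 (GRU) reduces the layer's learnables, while with a single input feature no integer InputProjectorSize satisfies the input condition, so input projection cannot shrink this particular network.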
Regarding your first question, I would need the full reproduction steps, including the script and dataset used, to investigate the issue. Please feel free to attach them to this post, or alternatively open a technical support request with the reproduction steps.
  1 Comment
Silvia on 10 Jun 2024
Thank you for the detailed explanations and the interesting insight into the compressNetworkUsingProjection function!
Unfortunately, I cannot share the code or datasets for data-privacy reasons.
But thank you again for your help!
Silvia


Release

R2024a
