LSTM as a Dynamical System
Updated 4 Apr 2023
This example explains how an LSTM (Long Short-Term Memory) layer can be seen as a nonlinear dynamical system with a specific structure, and sheds some light on whether, and how, an LSTM layer can approximate a linear dynamical system.
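For reference, the standard LSTM update can be written as a discrete-time state-space system whose state is the pair (h, c). The symbols below (u for the input, W, R, b for input weights, recurrent weights and biases) are notation chosen for this sketch and may differ from the notation used in LSTM_explained.mlx.

```latex
\begin{aligned}
i_t &= \sigma(W_i u_t + R_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f u_t + R_f h_{t-1} + b_f) \\
g_t &= \tanh(W_g u_t + R_g h_{t-1} + b_g) \\
o_t &= \sigma(W_o u_t + R_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The cell state c_t and the hidden state h_t are each fed back to the next time step, which gives the two feedback loops mentioned in the description.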
In the first part of the example, a network containing a single LSTM layer is defined and simulated using both the Deep Learning Toolbox™ and Simulink®. The Simulink representation, being more visual, clearly shows the two underlying feedback loops at the heart of the LSTM layer. The fact that the number of parameters of an LSTM scales quadratically with the number of units is also highlighted and explained here.
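The two feedback loops and the quadratic parameter growth can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code from LSTM_explained.mlx: the names `lstm_step`, `lstm_param_count` and the `params` dictionary are hypothetical, and the gate ordering (input, forget, cell candidate, output) is an assumption of the sketch. The parameter shapes mirror the input weights (4n-by-m), recurrent weights (4n-by-n) and bias (4n-by-1) of an LSTM layer with n units and m inputs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lstm_step(params, h, c, u):
    """One LSTM time step: maps the state (h, c) and input u to the next (h, c)."""
    n = len(h)
    # Stacked pre-activations, assumed order: input, forget, candidate, output
    z = [a + r + b for a, r, b in
         zip(matvec(params["W"], u), matvec(params["R"], h), params["b"])]
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o = [sigmoid(v) for v in z[3 * n:4 * n]]
    # First feedback loop: the cell state c
    c_next = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    # Second feedback loop: the hidden state h, fed back through R at the next step
    h_next = [ok * math.tanh(ck) for ok, ck in zip(o, c_next)]
    return h_next, c_next

def lstm_param_count(n, m):
    """Learnable parameters of one LSTM layer with n units and m inputs."""
    return 4 * n * m + 4 * n * n + 4 * n

for n in (1, 10, 100):
    print(n, lstm_param_count(n, m=1))
```

The `4 * n * n` term comes from the recurrent weights, so doubling the number of units roughly quadruples the parameter count once n is large compared with m.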
The second part of the example asks (and answers) the question of how well an LSTM layer can approximate a linear dynamical system.
Here, given a simple discrete-time linear dynamical system in which the A matrix is invertible, a set of LSTM weights is first calculated directly from the true matrices of the system. With such weights, the LSTM network reproduces the behavior of the original linear system to any desired accuracy. This fact is explained in theory and illustrated in practice.
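One way to see why this is possible is the scalar sketch below. It is not necessarily the construction used in LSTM_explained.mlx; it is a minimal illustration under stated assumptions. The gates are saturated through large biases (input and output gates near 1, forget gate near 0), and all signals are scaled by a small factor `eps` so that both tanh nonlinearities operate in their nearly linear region. The names `simulate_linear`, `simulate_lstm`, `max_err`, `eps` and `big` are hypothetical, and the system x[k+1] = a*x[k] + b*u[k] with a = 0.9, b = 0.5 is chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_linear(a, b, u_seq, x0=0.0):
    """Return x[1], ..., x[N] for x[k+1] = a*x[k] + b*u[k]."""
    x, out = x0, []
    for u in u_seq:
        x = a * x + b * u
        out.append(x)
    return out

def simulate_lstm(a, b, u_seq, eps, big=30.0, x0=0.0):
    """Single-unit LSTM with hand-set weights; returns h[t]/eps at each step."""
    # Gate weights are zero, so the gates depend only on their biases:
    i = sigmoid(big)    # input gate, close to 1
    f = sigmoid(-big)   # forget gate, close to 0
    o = sigmoid(big)    # output gate, close to 1
    h, c, out = eps * x0, 0.0, []
    for u in u_seq:
        # Candidate: recurrent weight a, input weight eps*b, zero bias
        g = math.tanh(a * h + eps * b * u)
        c = f * c + i * g
        h = o * math.tanh(c)
        out.append(h / eps)  # undo the scaling to compare with x
    return out

def max_err(xs, ys):
    return max(abs(x - y) for x, y in zip(xs, ys))

u_seq = [math.sin(0.3 * k) for k in range(50)]
x_true = simulate_linear(0.9, 0.5, u_seq)
for eps in (1e-2, 1e-4):
    print(eps, max_err(x_true, simulate_lstm(0.9, 0.5, u_seq, eps)))
```

Because tanh(z) is approximately z for small z, shrinking `eps` makes the hidden state track the scaled linear state more closely, so the mismatch can be made as small as desired, up to floating-point precision.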
A natural question is then what happens when the network is trained starting directly from the calculated weights. The related analysis shows that the calculated solution lies in a very narrow basin of attraction, which the training algorithm can easily leave even with relatively small learning rates.
The final, and most important, question is whether, when starting from random initial weights, the training algorithm can find a solution that is close enough to the one calculated from the true system matrices.
The answer seems to be negative: the training algorithm ends up in a local minimum that is very dependent on both the initial conditions and the hyperparameters. This solution is also quite different from (and does not perform as well as) the solution previously calculated from the system matrices.
Note that this example is meant to illustrate the structure of an LSTM and some of its theoretical capabilities and limitations. If you want to quickly build and train an LSTM in MATLAB®, see LSTM Networks. Also, for a complementary example that uses a network with two LSTM layers and several thousand parameters for system identification, see Use LSTM Network for Linear System Identification.
Open LSTM_explained.mlx in MATLAB (version R2023a and above) and follow from there. Running the file section by section is suggested.
Requires: MATLAB, Deep Learning Toolbox
Note: The initial part of the example uses Simulink, but you can follow and run most of the example without it.
- LSTM_explained.mlx - Main file
- lstm_sim.mdl - Simulink model of an LSTM layer
Giampiero Campa - April 2023
Giampiero Campa (2023). LSTM as a Dynamical System (https://github.com/giampy1969/lstm-as-a-dynamical-system/releases/tag/v2.1), GitHub.
Platform Compatibility: Windows, macOS, Linux
See release notes for this release on GitHub: https://github.com/giampy1969/lstm-as-a-dynamical-system/releases/tag/v2.1
See release notes for this release on GitHub: https://github.com/giampy1969/lstm-as-a-dynamical-system/releases/tag/v2.0