Pruning, Projection, and Quantization
Compress deep neural networks, reduce network memory, and prepare networks for code generation
Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Compression Library support package to reduce the memory footprint and computational requirements of a deep neural network:
Prune filters from convolution layers by using first-order Taylor approximation.
Project layers by performing principal component analysis (PCA) on the layer activations.
Quantize the weights, biases, and activations of layers to reduced-precision, scaled integer data types.
You can then generate code from the compressed network to deploy to your desired hardware.
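To make the pruning step above concrete, the following is a minimal NumPy sketch of first-order Taylor filter scoring. This illustrates the underlying idea (Molchanov et al.-style importance estimation), not the Deep Learning Toolbox API; the function name, shapes, and normalization are assumptions:

```python
import numpy as np

def taylor_filter_scores(activations, gradients):
    # First-order Taylor importance per filter: the change in loss from
    # removing a filter is approximated by |sum of activation * gradient|,
    # accumulated over the batch and spatial dimensions.
    contrib = activations * gradients                 # (N, C, H, W)
    scores = np.abs(contrib.sum(axis=(0, 2, 3)))      # one score per filter
    # Normalize per layer so scores are comparable across layers.
    return scores / (np.linalg.norm(scores) + 1e-12)

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 16, 5, 5))    # activations of a conv layer
grads = rng.standard_normal((8, 16, 5, 5))   # gradients w.r.t. those activations
scores = taylor_filter_scores(acts, grads)
keep = np.argsort(scores)[4:]                # drop the 4 least important filters
```

Filters with the lowest scores contribute least to the loss and are candidates for removal, after which the network is typically fine-tuned.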
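The projection step can be sketched in the same spirit: perform PCA on the activations feeding a layer, then factor the weight matrix through the top principal components. Again, this is a conceptual NumPy illustration under assumed shapes, not the toolbox implementation:

```python
import numpy as np

def project_layer(W, X, k):
    # W: (out, in) weight matrix of a fully connected layer.
    # X: (samples, in) input activations collected on representative data.
    # k: number of principal components to keep.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # PCA via SVD
    Q = Vt[:k].T                  # (in, k) top-k principal directions
    A = W @ Q                     # (out, k) first factor
    B = Q.T                       # (k, in)  second factor
    # The layer y = x W^T is replaced by y ~= x (A B)^T, which stores
    # k*(out + in) parameters instead of out*in when k is small.
    return A, B

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 16))
X = rng.standard_normal((100, 16))
A, B = project_layer(W, X, k=4)   # 4*(10+16)=104 params vs. 160 originally
```

Because the components are chosen from the layer's actual activation statistics, the low-rank factorization discards directions the data rarely exercises.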
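Finally, the quantization step can be illustrated with symmetric, power-of-two scaled int8 conversion, the general style used for fixed-point code generation. The helper below is an assumption for illustration, not the toolbox's quantizer:

```python
import numpy as np

def quantize_int8(x):
    # Choose a power-of-two scale 2^n such that max|x| maps inside the
    # int8 range [-128, 127], then round to the nearest integer.
    max_abs = np.max(np.abs(x))
    n = np.ceil(np.log2(max_abs / 127.0)) if max_abs > 0 else 0
    scale = 2.0 ** n
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

x = np.linspace(-1.0, 1.0, 9)       # example float32-style values
q, s = quantize_int8(x)
x_hat = q.astype(np.float64) * s    # dequantized approximation of x
```

Power-of-two scales are convenient on embedded targets because dequantization becomes a bit shift rather than a multiply.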
Categories
- Get Started with Network Compression
Learn the basics of the Deep Learning Toolbox Model Compression Library
- Pruning
Prune network filters using first-order Taylor approximation; reduce the number of learnable parameters
- Projection
Project network layers using principal component analysis (PCA); reduce the number of learnable parameters
- Quantization
Quantize network parameters to reduced-precision data types; prepare deep learning network for fixed-point code generation
- Network Compression Applications
Explore deep learning model compression in end-to-end workflows



