Pruning, Projection, and Quantization
Compress deep neural networks, reduce network memory, and prepare networks for code generation
Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Compression Library support package to reduce the memory footprint and computational requirements of a deep neural network:
Prune filters from convolution layers by using first-order Taylor approximation.
Project layers by performing principal component analysis (PCA) on the layer activations.
Quantize the weights, biases, and activations of layers to reduced-precision, scaled integer data types.
You can then generate code from the compressed network to deploy to your desired hardware.
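To make the pruning step above concrete, the following is a minimal NumPy sketch of first-order Taylor filter scoring. This illustrates the underlying idea (Molchanov et al.-style importance estimation), not the Deep Learning Toolbox API; the function name, shapes, and normalization are assumptions:

```python
import numpy as np

def taylor_filter_scores(activations, gradients):
    # First-order Taylor importance per filter: the change in loss from
    # removing a filter is approximated by |sum of activation * gradient|,
    # accumulated over the batch and spatial dimensions.
    contrib = activations * gradients                 # (N, C, H, W)
    scores = np.abs(contrib.sum(axis=(0, 2, 3)))      # one score per filter
    # Normalize per layer so scores are comparable across layers.
    return scores / (np.linalg.norm(scores) + 1e-12)

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 16, 5, 5))    # activations of a conv layer
grads = rng.standard_normal((8, 16, 5, 5))   # gradients w.r.t. those activations
scores = taylor_filter_scores(acts, grads)
keep = np.argsort(scores)[4:]                # drop the 4 least important filters
```

Filters with the lowest scores contribute least to the loss and are candidates for removal, after which the network is typically fine-tuned.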
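The projection step can be sketched in the same spirit: perform PCA on the activations feeding a layer, then factor the weight matrix through the top principal components. Again, this is a conceptual NumPy illustration under assumed shapes, not the toolbox implementation:

```python
import numpy as np

def project_layer(W, X, k):
    # W: (out, in) weight matrix of a fully connected layer.
    # X: (samples, in) input activations collected on representative data.
    # k: number of principal components to keep.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # PCA via SVD
    Q = Vt[:k].T                  # (in, k) top-k principal directions
    A = W @ Q                     # (out, k) first factor
    B = Q.T                       # (k, in)  second factor
    # The layer y = x W^T is replaced by y ~= x (A B)^T, which stores
    # k*(out + in) parameters instead of out*in when k is small.
    return A, B

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 16))
X = rng.standard_normal((100, 16))
A, B = project_layer(W, X, k=4)   # 4*(10+16)=104 params vs. 160 originally
```

Because the components are chosen from the layer's actual activation statistics, the low-rank factorization discards directions the data rarely exercises.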
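Finally, the quantization step can be illustrated with symmetric, power-of-two scaled int8 conversion, the general style used for fixed-point code generation. The helper below is an assumption for illustration, not the toolbox's quantizer:

```python
import numpy as np

def quantize_int8(x):
    # Choose a power-of-two scale 2^n such that max|x| maps inside the
    # int8 range [-128, 127], then round to the nearest integer.
    max_abs = np.max(np.abs(x))
    n = np.ceil(np.log2(max_abs / 127.0)) if max_abs > 0 else 0
    scale = 2.0 ** n
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

x = np.linspace(-1.0, 1.0, 9)       # example float32-style values
q, s = quantize_int8(x)
x_hat = q.astype(np.float64) * s    # dequantized approximation of x
```

Power-of-two scales are convenient on embedded targets because dequantization becomes a bit shift rather than a multiply.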
Categories
- Get Started with Network Compression
Learn the basics of the Deep Learning Toolbox Model Compression Library
- Pruning
Prune network filters using first-order Taylor approximation; reduce the number of learnable parameters
- Projection
Project network layers using principal component analysis (PCA); reduce the number of learnable parameters
- Quantization
Quantize network parameters to reduced-precision data types; prepare deep learning network for fixed-point code generation
- Network Compression Applications
Explore deep learning model compression in end-to-end workflows



