Quantization and Pruning
Compress a deep neural network by performing quantization or pruning
Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

- Quantizing the weights, biases, and activations of layers to reduced-precision, scaled integer data types. You can then generate C/C++, CUDA®, or HDL code from the quantized network.
- Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA code from the pruned network.
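The quantization half of this workflow can be sketched with the support package's `dlquantizer` object. In this sketch, `net` (a pretrained network) and the image datastores `calData` and `valData` are assumed to exist:

```matlab
% Sketch of the int8 quantization workflow (Deep Learning Toolbox
% Model Quantization Library). `net`, `calData`, and `valData` are
% placeholder names for a pretrained network and two datastores.
quantObj = dlquantizer(net,'ExecutionEnvironment','GPU');

% Exercise the network on representative data to collect the
% dynamic ranges of the weights, biases, and activations.
calResults = calibrate(quantObj,calData);

% Compare the accuracy of the quantized network to the original.
valResults = validate(quantObj,valData);
```

Setting `'ExecutionEnvironment'` to `'FPGA'`, `'CPU'`, or `'MATLAB'` instead selects one of the other target workflows listed below.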
Apps
- Deep Network Quantizer
  Quantize a deep neural network to 8-bit scaled integer data types.
Topics
Deep Learning Quantization
- Quantization of Deep Neural Networks
  Understand effects of quantization and how to visualize dynamic ranges of network convolution layers.
- Quantization Workflow Prerequisites
  Products required for the quantization of deep learning networks.
- Emulate Target Agnostic Quantized Network
  With MATLAB you can quantize neural networks without generating code or deploying to a specific target.
Quantization for GPU Target
- Emulate Quantized Network Behavior for GPU Target
  Examine the behavior of a quantized network deployed to GPU targets without generating code.
- Code Generation for Quantized Deep Learning Networks (GPU Coder)
  Quantize and generate code for a pretrained convolutional neural network.
- Quantize Residual Network Trained for Image Classification and Generate CUDA Code
  This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.
- Quantize Object Detectors and Generate CUDA Code
  This example shows how to generate CUDA® code for an SSD vehicle detector and a YOLO v2 vehicle detector that perform inference computations in 8-bit integers.
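The GPU code generation step above can be sketched with a GPU Coder configuration object. This is a minimal sketch, assuming GPU Coder is installed; `mynet_predict` is a hypothetical entry-point function, the input size is illustrative, and `quantObj.mat` stands for a saved `dlquantizer` calibration result:

```matlab
% Minimal GPU Coder configuration for int8 CUDA code generation.
% `mynet_predict` (entry-point function), the 224x224x3 input size,
% and 'quantObj.mat' are assumptions for this sketch.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.DeepLearningConfig.DataType = 'int8';
cfg.DeepLearningConfig.CalibrationResultFile = 'quantObj.mat';
codegen -config cfg mynet_predict -args {ones(224,224,3,'single')}
```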
Quantization for FPGA Target
- Quantize Network for FPGA Deployment (Deep Learning HDL Toolbox)
  This example shows how to quantize the learnable parameters in the convolution layers of a neural network and validate the quantized network.
- Classify Images on an FPGA Using a Quantized DAG Network (Deep Learning HDL Toolbox)
  In this example, you use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network and classify an image.
- Classify Images on FPGA by Using Quantized GoogLeNet Network (Deep Learning HDL Toolbox)
  This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image.
Quantization for CPU Target
- Code Generation for Quantized Deep Learning Networks (MATLAB Coder)
  Quantize and generate code for a pretrained convolutional neural network.
- Code Generation for Quantized Deep Learning Network on Raspberry Pi (MATLAB Coder)
  Generate code for a deep learning network that performs inference computations in 8-bit integers.
Pruning
- Parameter Pruning and Quantization of Image Classification Network
  Use parameter pruning and quantization to reduce network size.
- Prune Image Classification Network Using Taylor Scores
  This example shows how to reduce the size of a deep neural network using Taylor pruning.
- Prune Filters in a Detection Network Using Taylor Scores
  This example shows how to reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.
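The Taylor pruning loop used by the examples above can be sketched as follows. This mirrors the custom training loop in the Taylor-score examples, but it is only a sketch: `net`, `mbq` (a `minibatchqueue` of training data), and `modelLoss` are assumed to exist, the loop bounds are illustrative, and exact function signatures may differ across releases:

```matlab
% Sketch of filter pruning by first-order Taylor approximation.
prunableNet = taylorPrunableNetwork(net);

for iteration = 1:30                       % illustrative bound
    shuffle(mbq);
    while hasdata(mbq)
        [X,T] = next(mbq);
        % modelLoss (assumed) calls forward on prunableNet and returns
        % the pruning activations and their gradients, from which the
        % first-order Taylor scores of the filters are computed.
        [loss,pruningActivations,pruningGradients,state] = ...
            dlfeval(@modelLoss,prunableNet,X,T);
        prunableNet.State = state;
        prunableNet = updateScore(prunableNet, ...
            pruningActivations,pruningGradients);
    end
    % Remove the convolutional filters with the lowest scores.
    prunableNet = updatePrunables(prunableNet,MaxToPrune=8);
end

prunedNet = dlnetwork(prunableNet);        % back to a regular network
```

After pruning, the network is typically fine-tuned for a few epochs to recover accuracy before code generation.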