quantize
Description
quantizedNetwork = quantize(quantObj) quantizes a deep neural network using a calibrated dlquantizer object, quantObj. The quantized neural network object, quantizedNetwork, enables visibility of the quantized layers, weights, and biases of the network, as well as simulatable quantized inference behavior.
quantizedNetwork = quantize(quantObj,Name,Value) specifies additional options using one or more name-value arguments.
This function requires Deep Learning Toolbox Model Quantization Library. To learn about the products required to quantize a deep neural network, see Quantization Workflow Prerequisites.
Examples
Emulate Target Agnostic Quantized Network
This example shows how to create a target agnostic, simulatable quantized deep neural network in MATLAB.
Target agnostic quantization allows you to see the effect quantization has on your neural network without target hardware or target-specific quantization schemes. Creating a target agnostic quantized network is useful if you:
Do not have access to your target hardware.
Want to preview whether or not your network is suitable for quantization.
Want to find layers that are sensitive to quantization.
Quantized networks emulate quantized behavior for quantization-compatible layers. The network architecture, such as the layers and connections, is the same as in the original network, but inference uses limited-precision types. Once you have quantized your network, you can use the quantizationDetails function to retrieve details on what was quantized.
Load the pretrained network. net is a SqueezeNet network that has been retrained using transfer learning to classify images in the MerchData data set.
load squeezedlnetmerch
net
net = dlnetwork with properties:
Layers: [67×1 nnet.cnn.layer.Layer]
Connections: [74×2 table]
Learnables: [52×3 table]
State: [0×3 table]
InputNames: {'data'}
OutputNames: {'prob'}
Initialized: 1
View summary with summary.
You can use the quantizationDetails function to see that the network is not quantized.
qDetailsOriginal = quantizationDetails(net)
qDetailsOriginal = struct with fields:
IsQuantized: 0
TargetLibrary: ""
QuantizedLayerNames: [0×0 string]
QuantizedLearnables: [0×3 table]
Unzip and load the MerchData images as an image datastore and extract the classes from the datastore.
unzip('MerchData.zip')
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
Define calibration and validation data to use for quantization. Resize the images in both the calibration and validation data to match the input size that the network requires.
[calData,valData] = splitEachLabel(imds,0.7,'randomized');
augCalData = augmentedImageDatastore([227 227],calData);
augValData = augmentedImageDatastore([227 227],valData);
Create a dlquantizer object and specify the network to quantize. Set the execution environment to MATLAB. How the network is quantized depends on the execution environment. The MATLAB execution environment is agnostic to the target hardware and allows you to prototype quantized behavior. When you use the MATLAB execution environment, quantization is performed using the fi fixed-point data type, which requires a Fixed-Point Designer™ license.
quantObj = dlquantizer(net,'ExecutionEnvironment','MATLAB');
Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.
calResults = calibrate(quantObj,augCalData);
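To make "collecting dynamic ranges" concrete, here is a minimal sketch in plain MATLAB that tracks a running minimum and maximum across several batches. It is not how calibrate is implemented. The synthetic batches and the variable names are hypothetical stand-ins for values observed while the network is exercised.

```matlab
% Illustrative sketch only: track the dynamic range of a signal over
% several batches. The data below is synthetic and hypothetical.
rng(0);                                   % reproducible synthetic data
batches = {randn(4,4,'single'), ...
           3*randn(4,4,'single'), ...
           randn(4,4,'single') - 1};

minVal = single(inf);                     % running minimum
maxVal = single(-inf);                    % running maximum
for k = 1:numel(batches)
    minVal = min(minVal, min(batches{k},[],'all'));
    maxVal = max(maxVal, max(batches{k},[],'all'));
end

fprintf('Observed range: [%g, %g]\n', minVal, maxVal);
```

A range gathered this way only reflects the inputs that were seen, which is why the calibration data should be representative of the inputs the network receives in use.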
Use the quantize function to quantize the network object and return a simulatable quantized network.
qNet = quantize(quantObj)
qNet = Quantized dlnetwork with properties:
Layers: [67×1 nnet.cnn.layer.Layer]
Connections: [74×2 table]
Learnables: [52×3 table]
State: [0×3 table]
InputNames: {'data'}
OutputNames: {'prob'}
Initialized: 1
View summary with summary.
Use the quantizationDetails function to extract quantization details.
You can use the quantizationDetails function to see that the network is now quantized.
qDetailsQuantized = quantizationDetails(qNet)
qDetailsQuantized = struct with fields:
IsQuantized: 1
TargetLibrary: "none"
QuantizedLayerNames: [53×1 string]
QuantizedLearnables: [52×3 table]
Make predictions using the original, single-precision floating-point network, and the quantized INT8 network.
origScores = minibatchpredict(net,augValData);
predOriginal = scores2label(origScores,classes); % Predictions for the non-quantized network
qScores = minibatchpredict(qNet,augValData);
predQuantized = scores2label(qScores,classes); % Predictions for the quantized network
Compute the relative accuracy of the quantized network as compared to the original network.
ccrQuantized = mean(squeeze(predQuantized) == valData.Labels)*100
ccrQuantized = 100
ccrOriginal = mean(squeeze(predOriginal) == valData.Labels)*100
ccrOriginal = 100
For this validation data set, the quantized network gives the same predictions as the floating-point network.
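The comparison above uses two accuracy figures, each measured against the true labels. A related quantity is how often the quantized network agrees with the original network, regardless of the labels. The following sketch computes both in plain MATLAB. The class names, labels, and predictions are hypothetical and are not taken from MerchData.

```matlab
% Hypothetical labels and predictions, for illustration only.
classes = ["cap" "cube" "torch"];
trueLabels    = categorical(["cap" "cube" "torch" "cap"  "cube"], classes)';
predOriginal  = categorical(["cap" "cube" "torch" "cap"  "cube"], classes)';
predQuantized = categorical(["cap" "cube" "torch" "cube" "cube"], classes)';

% Accuracy of each network against the true labels, in percent.
ccrOriginal  = mean(predOriginal  == trueLabels)*100;
ccrQuantized = mean(predQuantized == trueLabels)*100;

% Agreement between the two networks, in percent.
agreement = mean(predQuantized == predOriginal)*100;

fprintf('Original: %.0f%%, quantized: %.0f%%, agreement: %.0f%%\n', ...
    ccrOriginal, ccrQuantized, agreement);
```

The agreement rate can help separate errors introduced by quantization from errors the original network already makes.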
Emulate GPU Target Behavior for Quantized Network
This example shows how to emulate the behavior of a quantized network for GPU deployment. Once you quantize your network for a GPU execution environment, you can emulate the GPU target behavior without the GPU hardware. Doing so allows you to examine your quantized network structure and behavior without generating code for deployment.
Emulated quantized networks are not smaller than the original network.
Load the pretrained network. net is a SqueezeNet convolutional neural network that has been retrained using transfer learning to classify images in the MerchData data set.
load squeezedlnetmerch
net
net = dlnetwork with properties:
Layers: [67×1 nnet.cnn.layer.Layer]
Connections: [74×2 table]
Learnables: [52×3 table]
State: [0×3 table]
InputNames: {'data'}
OutputNames: {'prob'}
Initialized: 1
View summary with summary.
Define calibration and validation data to use for quantization.
Use the calibration data to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.
Use the validation data to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
For this example, use the images in the MerchData data set. Split the data into calibration and validation data sets.
unzip("MerchData.zip");
imds = imageDatastore("MerchData", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
classes = categories(imds.Labels);
[calData,valData] = splitEachLabel(imds,0.7,"randomized");
Create a dlquantizer object and specify the network to quantize. How the network is quantized depends on the execution environment. Set ExecutionEnvironment to GPU to perform quantization specific to GPU target hardware.
quantObj = dlquantizer(net,ExecutionEnvironment="GPU");
Use the calibrate function to exercise the network object with sample inputs and collect range information.
calResults = calibrate(quantObj,calData);
Use the quantize function to quantize the network object and return a simulatable quantized network.
qNet = quantize(quantObj)
qNet = Quantized dlnetwork with properties:
Layers: [67×1 nnet.cnn.layer.Layer]
Connections: [74×2 table]
Learnables: [52×3 table]
State: [0×3 table]
InputNames: {'data'}
OutputNames: {'prob'}
Initialized: 1
View summary with summary.
Use the quantizationDetails function to extract quantization details.
You can use the quantizationDetails function to see that the network is now quantized.
qDetails = quantizationDetails(qNet)
qDetails = struct with fields:
IsQuantized: 1
TargetLibrary: "cudnn"
QuantizedLayerNames: [55×1 string]
QuantizedLearnables: [35×3 table]
The TargetLibrary field shows that the quantized network emulates the CUDA® Deep Neural Network library (cuDNN).
The QuantizedLayerNames field displays a list of layers that have been quantized.
qDetails.QuantizedLayerNames(1:5)
ans = 5×1 string
"conv1"
"relu_conv1"
"pool1"
"fire2-squeeze1x1"
"fire2-relu_squeeze1x1"
The QuantizedLearnables field contains additional details on quantized network learnable parameters. In this example, the 2-D convolutional layer, conv1, has had the weights scaled and cast to int8. The bias is scaled and remains in single precision. The values of quantized learnables are returned as stored integer values.
qDetails.QuantizedLearnables
ans=35×3 table
Layer Parameter Value
__________________ _________ ___________________
"conv1" "Weights" {3×3×3×64 int8 }
"conv1" "Bias" {1×1×64 single}
"fire2-squeeze1x1" "Weights" {1×1×64×16 int8 }
"fire2-squeeze1x1" "Bias" {1×1×16 single}
"fire2-expand1x1" "Weights" {1×1×16×64 int8 }
"fire2-expand3x3" "Weights" {3×3×16×64 int8 }
"fire3-squeeze1x1" "Weights" {1×1×128×16 int8 }
"fire3-squeeze1x1" "Bias" {1×1×16 single}
"fire3-expand1x1" "Weights" {1×1×16×64 int8 }
"fire3-expand3x3" "Weights" {3×3×16×64 int8 }
"fire4-squeeze1x1" "Weights" {1×1×128×32 int8 }
"fire4-squeeze1x1" "Bias" {1×1×32 single}
"fire4-expand1x1" "Weights" {1×1×32×128 int8 }
"fire4-expand3x3" "Weights" {3×3×32×128 int8 }
"fire5-squeeze1x1" "Weights" {1×1×256×32 int8 }
"fire5-squeeze1x1" "Bias" {1×1×32 single}
⋮
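A stored integer value is the raw integer held in memory. The real-world value it represents depends on a scaling that the table does not show. As a minimal sketch, assuming a power-of-two scaling with a hypothetical exponent, the relationship can be written in plain MATLAB as follows. The stored integers and the exponent here are made up for illustration and do not come from the network above.

```matlab
% Hypothetical stored integers and scaling exponent, for illustration only.
storedInt = int8([-128 -64 0 32 127]);
scaleExp  = -7;                            % assumed power-of-two exponent

% Real-world value = stored integer * 2^exponent.
realWorld = single(storedInt) * 2^scaleExp;

% Converting back with the same exponent recovers the stored integers.
requantized = int8(round(realWorld / 2^scaleExp));

disp(realWorld)
disp(requantized)
```

With an exponent of -7, the int8 range maps to real-world values from -1 to just under 1 in steps of 2^-7.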
You can use the quantized network to emulate how a network quantized for GPU target hardware would perform a classification task.
Make predictions using the original, single-precision floating-point network. To accelerate the computation by compiling and executing a MEX function on the GPU, use the acceleration option "mex" of the predict function.
XTest = readall(valData);
XTest = cat(4,XTest{:});
XTest = dlarray(gpuArray(single(XTest)),"SSCB");
TTest = valData.Labels;
YTestOriginal = predict(net,XTest,Acceleration="mex");
Generating MEX for cudnn target.
YTestOriginal = onehotdecode(YTestOriginal,classes,3);
Make predictions using the quantized INT8 network. Use the acceleration option "mex" of the predict function. MEX acceleration is supported for quantized networks based on quantization objects with ExecutionEnvironment set to GPU.
YTestQuantized = predict(qNet,XTest,Acceleration="mex");
Generating MEX for cudnn target.
YTestQuantized = onehotdecode(YTestQuantized,classes,3);
Compute the relative accuracy of the quantized network as compared to the original network.
ccrOriginal = mean(squeeze(YTestOriginal) == valData.Labels)
ccrOriginal = 1
ccrQuantized = mean(squeeze(YTestQuantized) == valData.Labels)
ccrQuantized = 1
The quantized network shows no drop in accuracy.
Emulate FPGA Target Behavior for Quantized Network
This example shows how to emulate the behavior of a quantized network for FPGA deployment. Once you quantize your network for an FPGA execution environment, you can emulate the FPGA target behavior without any FPGA hardware. This action allows you to examine your quantized network structure and behavior without generating code for deployment.
Load the pretrained network.
if ~isfile("LogoNet.mat")
    url = "https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat";
    websave("LogoNet.mat",url);
end
data = load("LogoNet.mat");
net = data.convnet;
Define calibration and validation data to use for quantization.
Use the calibration data to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers, the dynamic ranges of the activations in all the layers, and the dynamic ranges of the parameters for some layers. For the best quantization results, the calibration data must be representative of inputs to the network.
Use the validation data to test the network after quantization. Test the network to determine the effects of the limited range and precision of the quantized layers and layer parameters in the network.
This example uses the images in the logos_dataset data set. Create an imageDatastore object, then split the data into calibration and validation data sets.
unzip("logos_dataset.zip");
imageData = imageDatastore(fullfile(pwd,"logos_dataset"), ...
    IncludeSubfolders=true,FileExtensions=".JPG",LabelSource="foldernames");
[calData,valData] = splitEachLabel(imageData,0.7,"randomized");
Create a dlquantizer object and specify the network to quantize. Set the execution environment for the quantized network to FPGA.
quantObj = dlquantizer(net,ExecutionEnvironment="FPGA");
Use the calibrate function to exercise the network with sample inputs and collect range information.
calResults = calibrate(quantObj,calData,UseGPU="off");
Use the quantize function to quantize the network object and return a quantized network for simulation.
qNet = quantize(quantObj)
qNet = Quantized DAGNetwork with properties:
Layers: [22x1 nnet.cnn.layer.Layer]
Connections: [21x2 table]
InputNames: {'imageinput'}
OutputNames: {'classoutput'}
Use the quantizationDetails function to extract quantization details.
You can use the quantizationDetails function to confirm that the network is now quantized. The TargetLibrary field shows that the quantized network emulates an FPGA target.
qDetails = quantizationDetails(qNet)
qDetails = struct with fields:
IsQuantized: 1
TargetLibrary: "fpga"
QuantizedLayerNames: [17x1 string]
QuantizedLearnables: [14x3 table]
The QuantizedLayerNames field displays a list of quantized layers.
qDetails.QuantizedLayerNames
ans = 17x1 string
"conv_1"
"relu_1"
"maxpool_1"
"conv_2"
"relu_2"
"maxpool_2"
"conv_3"
"relu_3"
"maxpool_3"
"conv_4"
"relu_4"
"maxpool_4"
"fc_1"
"relu_5"
"fc_2"
"relu_6"
"fc_3"
The QuantizedLearnables field contains additional details about the quantized network learnable parameters. In this example, the 2-D convolutional layers and fully connected layers have their weights scaled and cast to int8. The bias is scaled and stored as int32. The quantizationDetails function returns the values of the quantized learnables as stored integer values.
qDetails.QuantizedLearnables
ans=14×3 table
Layer Parameter Value
________ _________ _____________________
"conv_1" "Weights" {5x5x3x96 int8 }
"conv_1" "Bias" {1x1x96 int32}
"conv_2" "Weights" {3x3x96x128 int8 }
"conv_2" "Bias" {1x1x128 int32}
"conv_3" "Weights" {3x3x128x384 int8 }
"conv_3" "Bias" {1x1x384 int32}
"conv_4" "Weights" {3x3x384x128 int8 }
"conv_4" "Bias" {1x1x128 int32}
"fc_1" "Weights" {5x5x128x2048 int8 }
"fc_1" "Bias" {1x1x2048 int32}
"fc_2" "Weights" {1x1x2048x2048 int8 }
"fc_2" "Bias" {1x1x2048 int32}
"fc_3" "Weights" {1x1x2048x32 int8 }
"fc_3" "Bias" {1x1x32 int32}
You can use the quantized network to emulate a network quantized for FPGA target hardware performing a classification task.
ypred = qNet.classify(valData);
ccr = mean(ypred == valData.Labels)
ccr = 1
Input Arguments
quantObj - Network to quantize
dlquantizer object
dlquantizer object containing the network to quantize, calibrated using the calibrate object function. The ExecutionEnvironment property must be set to 'GPU', 'FPGA', or 'MATLAB'.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: quantizedNetwork = quantize(quantObj,'ExponentScheme','Histogram')
ExponentScheme - Exponent selection scheme
'MinMax' (default) | 'Histogram'
Exponent selection scheme, specified as one of these values:
'MinMax' - Evaluate the exponent based on the range information in the calibration statistics and avoid overflows.
'Histogram' - Distribution-based scaling which evaluates the exponent to best fit the calibration data.
Example: 'ExponentScheme','Histogram'
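The trade-off between a range-based and a distribution-based exponent can be illustrated with a small, hand-rolled sketch in plain MATLAB. This is not the algorithm that quantize uses. The calibration values, the 8-bit word length, and the rule of ignoring the single largest magnitude are all assumptions made for illustration.

```matlab
% Illustrative sketch only; not the algorithm used by quantize.
x = single([-0.9 -0.2 0.05 0.4 0.75 6.0]); % hypothetical values; 6.0 is an outlier
intMax = 2^7 - 1;                           % largest signed 8-bit value (127)

% Range-based choice: smallest power-of-two exponent e for which
% intMax*2^e covers max(abs(x)), so no value overflows.
expRange = ceil(log2(double(max(abs(x))) / intMax));

% Distribution-based choice (assumed rule): ignore the single largest
% magnitude and cover the remaining values.
mags = sort(abs(x));
expDist = ceil(log2(double(mags(end-1)) / intMax));

% Quantize with each exponent. int8 rounds to nearest and saturates.
qRange = int8(x / 2^expRange);
qDist  = int8(x / 2^expDist);

% Worst-case reconstruction error over the five non-outlier values.
errRange = max(abs(single(qRange(1:5)) * 2^expRange - x(1:5)));
errDist  = max(abs(single(qDist(1:5))  * 2^expDist  - x(1:5)));

fprintf('Range-based exponent: %d, max error %.4f\n', expRange, errRange);
fprintf('Distribution-based exponent: %d, max error %.4f\n', expDist, errDist);
```

In this sketch the range-based exponent represents the outlier without overflow but resolves the remaining values more coarsely, while the distribution-based exponent resolves the bulk of the values more finely and saturates the outlier.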
Output Arguments
quantizedNetwork - Quantized neural network
dlnetwork object | DAGNetwork object | yolov2ObjectDetector object | yolov3ObjectDetector object | yolov4ObjectDetector object | ssdObjectDetector object
Quantized neural network, returned as a dlnetwork, DAGNetwork, yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), or ssdObjectDetector (Computer Vision Toolbox) object.
Limitations
The quantize function does not support quantization of networks using dlquantizer objects with ExecutionEnvironment set to 'CPU'.
Code generation does not support quantized deep neural networks produced by the quantize function.
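A script can check the execution environment before it attempts quantization. The following is a minimal sketch of such a guard. The variable env is a hypothetical stand-in for the value of the ExecutionEnvironment property of a dlquantizer object, so that the snippet runs without the toolbox.

```matlab
% Hypothetical stand-in for quantObj.ExecutionEnvironment.
env = "CPU";

% Execution environments that this page lists as supported by quantize.
supported = ["GPU" "FPGA" "MATLAB"];

canQuantize = any(strcmpi(env, supported));
if ~canQuantize
    fprintf('quantize does not support ExecutionEnvironment "%s".\n', env);
end
```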
Version History
Introduced in R2022a
R2023a: Quantize dlquantizer objects that specify a dlnetwork
The quantize function now supports quantization of dlnetwork objects using a calibrated dlquantizer object.
R2022b: quantize support for FPGA execution environment
Use the quantize function to create a simulatable quantized network when the ExecutionEnvironment property of dlquantizer is set to FPGA. The simulatable quantized network enables visibility of the quantized layers, weights, and biases of the network, as well as simulatable quantized inference behavior.
R2022a: Quantize dlquantizer objects calibrated in R2022a and later
The quantize function supports quantization of dlquantizer objects that are calibrated in R2022a and later.