validate

Quantize and validate a deep neural network

Syntax

valResults = validate(quantObj,valData)

valResults = validate(quantObj,valData,quantOpts)

Description

valResults = validate(quantObj,valData) quantizes the weights, biases, and activations in the convolution layers of the network and validates the network specified by the dlquantizer object quantObj, using the data specified by valData.

valResults = validate(quantObj,valData,quantOpts) quantizes and validates the network with additional options specified by quantOpts.

This function requires the Deep Learning Toolbox Model Compression Library. To learn about the products required to quantize a deep neural network, see Quantization Workflow Prerequisites.

Examples

Quantize a Neural Network for GPU Target

This example shows how to quantize learnable parameters in the convolution layers of a neural network for a GPU target and explore the behavior of the quantized network. You quantize the squeezenet neural network after retraining the network to classify new images. Quantization reduces the memory required for the network by approximately 75%, while the accuracy of the network on the validation data is unchanged.

Load the pretrained network. net is the output network of the Train Deep Learning Network to Classify New Images example.

load squeezedlnetmerch
net

net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

Define calibration and validation data to use for quantization.

The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

In this example, use the images in the MerchData data set. Split the data into calibration and validation data sets. Then, define an augmentedImageDatastore object for each set to resize the data for the network.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);

Create a dlquantizer object and specify the network to quantize.

dlquantObj = dlquantizer(net);

Specify the GPU target.

quantOpts = dlquantizationOptions(Target='gpu');
quantOpts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}

quantOpts = 
  dlquantizationOptions with properties:

   Validation Metric Info
    MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]}

   Validation Environment Info
       Target: 'gpu'
    Bitstream: ''

Use the calibrate function to exercise the network with sample inputs and collect range information. The function collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.

calResults = calibrate(dlquantObj, aug_calData)

calResults=120×5 table
        Optimized Layer Name         Network Layer Name     Learnables / Activations    MinValue     MaxValue
    ____________________________    ____________________    ________________________    _________    ________

    {'conv1_Weights'           }    {'conv1'           }           "Weights"             -0.91985     0.88489
    {'conv1_Bias'              }    {'conv1'           }           "Bias"                -0.07925     0.26343
    {'fire2-squeeze1x1_Weights'}    {'fire2-squeeze1x1'}           "Weights"                -1.38      1.2477
    {'fire2-squeeze1x1_Bias'   }    {'fire2-squeeze1x1'}           "Bias"                -0.11641     0.24273
    {'fire2-expand1x1_Weights' }    {'fire2-expand1x1' }           "Weights"              -0.7406     0.90982
    {'fire2-expand1x1_Bias'    }    {'fire2-expand1x1' }           "Bias"               -0.060056     0.14602
    {'fire2-expand3x3_Weights' }    {'fire2-expand3x3' }           "Weights"             -0.74397     0.66905
    {'fire2-expand3x3_Bias'    }    {'fire2-expand3x3' }           "Bias"               -0.051778    0.074239
    {'fire3-squeeze1x1_Weights'}    {'fire3-squeeze1x1'}           "Weights"              -0.7712     0.68917
    {'fire3-squeeze1x1_Bias'   }    {'fire3-squeeze1x1'}           "Bias"                -0.10138     0.32675
    {'fire3-expand1x1_Weights' }    {'fire3-expand1x1' }           "Weights"             -0.72035      0.9743
    {'fire3-expand1x1_Bias'    }    {'fire3-expand1x1' }           "Bias"               -0.067029     0.30425
    {'fire3-expand3x3_Weights' }    {'fire3-expand3x3' }           "Weights"             -0.61443      0.7741
    {'fire3-expand3x3_Bias'    }    {'fire3-expand3x3' }           "Bias"               -0.053613     0.10329
    {'fire4-squeeze1x1_Weights'}    {'fire4-squeeze1x1'}           "Weights"              -0.7422      1.0877
    {'fire4-squeeze1x1_Bias'   }    {'fire4-squeeze1x1'}           "Bias"                -0.10885     0.13881
      ⋮
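
The MinValue and MaxValue columns hold the extremes observed for each parameter or activation. As a rough illustration only, the following sketch tracks a running minimum and maximum over several batches. It is not how calibrate is implemented, and the batch sizes, variable names, and random data are hypothetical stand-ins.

```matlab
% Hypothetical sketch: track the running dynamic range of one tensor over
% several calibration batches. The batches are random stand-ins, not real
% network activations.
rng(0);                          % reproducible stand-in data
numBatches = 4;
minValue = Inf;                  % running minimum
maxValue = -Inf;                 % running maximum
for k = 1:numBatches
    batch = randn(8,8,3,16,'single');          % stand-in activations
    minValue = min(minValue, min(batch(:)));   % widen the range downward
    maxValue = max(maxValue, max(batch(:)));   % widen the range upward
end
fprintf('MinValue = %.4f, MaxValue = %.4f\n', minValue, maxValue);
```

Because the range only ever widens, calibration data that does not cover typical inputs can leave the range too narrow, which is one reason representative calibration data matters.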

Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.

valResults = validate(dlquantObj, aug_valData, quantOpts)

valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: [2×2 table]

Examine the validation output to see the performance of the quantized network.

valResults.MetricResults.Result

ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1      
     {'Quantized'     }           1

valResults.Statistics

ans=2×2 table
    NetworkImplementation    LearnableParameterMemory(bytes)
    _____________________    _______________________________

     {'Floating-Point'}                2.9003e+06           
     {'Quantized'     }                7.3393e+05

In this example, the memory required for the network was reduced approximately 75% through quantization. The accuracy of the network is not affected.
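
The reduction can be checked directly from the two values in the Statistics table. This short calculation uses only those reported numbers; the variable names are illustrative.

```matlab
% Values copied from the valResults.Statistics table above.
floatBytes = 2.9003e+06;    % floating-point learnable parameter memory (bytes)
quantBytes = 7.3393e+05;    % quantized learnable parameter memory (bytes)

% Relative memory reduction in percent.
reductionPct = 100*(1 - quantBytes/floatBytes);
fprintf('Memory reduction: %.1f%%\n', reductionPct);
```

The result is about 74.7%, close to the 75% you would expect if every 4-byte single-precision parameter were stored in a single byte.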

The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
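
To give a sense of what a scaled 8-bit integer representation means, the following sketch quantizes a few values by using a power-of-two scale chosen from a calibrated range. This is one common scheme shown under stated assumptions; the exact scaling that the library applies may differ. The range is the conv1 weights range from the calibration table, and the sample weights are hypothetical.

```matlab
% Hypothetical sketch of scaled 8-bit quantization with a power-of-two scale.
% The range is the conv1 weights range reported by calibrate; the sample
% weights are made up for illustration.
minValue = -0.91985;
maxValue =  0.88489;
maxAbs = max(abs([minValue maxValue]));

% Smallest power-of-two scale for which maxAbs still fits in [-127, 127].
scale = 2^ceil(log2(maxAbs/127));

w = [-0.91985 -0.25 0 0.1 0.88489];   % made-up weights inside the range
q = int8(round(w/scale));             % stored 8-bit integers
wHat = double(q)*scale;               % values that the integers represent

% The rounding error is at most scale/2 when no value saturates.
maxError = max(abs(wHat - w));
fprintf('scale = 2^%d, max error = %.5f\n', log2(scale), maxError);
```

A wider calibrated range forces a larger scale and therefore a coarser grid, which is the limited range and precision trade-off that validation is meant to expose.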

Quantize Network for FPGA Deployment

Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Compression Library and Deep Learning HDL Toolbox to deploy the int8 network to a target FPGA board.

Load Pretrained Network

Load the pretrained LogoNet network and analyze the network architecture.

snet = getLogoNetwork;
deepNetworkDesigner(snet);

Set random number generator for reproducibility.

rng(0);

Load Data

This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an imageDatastore object for the images, and then split it evenly into calibration and validation data sets.

curDir = pwd;
unzip("logos_dataset.zip");
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');

Generate Calibration Result File for the Network

Create a dlquantizer (Deep Learning HDL Toolbox) object and specify the network to quantize. Specify the execution environment as FPGA.

dlQuantObj = dlquantizer(snet,'ExecutionEnvironment',"FPGA");

Use the calibrate (Deep Learning HDL Toolbox) function to exercise the network with sample inputs and collect the range information. The calibrate function collects the dynamic ranges of the weights and biases. The calibrate function returns a table. Each row of the table contains range information for a learnable parameter of the quantized network.

calibrate(dlQuantObj,calibrationData)

ans=35×5 table
        Optimized Layer Name        Network Layer Name    Learnables / Activations     MinValue       MaxValue 
    ____________________________    __________________    ________________________    ___________    __________

    {'conv_1_Weights'          }      {'conv_1'    }           "Weights"                -0.048978      0.039352
    {'conv_1_Bias'             }      {'conv_1'    }           "Bias"                     0.99996        1.0028
    {'conv_2_Weights'          }      {'conv_2'    }           "Weights"                -0.055518      0.061901
    {'conv_2_Bias'             }      {'conv_2'    }           "Bias"                 -0.00061171       0.00227
    {'conv_3_Weights'          }      {'conv_3'    }           "Weights"                -0.045942      0.046927
    {'conv_3_Bias'             }      {'conv_3'    }           "Bias"                  -0.0013998     0.0015218
    {'conv_4_Weights'          }      {'conv_4'    }           "Weights"                -0.045967         0.051
    {'conv_4_Bias'             }      {'conv_4'    }           "Bias"                    -0.00164     0.0037892
    {'fc_1_Weights'            }      {'fc_1'      }           "Weights"                -0.051394      0.054344
    {'fc_1_Bias'               }      {'fc_1'      }           "Bias"                 -0.00052319    0.00084454
    {'fc_2_Weights'            }      {'fc_2'      }           "Weights"                 -0.05016      0.051557
    {'fc_2_Bias'               }      {'fc_2'      }           "Bias"                  -0.0017564     0.0018502
    {'fc_3_Weights'            }      {'fc_3'      }           "Weights"                -0.050706       0.04678
    {'fc_3_Bias'               }      {'fc_3'      }           "Bias"                    -0.02951      0.024855
    {'imageinput'              }      {'imageinput'}           "Activations"                    0           255
    {'imageinput_normalization'}      {'imageinput'}           "Activations"              -139.34        198.72
      ⋮

Create Target Object

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx Vivado® Design Suite 2022.1. To set the Xilinx Vivado toolpath, enter:

hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2022.1\bin\vivado.bat');

To create the target object, enter:

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet','IPAddress','10.10.10.15');

Alternatively, you can also use the JTAG interface.

% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');

Create dlquantizationOptions Object

Create a dlquantizationOptions object. Specify the target bitstream and target board interface. The default metric function is a Top-1 accuracy metric function.

options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('Target','host');

To use a custom metric function, specify the metric function in the dlquantizationOptions object.

options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)},'Bitstream','zcu102_int8','Target',hTarget);
options_emulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData)})
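
The body of the hComputeAccuracy helper is not shown in this section. As a loose illustration of what a Top-1 accuracy computation involves, the following sketch scores made-up predictions against made-up labels. The class names, score matrix, and the assumption that scores arrive as one row per observation are all hypothetical.

```matlab
% Hypothetical stand-in data: scores for 4 observations over 3 classes.
classNames = ["cap" "cube" "torch"];
scores = [0.7 0.2 0.1;
          0.1 0.8 0.1;
          0.3 0.3 0.4;
          0.6 0.3 0.1];
trueLabels = categorical(["cap" "cube" "torch" "cube"], classNames);

% Top-1 prediction: the class with the highest score in each row.
[~, idx] = max(scores, [], 2);
predLabels = categorical(classNames(idx), classNames);

% Fraction of observations whose top prediction matches the true label.
accuracy = mean(predLabels(:) == trueLabels(:));
fprintf('Top-1 accuracy: %.2f\n', accuracy);
```

Here three of the four predictions match, so the accuracy is 0.75. A real metric function would obtain the predictions from the network output that validate passes in.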

Validate Quantized Network

Use the validate function to quantize the learnable parameters in the convolution layers of the network. The validate function simulates the quantized network in MATLAB. The validate function uses the metric function defined in the dlquantizationOptions object to compare the results of the single-data-type network object to the results of the quantized network object.

prediction_emulation = dlQuantObj.validate(validationData,options_emulation)

prediction_emulation = struct with fields:
       NumSamples: 160
    MetricResults: [1×1 struct]
       Statistics: []

For validation on an FPGA, the validate function:

Programs the FPGA board by using the output of the compile method and the programming file
Downloads the network weights and biases
Compares the performance of the network before and after quantization

prediction_FPGA = dlQuantObj.validate(validationData,options_FPGA)

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### The network includes the following layers:
1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer)
2 'conv_1' 2-D Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
3 'relu_1' ReLU ReLU (HW Layer)
4 'maxpool_1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
5 'conv_2' 2-D Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
6 'relu_2' ReLU ReLU (HW Layer)
7 'maxpool_2' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
8 'conv_3' 2-D Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
9 'relu_3' ReLU ReLU (HW Layer)
10 'maxpool_3' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
11 'conv_4' 2-D Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer)
12 'relu_4' ReLU ReLU (HW Layer)
13 'maxpool_4' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer)
15 'relu_5' ReLU ReLU (HW Layer)
16 'fc_2' Fully Connected 2048 fully connected layer (HW Layer)
17 'relu_6' ReLU ReLU (HW Layer)
18 'fc_3' Fully Connected 32 fully connected layer (HW Layer)
19 'softmax' Softmax softmax (SW Layer)
20 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer)

### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>relu_4 ...
### Compiling layer group: conv_1>>relu_4 ... complete.
### Compiling layer group: maxpool_4 ...
### Compiling layer group: maxpool_4 ... complete.
### Compiling layer group: fc_1>>fc_3 ...
### Compiling layer group: fc_1>>fc_3 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "11.9 MB"       
    "OutputResultOffset"        "0x00be0000"     "128.0 kB"      
    "SchedulerDataOffset"       "0x00c00000"     "128.0 kB"      
    "SystemBufferOffset"        "0x00c20000"     "9.9 MB"        
    "InstructionDataOffset"     "0x01600000"     "4.6 MB"        
    "ConvWeightDataOffset"      "0x01aa0000"     "8.2 MB"        
    "FCWeightDataOffset"        "0x022e0000"     "10.4 MB"       
    "EndOffset"                 "0x02d40000"     "Total: 45.2 MB"

### Network compilation complete.

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
### Finished writing input activations.
### Running single input activation.
      ⋮
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer)
2 'conv_1' 2-D Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
3 'relu_1' ReLU ReLU (HW Layer)
4 'maxpool_1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
5 'conv_2' 2-D Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
6 'relu_2' ReLU ReLU (HW Layer)
7 'maxpool_2' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
8 'conv_3' 2-D Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
9 'relu_3' ReLU ReLU (HW Layer)
10 'maxpool_3' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
11 'conv_4' 2-D Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer)
12 'relu_4' ReLU ReLU (HW Layer)
13 'maxpool_4' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer)
15 'relu_5' ReLU ReLU (HW Layer)
16 'fc_2' Fully Connected 2048 fully connected layer (HW Layer)
17 'relu_6' ReLU ReLU (HW Layer)
18 'fc_3' Fully Connected 32 fully connected layer (HW Layer)
19 'softmax' Softmax softmax (SW Layer)
20 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer)

### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

Deep Learning Processor Estimator Performance Results

                      LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                           -------------               -------------         ---------     ---------     ---------
Network                       39136574                    0.17789                1          39136574        5.6
    imageinput_norm             216472                    0.00098
    conv_1                     6832680                    0.03106
    maxpool_1                  3705912                    0.01685
    conv_2                    10454501                    0.04752
    maxpool_2                  1173810                    0.00534
    conv_3                     9364533                    0.04257
    maxpool_3                  1229970                    0.00559
    conv_4                     1759348                    0.00800
    maxpool_4                    24450                    0.00011
    fc_1                       2651288                    0.01205
    fc_2                       1696632                    0.00771
    fc_3                         26978                    0.00012
 * The clock frequency of the DL processor is: 220MHz

### Finished writing input activations.
### Running single input activation.

prediction_FPGA = struct with fields:
       NumSamples: 160
    MetricResults: [1×1 struct]
       Statistics: [2×7 table]

View Performance of Quantized Neural Network

Display the accuracy of the quantized network.

prediction_emulation.MetricResults.Result

ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}         0.9875   
     {'Quantized'     }         0.9875

prediction_FPGA.MetricResults.Result

ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}         0.9875   
     {'Quantized'     }         0.9875

Display the performance of the quantized network in frames per second.

prediction_FPGA.Statistics

ans=2×7 table
    NetworkImplementation    FramesPerSecond    Number of Threads (Convolution)    Number of Threads (Fully Connected)    LUT Utilization (%)    BlockRAM Utilization (%)    DSP Utilization (%)
    _____________________    _______________    _______________________________    ___________________________________    ___________________    ________________________    ___________________

     {'Floating-Point'}          5.6213                       16                                    4                           93.198                    63.925                   15.595       
     {'Quantized'     }          19.433                       64                                   16                            62.31                     50.11                   32.103
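
To put the throughput figures in context, the next sketch computes the speedup of the quantized implementation over the floating-point one. It uses a stand-in table retyped from the output above, with only two of the seven columns, so that it runs without an FPGA board; with real results you would read the same columns from prediction_FPGA.Statistics.

```matlab
% Stand-in for prediction_FPGA.Statistics, retyped from the output above.
stats = table({'Floating-Point'; 'Quantized'}, [5.6213; 19.433], ...
    'VariableNames', {'NetworkImplementation', 'FramesPerSecond'});

% Select rows by name rather than by position.
isQuantized = strcmp(stats.NetworkImplementation, 'Quantized');
speedup = stats.FramesPerSecond(isQuantized) / ...
          stats.FramesPerSecond(~isQuantized);

fprintf('Quantized throughput is %.2fx the floating-point throughput\n', speedup);
```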

Quantize a Neural Network for CPU Target

This example shows how to quantize and validate a neural network for a CPU target. The workflow is similar to that for other execution environments, but before validating you must establish a raspi connection and specify it as the target using dlquantizationOptions.

First, load your network. This example uses the pretrained network squeezenet.

load squeezedlnetmerch
net

net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

Then define your calibration and validation data, aug_calData and aug_valData, respectively.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227],calData);
aug_valData = augmentedImageDatastore([227 227],valData);

Create the dlquantizer object and specify a CPU execution environment.

dq =  dlquantizer(net,'ExecutionEnvironment','CPU')

dq = 
  dlquantizer with properties:

           NetworkObject: [1×1 dlnetwork]
    ExecutionEnvironment: 'CPU'

Calibrate the network.

calResults = calibrate(dq,aug_calData,'UseGPU','off')

calResults=120×5 table
       Optimized Layer Name        Network Layer Name     Learnables / Activations    MinValue     MaxValue
    __________________________    ____________________    ________________________    _________    ________

    "conv1_Weights"               {'conv1'           }           "Weights"             -0.91985     0.88489
    "conv1_Bias"                  {'conv1'           }           "Bias"                -0.07925     0.26343
    "fire2-squeeze1x1_Weights"    {'fire2-squeeze1x1'}           "Weights"                -1.38      1.2477
    "fire2-squeeze1x1_Bias"       {'fire2-squeeze1x1'}           "Bias"                -0.11641     0.24273
    "fire2-expand1x1_Weights"     {'fire2-expand1x1' }           "Weights"              -0.7406     0.90982
    "fire2-expand1x1_Bias"        {'fire2-expand1x1' }           "Bias"               -0.060056     0.14602
    "fire2-expand3x3_Weights"     {'fire2-expand3x3' }           "Weights"             -0.74397     0.66905
    "fire2-expand3x3_Bias"        {'fire2-expand3x3' }           "Bias"               -0.051778    0.074239
    "fire3-squeeze1x1_Weights"    {'fire3-squeeze1x1'}           "Weights"              -0.7712     0.68917
    "fire3-squeeze1x1_Bias"       {'fire3-squeeze1x1'}           "Bias"                -0.10138     0.32675
    "fire3-expand1x1_Weights"     {'fire3-expand1x1' }           "Weights"             -0.72035      0.9743
    "fire3-expand1x1_Bias"        {'fire3-expand1x1' }           "Bias"               -0.067029     0.30425
    "fire3-expand3x3_Weights"     {'fire3-expand3x3' }           "Weights"             -0.61443      0.7741
    "fire3-expand3x3_Bias"        {'fire3-expand3x3' }           "Bias"               -0.053613     0.10329
    "fire4-squeeze1x1_Weights"    {'fire4-squeeze1x1'}           "Weights"              -0.7422      1.0877
    "fire4-squeeze1x1_Bias"       {'fire4-squeeze1x1'}           "Bias"                -0.10885     0.13881
      ⋮

Use the MATLAB Support Package for Raspberry Pi Hardware function, raspi, to create a connection to the Raspberry Pi. In the following code, replace:

raspiname with the name or address of your Raspberry Pi
username with your user name
password with your password

% r = raspi('raspiname','username','password')

For example,

r = raspi('gpucoder-raspberrypi-8','pi','matlab')

r = 
  raspi with properties:

         DeviceAddress: 'gpucoder-raspberrypi-8'      
                  Port: 18734                         
             BoardName: 'Raspberry Pi 3 Model B+'     
         AvailableLEDs: {'led0'}                      
  AvailableDigitalPins: [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]
  AvailableSPIChannels: {}                            
     AvailableI2CBuses: {}                            
      AvailableWebcams: {}                            
           I2CBusSpeed:                               
AvailableCANInterfaces: {}                            

  Supported peripherals

Specify the raspi object as the target for the quantized network.

opts = dlquantizationOptions('Target',r);
opts.MetricFcn = {@(x)hAccuracy(x,net,aug_valData,classes)}

opts = 
  dlquantizationOptions with properties:

   Validation Metric Info
    MetricFcn: {[@(x)hAccuracy(x,net,aug_valData,classes)]}

   Validation Environment Info
       Target: [1×1 raspi]
    Bitstream: ''
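
The hAccuracy helper referenced in MetricFcn is not defined in this part of the page. As a rough idea of what a top-1 accuracy metric computes, the sketch below picks the highest-scoring class for each observation and compares it with the ground truth. The class names, scores, and labels are toy values invented for illustration, and this is not the definition of hAccuracy.

```matlab
% Hypothetical class list and ground-truth labels (toy data).
classes = categorical(["cap"; "cube"; "torch"]);
truth   = categorical(["cap"; "cube"; "torch"; "cube"]);

% Hypothetical prediction scores: one row per observation, one column per class.
scores = [0.8 0.1 0.1
          0.2 0.7 0.1
          0.3 0.3 0.4
          0.6 0.3 0.1];

% Index of the largest score in each row gives the predicted class.
[~, idx]  = max(scores, [], 2);
predicted = classes(idx);

% Fraction of observations whose predicted label matches the ground truth.
accuracy = mean(predicted == truth);
fprintf('Top-1 accuracy: %.2f\n', accuracy);
```

A metric function passed through MetricFcn would wrap a computation like this so that it returns a single scalar for the network outputs it receives.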

Validate the quantized network with the validate function.

valResults = validate(dq,aug_valData,opts)

### Starting application: 'codegen/lib/validate_predict_int8/pil/validate_predict_int8.elf'
    To terminate execution: clear validate_predict_int8_pil
### Launching application validate_predict_int8.elf...
### Host application produced the following standard output (stdout) and standard error (stderr) messages:

valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: []

Examine the validation output to see the performance of the quantized network.

valResults.MetricResults.Result

ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1      
     {'Quantized'     }           1
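
A common follow-up is to check whether the metric drop caused by quantization stays within a tolerance. The sketch below does this on a stand-in table retyped from the output above; the tolerance value is an arbitrary assumption and should be chosen for your application.

```matlab
% Stand-in for valResults.MetricResults.Result, retyped from the output above.
result = table({'Floating-Point'; 'Quantized'}, [1; 1], ...
    'VariableNames', {'NetworkImplementation', 'MetricOutput'});

maxAllowedDrop = 0.01;   % hypothetical tolerance

% Row 1 is the floating-point network, row 2 the quantized network.
accuracyDrop    = result.MetricOutput(1) - result.MetricOutput(2);
withinTolerance = accuracyDrop <= maxAllowedDrop;

fprintf('Accuracy drop: %.4f (within tolerance: %d)\n', ...
    accuracyDrop, withinTolerance);
```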

Quantize YOLO v3 Object Detector

This example shows how to quantize a yolov3ObjectDetector (Computer Vision Toolbox) object using preprocessed calibration and validation data.

First, download a pretrained YOLO v3 object detector.

detector = downloadPretrainedNetwork();

This example uses a small labeled data set in which each image contains one or two labeled instances of a vehicle. Many of these images come from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona and used with permission.

Unzip the vehicle images and load the vehicle ground truth data.

unzip vehicleDatasetImages.zip
data = load('vehicleDatasetGroundTruth.mat');
vehicleDataset = data.vehicleDataset;

Add the full path to the local vehicle data folder.

vehicleDataset.imageFilename = fullfile(pwd, vehicleDataset.imageFilename);

Create an imageDatastore for loading the images and a boxLabelDatastore (Computer Vision Toolbox) for the ground truth bounding boxes.

imds = imageDatastore(vehicleDataset.imageFilename);
blds = boxLabelDatastore(vehicleDataset(:,2));

Use the combine function to combine the two datastores into a CombinedDatastore.

combinedDS = combine(imds, blds);

Split the data into calibration and validation data.

calData = combinedDS.subset(1:32);
valData = combinedDS.subset(33:64);

Use the preprocess (Computer Vision Toolbox) method of the yolov3ObjectDetector (Computer Vision Toolbox) object with the transform function to prepare the data for calibration and validation.

The transform function returns a TransformedDatastore object.

processedCalData = transform(calData, @(data)preprocess(detector,data));
processedValData = transform(valData, @(data)preprocess(detector,data));

Create the dlquantizer object. When you use the MATLAB execution environment, quantization is performed using the fi fixed-point data type, which requires a Fixed-Point Designer™ license.

dq = dlquantizer(detector, 'ExecutionEnvironment', 'MATLAB');

Calibrate the network.

calResults = calibrate(dq, processedCalData,'UseGPU','off')

calResults=135×5 table
        Optimized Layer Name         Network Layer Name     Learnables / Activations    MinValue     MaxValue
    ____________________________    ____________________    ________________________    _________    ________

    {'conv1_Weights'           }    {'conv1'           }           "Weights"             -0.92189     0.85687
    {'conv1_Bias'              }    {'conv1'           }           "Bias"               -0.096271     0.26628
    {'fire2-squeeze1x1_Weights'}    {'fire2-squeeze1x1'}           "Weights"              -1.3751      1.2444
    {'fire2-squeeze1x1_Bias'   }    {'fire2-squeeze1x1'}           "Bias"                -0.12068     0.23104
    {'fire2-expand1x1_Weights' }    {'fire2-expand1x1' }           "Weights"             -0.75275     0.91615
    {'fire2-expand1x1_Bias'    }    {'fire2-expand1x1' }           "Bias"               -0.059252     0.14035
    {'fire2-expand3x3_Weights' }    {'fire2-expand3x3' }           "Weights"             -0.75271      0.6774
    {'fire2-expand3x3_Bias'    }    {'fire2-expand3x3' }           "Bias"               -0.062214    0.088242
    {'fire3-squeeze1x1_Weights'}    {'fire3-squeeze1x1'}           "Weights"              -0.7586     0.68772
    {'fire3-squeeze1x1_Bias'   }    {'fire3-squeeze1x1'}           "Bias"                -0.10206     0.31645
    {'fire3-expand1x1_Weights' }    {'fire3-expand1x1' }           "Weights"             -0.71566     0.97678
    {'fire3-expand1x1_Bias'    }    {'fire3-expand1x1' }           "Bias"               -0.069313     0.32881
    {'fire3-expand3x3_Weights' }    {'fire3-expand3x3' }           "Weights"             -0.60079     0.77642
    {'fire3-expand3x3_Bias'    }    {'fire3-expand3x3' }           "Bias"               -0.058045     0.11229
    {'fire4-squeeze1x1_Weights'}    {'fire4-squeeze1x1'}           "Weights"               -0.738      1.0805
    {'fire4-squeeze1x1_Bias'   }    {'fire4-squeeze1x1'}           "Bias"                -0.11189     0.13698
      ⋮

Validate the quantized network with the validate function.

valResults = validate(dq, processedValData)

valResults = struct with fields:
       NumSamples: 32
    MetricResults: [1×1 struct]
       Statistics: []

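The downloadPretrainedNetwork helper function used at the start of this example is defined as follows.

function detector = downloadPretrainedNetwork()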
   pretrainedURL = 'https://ssd.mathworks.com/supportfiles/vision/data/yolov3SqueezeNetVehicleExample_21aSPKG.zip';
   websave('yolov3SqueezeNetVehicleExample_21aSPKG.zip', pretrainedURL);

   unzip('yolov3SqueezeNetVehicleExample_21aSPKG.zip');

   pretrained = load("yolov3SqueezeNetVehicleExample_21aSPKG.mat");
   detector = pretrained.detector;
end

Validate Quantized Network on FPGA Using Custom Bitstream

Validate a dlquantizer object on a target FPGA board using a custom bitstream, and compare the results of validation using two custom int8 bitstreams with different thread counts. In this example, you will quantize a pretrained network, generate custom bitstreams, and validate the quantized network using the custom bitstreams.

Quantize Pretrained Network

Load the pretrained digits network.

snet = getDigitsNetwork;

Load image data for quantization and create calibration and validation datastores. For more information on the data used in this example, see Data Sets for Deep Learning.

dataFolder = fullfile(toolboxdir('nnet'),'nndemos','nndatasets','DigitDataset');
imds = imageDatastore(dataFolder, 'IncludeSubfolders',true,'LabelSource','foldernames');

[calData,valData] = splitEachLabel(imds,0.7,'randomized');
calData_subset = calData.subset(1:20);
valData_subset = valData.subset(1:6);

Quantize the network using a dlquantizer object. Specify FPGA as the execution environment.

dq = dlquantizer(snet,'ExecutionEnvironment','FPGA');
dq.calibrate(calData_subset);

To validate the network on a target FPGA board, specify a dlhdl.Target object. This example uses a Xilinx™ ZCU102 ZU9EG device.

hTarget = dlhdl.Target('Xilinx','Interface','JTAG');

Generate Custom Bitstreams

To compare the performance of custom bitstreams, generate two bitstreams with different configurations. The bitstreams used in this example are customized to show the performance and resource utilization difference between int8 bitstreams with different processor thread counts for the convolution and fully connected modules on the Xilinx™ ZCU102 ZU9EG device.

Generating a bitstream can take several hours. Before generating a bitstream, you can use the optimizeConfigurationForNetwork (Deep Learning HDL Toolbox) method to modify the processor configuration to meet the requirements of your network and target device. For a list of existing bitstreams, see Use Deep Learning on FPGA Bitstreams (Deep Learning HDL Toolbox).

Use a dlhdl.ProcessorConfig object to specify the processor parameters for your custom bitstream. For a quantized network, specify the processor data type as 'int8'. For an int8 processor, the default values assigned to ConvThreadNumber and FCThreadNumber are 16 and 4, respectively. Generate the bitstream using the dlhdl.buildProcessor function. For more information about how to generate a custom bitstream, see Generate Custom Bitstream (Deep Learning HDL Toolbox).

hPCNew = dlhdl.ProcessorConfig
hPCNew.ProcessorDataType = 'int8';
dlhdl.buildProcessor(hPCNew);

Save the generated bitstream as 'custom_int8.bit'. After saving the generated bitstream, use the same dlhdl.ProcessorConfig object to generate a second bitstream. Increase the ConvThreadNumber to 64 and FCThreadNumber to 16.

hPCNew.setModuleProperty('conv','ConvThreadNumber',64);
hPCNew.setModuleProperty('fc','FCThreadNumber',16);
dlhdl.buildProcessor(hPCNew);

Save the new generated bitstream as 'custom_int8_incThread.bit'.

Validate Using Generated Bitstreams

Validate the quantized network on the target device using the first generated bitstream, 'custom_int8.bit'. Specify the bitstream to use for validation using a dlquantizationOptions object. If the bitstream is not in your working directory, specify the full path to the file.

dlquantOpts_custom_int8 = dlquantizationOptions('Bitstream','custom_int8.bit','Target',hTarget);
valResults_custom_int8 = dq.validate(valData_subset,dlquantOpts_custom_int8);

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream custom_int8.bit.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
     1   'imageinput'    Image Input             28×28×1 images with 'zerocenter' normalization                (SW Layer)
     2   'conv_1'        2-D Convolution         8 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     3   'relu_1'        ReLU                    ReLU                                                          (HW Layer)
     4   'maxpool_1'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     5   'conv_2'        2-D Convolution         16 3×3×8 convolutions with stride [1  1] and padding 'same'   (HW Layer)
     6   'relu_2'        ReLU                    ReLU                                                          (HW Layer)
     7   'maxpool_2'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     8   'conv_3'        2-D Convolution         32 3×3×16 convolutions with stride [1  1] and padding 'same'  (HW Layer)
     9   'relu_3'        ReLU                    ReLU                                                          (HW Layer)
    10   'fc'            Fully Connected         10 fully connected layer                                      (HW Layer)
    11   'softmax'       Softmax                 softmax                                                       (SW Layer)
    12   'classoutput'   Classification Output   crossentropyex with '0' and 9 other classes                   (SW Layer)
                                                                                                             
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>maxpool_2 ...
### Compiling layer group: conv_1>>maxpool_2 ... complete.
### Compiling layer group: conv_3>>relu_3 ...
### Compiling layer group: conv_3>>relu_3 ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "184.0 kB"       
    "OutputResultOffset"        "0x0002e000"     "4.0 kB"         
    "SchedulerDataOffset"       "0x0002f000"     "8.0 kB"         
    "SystemBufferOffset"        "0x00031000"     "36.0 kB"        
    "InstructionDataOffset"     "0x0003a000"     "16.0 kB"        
    "ConvWeightDataOffset"      "0x0003e000"     "8.0 kB"         
    "FCWeightDataOffset"        "0x00040000"     "28.0 kB"        
    "EndOffset"                 "0x00047000"     "Total: 284.0 kB"

### Network compilation complete.

### Programming FPGA Bitstream using JTAG...
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 16-Jan-2024 15:08:59
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 16-Jan-2024 15:08:59
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
     1   'imageinput'    Image Input             28×28×1 images with 'zerocenter' normalization                (SW Layer)
     2   'conv_1'        2-D Convolution         8 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     3   'relu_1'        ReLU                    ReLU                                                          (HW Layer)
     4   'maxpool_1'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     5   'conv_2'        2-D Convolution         16 3×3×8 convolutions with stride [1  1] and padding 'same'   (HW Layer)
     6   'relu_2'        ReLU                    ReLU                                                          (HW Layer)
     7   'maxpool_2'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     8   'conv_3'        2-D Convolution         32 3×3×16 convolutions with stride [1  1] and padding 'same'  (HW Layer)
     9   'relu_3'        ReLU                    ReLU                                                          (HW Layer)
    10   'fc'            Fully Connected         10 fully connected layer                                      (HW Layer)
    11   'softmax'       Softmax                 softmax                                                       (SW Layer)
    12   'classoutput'   Classification Output   crossentropyex with '0' and 9 other classes                   (SW Layer)
                                                                                                             
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.


              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                      22272                  0.00011                       1              2e+04           8979.7
    imageinput_norm           4236                  0.00002 
    conv_1                    4494                  0.00002 
    maxpool_1                 2999                  0.00001 
    conv_2                    2455                  0.00001 
    maxpool_2                 2388                  0.00001 
    conv_3                    2354                  0.00001 
    fc                        3346                  0.00002 
 * The clock frequency of the DL processor is: 200MHz


### Finished writing input activations.
### Running single input activation.

Validate the quantized network on the target device using the second generated bitstream, 'custom_int8_incThread.bit'.

dlquantOpts_custom_incThread = dlquantizationOptions('Bitstream','custom_int8_incThread.bit','Target',hTarget);
valResults_custom_incThread = dq.validate(valData_subset,dlquantOpts_custom_incThread);

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream custom_int8_incThread.bit.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
     1   'imageinput'    Image Input             28×28×1 images with 'zerocenter' normalization                (SW Layer)
     2   'conv_1'        2-D Convolution         8 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     3   'relu_1'        ReLU                    ReLU                                                          (HW Layer)
     4   'maxpool_1'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     5   'conv_2'        2-D Convolution         16 3×3×8 convolutions with stride [1  1] and padding 'same'   (HW Layer)
     6   'relu_2'        ReLU                    ReLU                                                          (HW Layer)
     7   'maxpool_2'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     8   'conv_3'        2-D Convolution         32 3×3×16 convolutions with stride [1  1] and padding 'same'  (HW Layer)
     9   'relu_3'        ReLU                    ReLU                                                          (HW Layer)
    10   'fc'            Fully Connected         10 fully connected layer                                      (HW Layer)
    11   'softmax'       Softmax                 softmax                                                       (SW Layer)
    12   'classoutput'   Classification Output   crossentropyex with '0' and 9 other classes                   (SW Layer)
                                                                                                             
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>maxpool_2 ...
### Compiling layer group: conv_1>>maxpool_2 ... complete.
### Compiling layer group: conv_3>>relu_3 ...
### Compiling layer group: conv_3>>relu_3 ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "92.0 kB"        
    "OutputResultOffset"        "0x00017000"     "4.0 kB"         
    "SchedulerDataOffset"       "0x00018000"     "36.0 kB"        
    "SystemBufferOffset"        "0x00021000"     "36.0 kB"        
    "InstructionDataOffset"     "0x0002a000"     "28.0 kB"        
    "ConvWeightDataOffset"      "0x00031000"     "8.0 kB"         
    "FCWeightDataOffset"        "0x00033000"     "20.0 kB"        
    "EndOffset"                 "0x00038000"     "Total: 224.0 kB"

### Network compilation complete.

### Programming FPGA Bitstream using JTAG...
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 16-Jan-2024 15:10:57
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 16-Jan-2024 15:10:57
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
     1   'imageinput'    Image Input             28×28×1 images with 'zerocenter' normalization                (SW Layer)
     2   'conv_1'        2-D Convolution         8 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     3   'relu_1'        ReLU                    ReLU                                                          (HW Layer)
     4   'maxpool_1'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     5   'conv_2'        2-D Convolution         16 3×3×8 convolutions with stride [1  1] and padding 'same'   (HW Layer)
     6   'relu_2'        ReLU                    ReLU                                                          (HW Layer)
     7   'maxpool_2'     2-D Max Pooling         2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     8   'conv_3'        2-D Convolution         32 3×3×16 convolutions with stride [1  1] and padding 'same'  (HW Layer)
     9   'relu_3'        ReLU                    ReLU                                                          (HW Layer)
    10   'fc'            Fully Connected         10 fully connected layer                                      (HW Layer)
    11   'softmax'       Softmax                 softmax                                                       (SW Layer)
    12   'classoutput'   Classification Output   crossentropyex with '0' and 9 other classes                   (SW Layer)
                                                                                                             
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.


              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                      41020                  0.00021                       1              41020           4875.7
    imageinput_norm           4236                  0.00002 
    conv_1                    6683                  0.00003 
    maxpool_1                 5804                  0.00003 
    conv_2                    5509                  0.00003 
    maxpool_2                 4582                  0.00002 
    conv_3                    5905                  0.00003 
    fc                        8301                  0.00004 
 * The clock frequency of the DL processor is: 200MHz


### Finished writing input activations.
### Running single input activation.
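
The totals in the two memory-buffer tables above follow directly from the EndOffset addresses. As a small worked check, the sketch below converts both addresses, retyped from the compiler output, from hexadecimal bytes to kilobytes. The variable names are illustrative.

```matlab
% EndOffset addresses retyped from the compiler output above, without the 0x prefix.
bitstreams = {'custom_int8.bit'; 'custom_int8_incThread.bit'};
endOffsets = {'00047000'; '00038000'};

% hex2dec returns the offset in bytes; divide by 1024 to get kilobytes.
totalKB = hex2dec(endOffsets) / 1024;

for k = 1:numel(bitstreams)
    fprintf('%s: %.1f kB of external memory\n', bitstreams{k}, totalKB(k));
end
```

The results agree with the "Total" entries of the two tables, 284.0 kB and 224.0 kB.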

Compare Validation Output

Compare the validation results from both bitstreams. For these bitstream configurations, increasing the number of threads used for convolution and fully connected layers increases the number of frames per second as well as the resource utilization. For more information on how to optimize your processor configuration based on the resource requirements of your hardware, see Estimate Resource Utilization for Custom Processor Configuration (Deep Learning HDL Toolbox).

valResults_custom_int8.Statistics

ans=2×7 table
    NetworkImplementation    FramesPerSecond    Number of Threads (Convolution)    Number of Threads (Fully Connected)    LUT Utilization (%)    BlockRAM Utilization (%)    DSP Utilization (%)
    _____________________    _______________    _______________________________    ___________________________________    ___________________    ________________________    ___________________

     {'Floating-Point'}         4875.6704                     16                                    4                      78.8523788674839          55.7565789473684         15.4365079365079  
     {'Quantized'     }         6418.8972                     16                                    4                       36.685274372446          47.7521929824561         10.5952380952381

valResults_custom_incThread.Statistics

ans=2×7 table
    NetworkImplementation    FramesPerSecond    Number of Threads (Convolution)    Number of Threads (Fully Connected)    LUT Utilization (%)    BlockRAM Utilization (%)    DSP Utilization (%)
    _____________________    _______________    _______________________________    ___________________________________    ___________________    ________________________    ___________________

     {'Floating-Point'}         8979.6835                     64                                   16                      264.974970811442          61.4583333333333         52.5793650793651  
     {'Quantized'     }        12126.3566                     64                                   16                      61.4561441914769           49.671052631579         32.0238095238095
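
To make the trade-off concrete, the following sketch places the quantized rows of the two Statistics tables side by side and computes the throughput gain and the change in DSP utilization. The numbers are retyped from the output above, and the shortened column names are chosen here for readability; they are not the names used in the Statistics table.

```matlab
% Quantized-row values retyped from the two Statistics tables above.
names   = {'custom_int8'; 'custom_int8_incThread'};
fps     = [6418.8972; 12126.3566];
lutPct  = [36.685274372446; 61.4561441914769];
bramPct = [47.7521929824561; 49.671052631579];
dspPct  = [10.5952380952381; 32.0238095238095];

comparison = table(fps, lutPct, bramPct, dspPct, 'RowNames', names);
disp(comparison)

fpsGain     = fps(2) / fps(1);         % throughput ratio, more threads vs. default
dspIncrease = dspPct(2) - dspPct(1);   % change in percentage points

fprintf('Throughput gain: %.2fx, DSP utilization: +%.1f percentage points\n', ...
    fpsGain, dspIncrease);
```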

Input Arguments

`quantObj` — Network to quantize
`dlquantizer` object

Network to quantize, specified as a dlquantizer object.

`valData` — Data to use for validation of quantized network
`imageDatastore` object | `augmentedImageDatastore` object | `pixelLabelImageDatastore` object | `CombinedDatastore` object | `TransformedDatastore` object

Data to use for validation of quantized network, specified as an imageDatastore object, an augmentedImageDatastore object, a pixelLabelImageDatastore (Computer Vision Toolbox) object, a CombinedDatastore object, or a TransformedDatastore object.

You must preprocess the data used for validation of a quantized yolov3ObjectDetector (Computer Vision Toolbox) object using the preprocess (Computer Vision Toolbox) function. For an example of using preprocessed data for validation of a yolov3ObjectDetector, see Quantize YOLO v3 Object Detector.

validate accepts a CombinedDatastore or TransformedDatastore object as input data for validating quantized yolov3ObjectDetector and yolov4ObjectDetector objects. The CombinedDatastore and TransformedDatastore used for validation must contain an imageDatastore or augmentedImageDatastore as the first datastore and a boxLabelDatastore as the second datastore. For more information on valid datastores, see Prepare Data for Quantizing Networks.

`quantOpts` — Options for quantizing network
`dlquantizationOptions` object

Options for quantizing the network, specified as a dlquantizationOptions object.
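
As a minimal sketch of how `quantOpts` can carry a custom metric into `validate`, the function below wraps a top-1 accuracy metric in a `dlquantizationOptions` object. The function name `validateWithTop1` and the input `trueIdx` are hypothetical. The sketch also assumes that `quantObj` has already been calibrated and that the metric function receives prediction scores with one row per validation observation and one column per class; confirm that layout for your network before relying on it.

```matlab
function valResults = validateWithTop1(quantObj, valData, trueIdx)
% validateWithTop1 (hypothetical helper) validates a quantized network
% with a custom top-1 accuracy metric.
%   quantObj - calibrated dlquantizer object
%   valData  - validation datastore accepted by validate
%   trueIdx  - column vector of true class indices, one per observation
%              (assumed to follow the column order of the scores)

    % Score that the network assigned to the true class of each observation.
    trueScore = @(scores) scores(sub2ind(size(scores), ...
        (1:size(scores,1))', trueIdx(:)));

    % A prediction counts as correct when the true class holds the row maximum.
    top1Metric = @(scores) mean(trueScore(scores) == max(scores, [], 2));

    % MetricFcn takes a cell array of function handles.
    quantOpts = dlquantizationOptions('MetricFcn', {top1Metric});

    % Quantize and validate with the custom metric.
    valResults = validate(quantObj, valData, quantOpts);
end
```

Because `MetricFcn` is a cell array, several metric handles can be supplied at once, in which case `MetricResults` is returned as an array of structs.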

Output Arguments

`valResults` — Performance of quantized network
struct

Performance of quantized network, returned as a struct. The struct contains these fields.

NumSamples — The number of sample inputs used to validate the network, specified by valData.

MetricResults — Struct containing results of the metric function defined in the dlquantizationOptions object. When more than one metric function is specified in the dlquantizationOptions object, MetricResults is an array of structs.

MetricResults contains these fields:

Field	Description
`MetricFunction`	Metric function used to determine the performance of the quantized network, specified in the `dlquantizationOptions` object.
`Result`	Table indicating the results of the metric function before and after quantization. The first row in the table, `'Floating-Point'`, contains information for the original floating-point implementation. The second row, `'Quantized'`, contains information for the quantized implementation. The output of the metric function is displayed in the `MetricOutput` column.

Statistics — Table indicating the learnable parameter memory used, in bytes, by the original floating-point implementation of the network and the quantized implementation.
When the ExecutionEnvironment for the dlquantizer object is set to FPGA, the Statistics table lists these values for the original floating-point and quantized network implementations:
- Frames per second
- Number of convolution threads
- Number of fully connected threads
- Lookup table (LUT) resource utilization percentage
- Block RAM resource utilization percentage
- DSP resource utilization percentage
The Statistics table is empty when the Target property of dlquantizationOptions is set to 'host'.
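
The fields described above can be read as in the following sketch. So that it runs without a trained network, it builds a placeholder struct that mimics the documented layout. Every number in it is made up for illustration, and the column name `LearnableParameterMemoryBytes` is a stand-in, so the memory column is read by position rather than by name.

```matlab
% Placeholder struct that mimics the documented layout of valResults.
% Every number below is made up for illustration only.
implNames = {'Floating-Point'; 'Quantized'};
valResults = struct();
valResults.NumSamples = 20;
valResults.MetricResults = struct( ...
    'MetricFunction', @(scores) mean(scores(:)), ...
    'Result', table(implNames, [0.90; 0.89], ...
        'VariableNames', {'NetworkImplementation', 'MetricOutput'}));
valResults.Statistics = table(implNames, [4e6; 1e6], ...
    'VariableNames', {'NetworkImplementation', 'LearnableParameterMemoryBytes'});

% Metric before and after quantization (row 1 floating-point, row 2 quantized).
metricOut = valResults.MetricResults(1).Result.MetricOutput;
metricDrop = metricOut(1) - metricOut(2);

% Statistics can be empty (for example when Target is 'host'), so guard the access.
% Column 2 is read by position because the column name here is a placeholder.
if ~isempty(valResults.Statistics)
    memBytes = valResults.Statistics{:, 2};
    memoryReduction = 1 - memBytes(2) / memBytes(1);
    fprintf('Metric change: %.4f, memory reduction: %.1f%%\n', ...
        metricDrop, 100 * memoryReduction);
end
```

For an FPGA execution environment, the Statistics table holds throughput and resource columns, so the memory calculation in this sketch does not apply there.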

Limitations

Validation on target hardware for CPU, FPGA, and GPU execution environments is not supported in MATLAB® Online™. For FPGA and GPU execution environments, validation can be performed through emulation on the MATLAB Online host. GPU validation can also be performed if GPU support has been added to your MATLAB Online Server™ cluster. For more information on GPU support for MATLAB Online, see Configure GPU Support in MATLAB Online Server (MATLAB Online Server).

Algorithms

The validate function determines the default metric function to use for the validation based on the type of network that is being quantized.

Type of Network	Metric Function
Classification	Top-1 Accuracy — Accuracy of the network
Object Detection	Average Precision — Average precision over all detection results. See `evaluateObjectDetection` (Computer Vision Toolbox).
Regression	MSE — Mean squared error of the network
Semantic Segmentation	`evaluateSemanticSegmentation` (Computer Vision Toolbox) — Evaluate semantic segmentation data set against ground truth
Single Shot Detector (SSD)	WeightedIOU — Average IoU of each class, weighted by the number of pixels in that class
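
To make the quantities in this table concrete, the sketch below computes top-1 accuracy, mean squared error, and a pixel-weighted IoU from small hypothetical arrays. These are the textbook definitions written in base MATLAB, not the code that `validate` runs internally, and all input values are invented for illustration.

```matlab
% Top-1 accuracy: fraction of observations whose highest score is the true class.
% scores has one row per observation and one column per class (hypothetical values).
scores = [0.7 0.2 0.1
          0.1 0.6 0.3
          0.2 0.5 0.3
          0.3 0.3 0.4];
trueIdx = [1; 2; 3; 3];                 % hypothetical true class indices
[~, predIdx] = max(scores, [], 2);      % predicted class for each row
top1Accuracy = mean(predIdx == trueIdx);

% MSE: mean of the squared differences between predictions and targets.
yTrue = [1.0; 2.0; 3.0];
yPred = [1.5; 2.0; 2.0];
mseValue = mean((yPred - yTrue).^2);

% Weighted IoU: per-class IoU averaged with weights equal to each class's
% share of the ground-truth pixels.
confMat = [8 2
           1 9];                        % rows: true class, columns: predicted class
truePos = diag(confMat);
classIoU = truePos ./ (sum(confMat, 2) + sum(confMat, 1)' - truePos);
classWeights = sum(confMat, 2) / sum(confMat(:));
weightedIoU = sum(classWeights .* classIoU);
```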

Version History

Introduced in R2020a

R2022a: Validate the performance of quantized network for CPU target

You can now use the dlquantizer object and the validate function to quantize a network and generate code for CPU targets.
