Bicyclist and Pedestrian Classification by Using FPGA

This example uses:

Deep Learning HDL Toolbox
Deep Learning Toolbox
Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices

This example shows how to deploy a custom trained network to an FPGA to classify pedestrians and bicyclists based on their micro-Doppler signatures. The network is taken from the Pedestrian and Bicyclist Classification Using Deep Learning example in Radar Toolbox. For more details on network training and input data, see Pedestrian and Bicyclist Classification Using Deep Learning (Radar Toolbox).

Prerequisites

Zynq® UltraScale+™ MPSoC ZCU102 Evaluation Kit
Deep Learning HDL Toolbox™ Support Package for Xilinx® FPGA and SoC Devices
Deep Learning Toolbox™
Deep Learning HDL Toolbox™

The data files used in this example are:

The MAT-file trainedNetBicPed.mat contains a network trained on the training data set trainDataNoCar and its label set trainLabelNoCar.
The MAT-file testDataBicPed.mat contains the test data set testDataNoCar and its label set testLabelNoCar.

Load Data and Network

Load the pretrained network. Load test data and its labels.

load('trainedNetBicPed.mat','trainedNetNoCar')
load('testDataBicPed.mat')

View the layers of the pretrained network in the Deep Network Designer app:

deepNetworkDesigner(trainedNetNoCar);

Set up HDL Toolpath

Set up the path to your installed Xilinx® Vivado® Design Suite 2023.1 executable if it is not already set up. For example, to set the toolpath, enter:

% hdlsetuptoolpath('ToolName', 'Xilinx Vivado','ToolPath', 'C:\Vivado\2023.1\bin');

Create Target Object

Create a target object for your target device with a vendor name and an interface to connect your target device to the host computer. Interface options are JTAG (default) and Ethernet. Vendor options are Intel or Xilinx. Use the installed Xilinx Vivado Design Suite over an Ethernet connection to program the device.

hT = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');

Create Workflow Object

Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pretrained network, trainedNetNoCar, as the network. Make sure the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Zynq UltraScale+ MPSoC ZCU102 board, and the bitstream uses the single (single-precision floating-point) data type.

hW = dlhdl.Workflow('Network', trainedNetNoCar, 'Bitstream', 'zcu102_single', 'Target', hT);

Compile `trainedNetNoCar` Network

To compile the trainedNetNoCar network, run the compile function of the dlhdl.Workflow object.

dn = hW.compile;

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_single.
### An output layer called 'Output1_softmax' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
     1   'imageinput'        Image Input           400×144×1 images                                                 (SW Layer)
     2   'conv_1'            2-D Convolution       16 10×10×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     3   'relu_1'            ReLU                  ReLU                                                             (HW Layer)
     4   'maxpool_1'         2-D Max Pooling       10×10 max pooling with stride [2  2] and padding [0  0  0  0]    (HW Layer)
     5   'conv_2'            2-D Convolution       32 5×5×16 convolutions with stride [1  1] and padding 'same'     (HW Layer)
     6   'relu_2'            ReLU                  ReLU                                                             (HW Layer)
     7   'maxpool_2'         2-D Max Pooling       10×10 max pooling with stride [2  2] and padding [0  0  0  0]    (HW Layer)
     8   'conv_3'            2-D Convolution       32 5×5×32 convolutions with stride [1  1] and padding 'same'     (HW Layer)
     9   'relu_3'            ReLU                  ReLU                                                             (HW Layer)
    10   'maxpool_3'         2-D Max Pooling       10×10 max pooling with stride [2  2] and padding [0  0  0  0]    (HW Layer)
    11   'conv_4'            2-D Convolution       32 5×5×32 convolutions with stride [1  1] and padding 'same'     (HW Layer)
    12   'relu_4'            ReLU                  ReLU                                                             (HW Layer)
    13   'maxpool_4'         2-D Max Pooling       5×5 max pooling with stride [2  2] and padding [0  0  0  0]      (HW Layer)
    14   'conv_5'            2-D Convolution       32 5×5×32 convolutions with stride [1  1] and padding 'same'     (HW Layer)
    15   'relu_5'            ReLU                  ReLU                                                             (HW Layer)
    16   'avgpool2d'         2-D Average Pooling   2×2 average pooling with stride [2  2] and padding [0  0  0  0]  (HW Layer)
    17   'fc'                Fully Connected       5 fully connected layer                                          (HW Layer)
    18   'softmax'           Softmax               softmax                                                          (SW Layer)
    19   'Output1_softmax'   Regression Output     mean-squared-error                                               (SW Layer)
                                                                                                                  
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_softmax' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>relu_5 ...
### Compiling layer group: conv_1>>relu_5 ... complete.
### Compiling layer group: avgpool2d ...
### Compiling layer group: avgpool2d ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "26.4 MB"       
    "OutputResultOffset"        "0x01a5e000"     "4.0 kB"        
    "SchedulerDataOffset"       "0x01a5f000"     "72.0 kB"       
    "SystemBufferOffset"        "0x01a71000"     "7.1 MB"        
    "InstructionDataOffset"     "0x0217e000"     "1020.0 kB"     
    "ConvWeightDataOffset"      "0x0227d000"     "888.0 kB"      
    "FCWeightDataOffset"        "0x0235b000"     "24.0 kB"       
    "EndOffset"                 "0x02361000"     "Total: 35.4 MB"

### Network compilation complete.
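
The allocated_space column in the log above can be cross-checked against the offset addresses, because each buffer spans the gap up to the next offset. The following is a minimal sketch of that arithmetic in plain MATLAB. The variable names are illustrative and not part of the toolbox; the hexadecimal values are copied from the log.

```matlab
% Offsets copied from the compile log above (hexadecimal byte addresses).
% The variable names are illustrative, not toolbox identifiers.
names   = ["InputData" "OutputResult" "SchedulerData" "SystemBuffer" ...
           "InstructionData" "ConvWeightData" "FCWeightData"];
offsets = ["00000000" "01a5e000" "01a5f000" "01a71000" ...
           "0217e000" "0227d000" "0235b000" "02361000"];  % last entry is EndOffset

addr    = hex2dec(offsets);      % byte addresses as doubles
sizesKB = diff(addr) / 1024;     % each buffer spans up to the next offset

for k = 1:numel(names)
    fprintf('%-16s %10.1f kB\n', names(k), sizesKB(k));
end
fprintf('Total: %.1f MB\n', addr(end) / 2^20);
```

The input buffer comes out at 27000 kB (about 26.4 MB) and the total at about 35.4 MB, which agrees with the table printed by the compiler.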

Program the Bitstream onto FPGA and Download Network Weights

To deploy the network on the Zynq® UltraScale+™ MPSoC ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. The function also downloads the network weights and biases. The deploy function checks for the Xilinx Vivado tool and the supported tool version. It then programs the FPGA device by using the bitstream and displays progress messages and the time it takes to deploy the network.

hW.deploy;

### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_single.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Programming done. The system will now reboot for persistent changes to take effect.
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 19-Jun-2024 17:04:03
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 19-Jun-2024 17:04:03

Run Predictions on Micro-Doppler Signatures

Classify one input from the sample test data set by using the predict function of the dlhdl.Workflow object and display the label. The inputs to the network correspond to the sonograms of the micro-Doppler signatures for a pedestrian, a bicyclist, or a combination of both.

testImg = single(testDataNoCar(:, :, :, 1));
testLabel = testLabelNoCar(1);

% Get predictions from network on single test input
testImg = dlarray(testImg, 'SSCB');
score = hW.predict(testImg, 'Profile', 'On')

### Finished writing input activations.
### Running single input activation.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    9312021                  0.04233                       1            9312779             23.6
    conv_1                 4186343                  0.01903 
    maxpool_1              1387467                  0.00631 
    conv_2                 1976965                  0.00899 
    maxpool_2               604660                  0.00275 
    conv_3                  816212                  0.00371 
    maxpool_3               121647                  0.00055 
    conv_4                  146400                  0.00067 
    maxpool_4                18760                  0.00009 
    conv_5                   42908                  0.00020 
    avgpool2d                 7226                  0.00003 
    fc                        3391                  0.00002 
 * The clock frequency of the DL processor is: 220MHz

score = 
  5(C) × 1(B) single dlarray

    0.9956
    0.0000
    0.0000
    0.0044
    0.0000

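% Map the highest score to its class name. This assumes the network's five
% outputs follow the category order of testLabelNoCar.
classNames = categories(testLabelNoCar);
[~, idx1] = max(extractdata(score));
predTestLabel = categorical(classNames(idx1))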

predTestLabel = categorical
     ped
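
The latency and frame-rate columns of the profiler table follow directly from the cycle counts and the 220 MHz clock. The sketch below redoes that conversion and ranks the layers by their share of the network latency. All numbers are copied from the single-frame table above; the variable names are illustrative.

```matlab
% Values copied from the single-frame profiler table above.
clockHz       = 220e6;      % DL processor clock frequency
networkCycles = 9312021;    % LastFrameLatency(cycles), Network row

latencySec = networkCycles / clockHz;   % about 0.0423 s
framesPerS = 1 / latencySec;            % about 23.6 frames/s

% Per-layer cycle counts from the same table.
layerNames  = ["conv_1" "maxpool_1" "conv_2" "maxpool_2" "conv_3" "maxpool_3" ...
               "conv_4" "maxpool_4" "conv_5" "avgpool2d" "fc"];
layerCycles = [4186343 1387467 1976965 604660 816212 121647 ...
               146400 18760 42908 7226 3391];

sharePct = 100 * layerCycles / networkCycles;   % share of network latency
[~, order] = sort(sharePct, 'descend');
for k = order(1:3)                              % three most expensive layers
    fprintf('%-10s %5.1f %%\n', layerNames(k), sharePct(k));
end
```

For this run, conv_1 alone accounts for roughly 45% of the network latency, followed by conv_2 and maxpool_1, so the first convolution stage dominates the inference time.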

Load five random images from the sample test data set and run the predict function of the dlhdl.Workflow object to display the labels alongside the signatures. Because the five inputs are concatenated along the fourth (batch) dimension, a single predict call processes all of them in multi-frame mode.

numTestFrames = size(testDataNoCar, 4);
numView = 5;
listIndex = randperm(numTestFrames, numView);
testImgBatch = single(testDataNoCar(:, :, :, listIndex));
testLabelBatch = testLabelNoCar(listIndex);

% Get predictions from network using DL HDL Toolbox on FPGA
testImgBatch = dlarray(testImgBatch, 'SSCB');
[scores, speed] = hW.predict(testImgBatch, 'Profile', 'On');

### Finished writing input activations.
### Running in multi-frame mode with 5 inputs.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    9314346                  0.04234                       5           46556877             23.6
    conv_1                 4188705                  0.01904 
    maxpool_1              1387527                  0.00631 
    conv_2                 1976807                  0.00899 
    maxpool_2               604685                  0.00275 
    conv_3                  815776                  0.00371 
    maxpool_3               121686                  0.00055 
    conv_4                  146622                  0.00067 
    maxpool_4                18760                  0.00009 
    conv_5                   43098                  0.00020 
    avgpool2d                 7234                  0.00003 
    fc                        3404                  0.00002 
 * The clock frequency of the DL processor is: 220MHz
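
In the multi-frame table above, the Total Latency column covers all five frames, so the throughput is the frame count divided by the total time. The following sketch, using values copied from the table and illustrative variable names, shows the calculation.

```matlab
% Values copied from the multi-frame profiler table above.
clockHz     = 220e6;       % DL processor clock frequency
numFrames   = 5;           % FramesNum
totalCycles = 46556877;    % Total Latency (cycles) for all five frames

totalSec       = totalCycles / clockHz;    % about 0.2116 s for the batch
framesPerS     = numFrames / totalSec;     % about 23.6 frames/s
cyclesPerFrame = totalCycles / numFrames;  % about 9.31e6 cycles per frame

fprintf('Batch time: %.4f s, throughput: %.1f frames/s\n', totalSec, framesPerS);
```

The average of about 9.31 million cycles per frame is close to the single-frame latency reported earlier, which suggests that batching the inputs in this run saves repeated predict calls without changing the per-frame processing time.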

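% Map each column's highest score to its class name (assumes the network
% outputs follow the category order of testLabelNoCar).
classNames = categories(testLabelNoCar);
[~, idx2] = max(extractdata(scores), [], 1);
predTestLabelBatch = categorical(classNames(idx2));

% Display the micro-Doppler signatures along with the ground truth and
% predictions.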
for k = 1:numView
    index = listIndex(k);
    imagesc(testDataNoCar(:, :, :, index));
    axis xy
    xlabel('Time (s)')
    ylabel('Frequency (Hz)')
    title('Ground Truth: '+string(testLabelNoCar(index))+', Prediction FPGA: '+string(predTestLabelBatch(k)))
    drawnow;
    pause(3);
end

The image shows the micro-Doppler signatures of two bicyclists (bic+bic), which is the ground truth: the known classification of the image against which the network prediction is compared. The prediction retrieved from the FPGA correctly identifies two bicyclists in the image.
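
Comparing predictions against the ground truth can also be summarized as an accuracy figure. The sketch below uses made-up label vectors purely for illustration; they are not results from this example. It assumes only that both sets of labels are available as categorical arrays of equal length.

```matlab
% Hypothetical labels for five frames, for illustration only.
truthLabels = categorical(["bic+bic" "ped" "ped" "bic+bic" "ped"]);
predLabels  = categorical(["bic+bic" "ped" "bic+bic" "bic+bic" "ped"]);

isCorrect = (predLabels == truthLabels);   % element-wise comparison by label name
accuracy  = mean(isCorrect);               % fraction of frames classified correctly

fprintf('%d of %d correct (%.0f %%)\n', nnz(isCorrect), numel(isCorrect), 100 * accuracy);
```

In the example itself, the same comparison could presumably be applied to the ground-truth labels selected by listIndex and the labels predicted on the FPGA.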
