Debug YOLO v2 Vehicle Detector on FPGA
This example shows how to debug hardware by visualizing signals from a vehicle detector design deployed on the AMD® Zynq® UltraScale+™ MPSoC ZCU102 board. You use FPGA data capture and AXI manager features of the HDL Verifier™ Support Package for AMD FPGA and SoC Devices software to set triggers and capture the signals of interest. The Deploy and Verify YOLO v2 Vehicle Detector on FPGA example shows how to deploy a vehicle detector design on an FPGA. In this example, you integrate FPGA data capture and AXI manager features into this design to debug and visualize its functionality.
Introduction
Debugging designs, especially those deployed to the FPGA, can be a difficult task without a proper set of tools. FPGA data capture and AXI manager offer many capabilities to easily debug designs deployed to an FPGA. In this example, you focus on the Preprocessing module of the design. You analyze several scenarios where proper debugging is required to ensure the application behaves correctly. The scenarios are:
- Handshaking between the Preprocessing DUT and deep learning (DL) IP core. This scenario shows how to use FPGA data capture and AXI manager features to visualize the handshaking events between the Preprocessing DUT and the DL IP in the Logic Analyzer (DSP System Toolbox). You use FPGA data capture to tap the handshaking signals between the Preprocessing DUT and the DL IP from the FPGA. 
- Functionality of the Resize subsystem. This scenario shows how to add debug hooks to the model and use them for debugging and verification.
- Handshaking between the Preprocessing DUT and the DDR memory. This scenario shows how to visualize the handshaking events between the Preprocessing DUT and the DDR memory in the Logic Analyzer. You use FPGA data capture to tap the handshaking signals between the Preprocessing DUT and the DDR memory from the FPGA. 
Add Debug Hooks and Test Points in Model
To capture signal data using FPGA data capture, configure the signal as a test point. For more information, see Configure Signals as Test Points (Simulink). Configure all the signals described in this section as test points. Use the Bus Selector (Simulink) block to extract signals from a bus and then add test points. To count the valid pixels that flow through the Resize subsystem, add debugging logic using counters within the YOLOv2PreprocessAlgorithm model. Use the helperConfigAndAddTestPoints function to automate adding the counters and test points to the YOLOv2PreprocessAlgorithm and DLHandshakeLogicExtMem models. The helperConfigAndAddTestPoints function creates four models: YOLOv2PreprocessTbDebug, YOLOv2PreprocessDUTDebug, YOLOv2PreprocessAlgoDebug, and DLHandshakeLogicDebug. These four models contain all the required test points and debug hooks.
This figure shows the signals that are configured as test points in the YOLOv2PreprocessAlgoDebug model.

This figure shows the signals that are configured as test points in the DLHandshakeLogicDebug model.

Use the Simulink.BlockDiagram.arrangeSystem (Simulink) function to improve the layout of the model.
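As a rough illustration of what configuring a signal as a test point involves, this sketch builds a small scratch model and marks one signal as a test point programmatically. The model name, block names, and signal name are hypothetical and exist only for this sketch. The helperConfigAndAddTestPoints function may work differently.

```matlab
% Scratch model built only for this sketch; all names are hypothetical.
load_system('simulink');
mdl = 'tpSketchModel';
new_system(mdl);
add_block('simulink/Sources/Constant',[mdl '/Src']);
add_block('simulink/Sinks/Terminator',[mdl '/Sink']);
add_line(mdl,'Src/1','Sink/1');

% A test point is a property of the output port that drives the signal.
ph = get_param([mdl '/Src'],'PortHandles');
set_param(ph.Outport(1),'Name','dbg_signal');   % name the signal
set_param(ph.Outport(1),'TestPoint','on');      % mark it as a test point

% Read the setting back to confirm it was applied.
isTestPoint = get_param(ph.Outport(1),'TestPoint');
fprintf('dbg_signal test point: %s\n',isTestPoint);

close_system(mdl,0);   % discard the scratch model without saving
```

The same two set_param calls can be applied to the output port of a Bus Selector block to name and tap a signal extracted from a bus.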
Integrate FPGA Data Capture and AXI Manager in HDL Workflow Advisor
To generate IP core files for a DL processor, follow the steps in the Configure Deep Learning Processor and Generate IP Core section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example. Use the helperUpdateHDLWorkflowAdvisor function to automate configuring the HDL Workflow Advisor settings and generating the bitstream. You must provide the complete path to the DL IP core files. Set the buffer size for the FPGA data capture IP to 16384 and the maximum sequence depth to 7.
pathToDLIPFiles = 'F:\dlhdl_prj\ipcore\dlprocessor_v1_0';
modelWithTestPoints = {'YOLOv2PreprocessTbDebug','YOLOv2PreprocessDUTDebug','YOLOv2PreprocessAlgoDebug','DLHandshakeLogicDebug'};
helperUpdateHDLWorkflowAdvisor(pathToDLIPFiles,modelWithTestPoints,'16384','7')
Follow these steps to perform this task manually.
- Start the targeting workflow by right-clicking the YOLO v2 Preprocess DUT Subsystem subsystem in the YOLOv2PreprocessTbDebug model and selecting HDL Code > HDL Workflow Advisor.
- In step 1.1, select IP Core Generation and set Target platform to Xilinx Zynq Ultrascale+ MPSoC ZCU102 Evaluation Kit.
- In step 1.2, set Reference design to Deep Learning with Preprocessing Interface. The DL Processor IP name and the DL Processor IP location fields specify the name and location of the generated deep learning processor IP core, respectively. These details are fetched from the IP core report. Set Insert AXI manager to JTAG.
- In step 1.3, enable the Enable HDL DUT output port generation for test points setting to update the interface table with all the test points as output ports for the generated DUT. Map the target platform interfaces to the input and output ports of the DUT. For the required interface mapping, see step 1.3 in the Generate and Deploy Bitstream to FPGA section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example. This table shows the interface mapping for test points. To capture and visualize the trigger signals in the Logic Analyzer, map the trigger signals to Trigger and Data instead of Trigger. For more information, see Use As (HDL Verifier).

- Perform steps 1.4 to 3.1 as shown in the Generate and Deploy Bitstream to FPGA section of the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example. 
- In step 3.2, set FPGA data capture buffer size to 16384 and FPGA data capture maximum sequence depth to 7. Select Include capture condition logic in FPGA data capture to enable the capture control logic option in the generated FPGA data capture component.

- In step 4.3, generate the bitstream. The HDL Workflow Advisor generates the block_design_wrapper.bit bitstream file in the hdl_prj\vivado_ip_prj\vivado_prj.runs\impl_1 folder.
Handshaking Between Preprocessing DUT and Deep Learning IP Core
The DL IP core expects the preprocessed data to be at a specific address in the DDR memory and to have a specific size. The handshaking between the Preprocessing DUT and the DL IP core is to convey the expected address and size to the Preprocessing DUT. The handshaking comprises these steps:
- The Preprocessing DUT drives the rd_addr, rd_len, and rd_avalid control signals in the AXIReadCtrlOutDL bus.
- The DL IP core samples these control signals and responds to the Preprocessing DUT by sending the data at the rd_addr location through the AXIReadDataDL signal. The DL IP core also drives the corresponding control signals, rd_dvalid and rd_aready, in the AXIReadCtrlInDL bus.
- This process continues for three different addresses corresponding to the InputValid (x"354"), InputAddr (x"358"), and InputSize (x"35C") signals. The IP core generation report for the DL IP contains the addresses for these registers.
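The three-read sequence described above can be modeled in plain MATLAB to make the order of events concrete. This is a behavioral sketch only, not the HDL implementation. The register addresses come from the text, while the register contents and the dlRegs lookup table are hypothetical values chosen for illustration.

```matlab
% Register names and addresses quoted in the text.
regNames = {'InputValid','InputAddr','InputSize'};
regAddrs = uint32(hex2dec({'354','358','35C'}));

% Hypothetical register contents, used only to exercise the sketch.
dlRegs = containers.Map('KeyType','uint32','ValueType','uint32');
dlRegs(regAddrs(1)) = uint32(1);                     % data is valid
dlRegs(regAddrs(2)) = uint32(hex2dec('80000000'));   % assumed DDR address
dlRegs(regAddrs(3)) = uint32(128*128*3);             % assumed data size

result = struct();
for k = 1:numel(regAddrs)
    % Preprocessing DUT side: present the address and assert rd_avalid.
    rd_addr   = regAddrs(k);
    rd_avalid = true;

    % DL IP side: sample the request and answer with data and rd_dvalid.
    rd_dvalid = rd_avalid && isKey(dlRegs,rd_addr);
    if rd_dvalid
        rd_data = dlRegs(rd_addr);
        result.(regNames{k}) = rd_data;   % DUT latches the returned value
        fprintf('%-10s @ 0x%03X -> %u\n',regNames{k},rd_addr,rd_data);
    end
end
```

Each loop iteration corresponds to one request and response pair, so three iterations produce six events. The seventh event in the trigger sequence is the resulting change on the valid output of the Read DL Registers subsystem.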

Signals Required for Debugging
The DLHandshakeLogicExtMem model contains these signals.
- rd_addr --- Address location in the DL IP from which the Preprocessing DUT fetches the required information during handshaking. 
- rd_len --- Size of data, in bytes, to read from the DL IP starting from the rd_addr address location.
- rd_avalid --- Indication of whether the data in the rd_addr and rd_len signals of the same bus is valid.
- Data_From_DL --- Data that the DL IP returns in response to the control information it receives from the Preprocessing DUT in the AXIReadCtrlOutDL bus.
- rd_dvalid --- Control signal that forms part of the AXIReadCtrlInDL bus. This signal validates the data in the AXIReadDataDL signal.
- inputAddr_from_DL --- Output of the Read DL Registers subsystem. The Preprocessing DUT places the preprocessed data in the DDR memory at this address.
- inputSize_from_DL --- Output of the Read DL Registers subsystem. This output is the size of the data that the Preprocessing DUT places in the DDR memory.
- inputValid_from_DL --- Output of the Read DL Registers subsystem. This signal validates the data in the inputAddr_from_DL and inputSize_from_DL signals.
Timing Diagram
This timing diagram shows the sequence of events for this scenario.

Trigger Conditions in FPGA Data Capture
A successful handshake between the Preprocessing DUT and the DL IP comprises seven events. These events act as sequential triggers in the FPGA Data Capture tool to capture the data.


Configure these settings in the FPGA Data Capture tool:
- Set Number of capture windows to 1 to indicate that handshaking events happen only at the beginning of preprocessing. The signal data corresponding to the entire sample depth can be captured in a single window once these trigger conditions are satisfied.
- Set Number of trigger stages to 7 to indicate that the handshaking comprises seven events.
- Set Trigger Position to a small value close to zero. If you set this option to 0, you cannot visualize these events because the tool captures signal data only after this trigger.
- Repeat the Trigger Stage 1 and Trigger Stage 2 sequences three times.
- Use a trigger time out to ensure that Trigger Stage 7 happens within one clock cycle of Trigger Stage 6. Trigger Stage 7 corresponds to a rising edge on the inpValid_from_DL signal.
- Set Capture mode to On Trigger.
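To see how a seven-stage sequential trigger with a time out behaves, this sketch runs a simplified matcher over synthetic waveforms. It is a software model under stated assumptions, not the FPGA data capture logic. The stage conditions are assumed: stages 1, 3, and 5 fire when rd_avalid is high, stages 2, 4, and 6 fire when rd_dvalid is high, and stage 7 fires on a rising edge of the valid output. The waveform timing is invented.

```matlab
% Synthetic waveforms (one sample per clock cycle); timing is invented.
N = 40;
rd_avalid  = false(1,N);  rd_avalid([5 12 19])  = true;
rd_dvalid  = false(1,N);  rd_dvalid([8 15 22])  = true;
inputValid = false(1,N);  inputValid(23:N)      = true;

% Rising edge: high now and low on the previous cycle.
inputValidRise = [false, inputValid(2:end) & ~inputValid(1:end-1)];

% Assumed per-stage conditions: request, response (three times), then valid.
stageCond = {rd_avalid,rd_dvalid,rd_avalid,rd_dvalid, ...
             rd_avalid,rd_dvalid,inputValidRise};
timeoutCycles = 1;          % stage 7 must follow stage 6 within one cycle

stage = 1;
hitCycle = zeros(1,7);      % cycle at which each stage was satisfied
for n = 1:N
    % Restart the sequence if stage 7 did not arrive in time.
    if stage == 7 && (n - hitCycle(6)) > timeoutCycles
        stage = 1;
        hitCycle(:) = 0;
    end
    if stageCond{stage}(n)
        hitCycle(stage) = n;
        stage = stage + 1;
        if stage > 7
            break           % all seven stages matched: trigger fires
        end
    end
end
triggered = stage > 7;
fprintf('Triggered: %d, stage cycles: %s\n',triggered,mat2str(hitCycle));
```

With this synthetic timing, the final stage is satisfied one cycle after stage 6, so the time out does not restart the sequence. Moving the rising edge of inputValid later by one or more cycles causes the matcher to reset instead of firing.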
Visualize Captured Data in Logic Analyzer
This timing diagram shows that the handshaking between the Preprocessing DUT and the DL IP behaves as expected.

Functionality of Resize Subsystem
In this scenario, the focus is to verify the behavior of the Resize subsystem. The input image to the Resize subsystem is of size 224-by-340 (76,160 pixels). The output image of the Resize subsystem is of size 128-by-128 (16,384 pixels). You can use the FPGA data capture feature to count the total number of output pixels from the Resize subsystem and capture the resized image data to find any errors within the logic. Simulink™ does not support renaming the output of a Bus Selector block. To rename the signal, use the model components contained in the green boxes in this image.

Signals Required for Debugging
The YOLOv2PreprocessAlgoDebug model contains these signals.
- Input_Pix_Valid --- Control signal that is a part of the pixelcontrol bus input of the Resize subsystem. This signal validates the pixel data in the Inp_Pixel_Data signal.
- Input_Pix_Cnt --- Output of the HDL Counter block, which counts the number of valid pixels that you pass as input to the Resize subsystem. The model uses the Input_Pix_Valid signal to enable this counter.
- Resized_Pix_Data --- Output signal of the Resize subsystem. This signal contains the pixel data corresponding to the resized image.
- Resized_Pix_Valid --- Control signal that is a part of the pixelcontrol bus output of the Resize subsystem. This signal validates the pixel data in the Resized_Pix_Data signal.
- Resized_Pix_Cnt --- Output of the HDL Counter block, which counts the number of valid pixels returned by the Resize subsystem. The model uses the Resized_Pix_Valid signal to enable this counter.
Timing Diagram
Validate the output pixel data using the Resized_Pix_Valid signal. Whenever this signal goes high, the Resize subsystem sends the valid output data, as this timing diagram shows. The Input_Pix_Cnt and Resized_Pix_Cnt signals indicate the number of valid pixels entering and emerging from the Resize subsystem, respectively.

Trigger Conditions in FPGA Data Capture
To capture the valid resized pixel data, use the capture condition logic in the FPGA Data Capture tool.

Configure these settings in the FPGA Data Capture tool:
- Select Enable the capture control logic in the Capture Condition tab. 
- Use the Resized_Pix_Valid signal in the capture condition logic to ensure that the tool captures the data only when this signal goes high.
- Select Immediately in the capture mode dropdown menu to enable immediate capture. This option is suitable for scenarios in which no specific triggers determine when the tool captures data.
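The effect of a capture condition can be illustrated with a small software model. In this sketch, a synthetic pixel stream is filtered so that only cycles with the valid flag high are stored, which is the behavior the settings above request. The stream contents and the valid pattern are invented for illustration.

```matlab
% Synthetic stream: 20 clock cycles with valid high on alternating pairs.
numCycles = 20;
Resized_Pix_Data  = uint8(1:numCycles);
Resized_Pix_Valid = logical(mod(floor((0:numCycles-1)/2),2));  % 0 0 1 1 ...

% Counter enabled by the valid signal, like the HDL Counter debug hook.
Resized_Pix_Cnt = cumsum(Resized_Pix_Valid);

% Capture condition: store a sample only when the valid signal is high.
keep = Resized_Pix_Valid;
capturedData  = Resized_Pix_Data(keep);
capturedValid = Resized_Pix_Valid(keep);
capturedCnt   = Resized_Pix_Cnt(keep);

fprintf('Stored %d of %d cycles\n',numel(capturedData),numCycles);
fprintf('Valid flag in captured data is always high: %d\n',all(capturedValid));
```

Because invalid cycles are never stored, the captured counter increases by one on every stored sample, and a 16384-sample buffer can hold one complete 128-by-128 frame of valid pixels.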
Visualize Captured Data in Logic Analyzer
This timing diagram shows the resized pixel data and the pixel counts captured by the FPGA Data Capture tool. The tp_Resized_Pix_Valid signal is always high, unlike in the equivalent model simulations using Simulink software. This discrepancy occurs because the capture condition makes the FPGA Data Capture tool capture data only when tp_Resized_Pix_Valid is high.

The FPGA Data Capture tool creates the dataCaptureOut structure in the MATLAB® workspace after it captures data. Visualize the resized image by extracting and concatenating the RGB image data from dataCaptureOut.
RData = reshape(dataCaptureOut.tp_Resized_Pix_Data_0,128,128);
BData = reshape(dataCaptureOut.tp_Resized_Pix_Data_2,128,128);
GData = reshape(dataCaptureOut.tp_Resized_Pix_Data_1,128,128);
resizedImage = cat(3,RData',GData',BData');
imshow(resizedImage)
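The transpose after each reshape matters because reshape fills a matrix column by column, while a streamed image presumably arrives row by row. This small sketch shows the difference on a 2-by-3 image, assuming raster (row-by-row) pixel order. The tiny image and its values are invented for illustration.

```matlab
% A 2-by-3 image streamed in raster order: row 1 first, then row 2.
rows = 2;
cols = 3;
stream = uint8(1:rows*cols);              % pixel k arrives on the kth valid cycle

% reshape fills columns first, so reshaping directly scrambles the rows.
scrambled = reshape(stream,rows,cols);    % [1 3 5; 2 4 6]

% Reshape to cols-by-rows, then transpose to recover the raster layout.
recovered = reshape(stream,cols,rows)';   % [1 2 3; 4 5 6]

disp(scrambled)
disp(recovered)
```

For a square 128-by-128 image, both dimensions are equal, so the reshape arguments look symmetric, but the transpose is still what restores the row order.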

Handshaking Between Preprocessing DUT and DDR Memory
After the Preprocessing DUT resizes and normalizes the input image, it places the preprocessed image data in the DDR memory at the address it receives from the DL IP. The handshaking process comprises these steps:
- The Preprocessing DUT drives the wr_addr, wr_len, and wr_valid control signals in the AXIWriteCtrlOutDDR bus. The DUT also sends the preprocessed signal data through the AXIWriteDataDDR signal.
- The DDR memory samples these control signals and the preprocessed pixel data received from the Preprocessing DUT.
- Once all the data is placed in the DDR memory, the DDR memory acknowledges the Preprocessing DUT with a pulse on the wr_complete signal in the AXIWriteCtrlInDDR bus.

Signals Required for Debugging
The DLHandshakeLogicDebug model contains these signals.
- wr_addr --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal is the address in the DDR memory at which the Preprocessing DUT places the data.
- wr_len --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal is the size of data, in bytes, that the Preprocessing DUT places in the DDR memory starting from the wr_addr address location.
- wr_valid --- Control signal that is a part of the AXIWriteCtrlOutDDR bus. This signal validates the data in the wr_addr and wr_len signals of the same bus.
- wr_complete --- Control signal that is a part of the AXIWriteCtrlInDDR bus. This signal is the acknowledgement sent from the DDR memory to the Preprocessing DUT containing an indication of the status of the data.
- writeDone --- Output of the Write To DDR subsystem. This signal indicates whether the data transfer to the DDR memory is successful and triggers the DL IP to start reading that data from the DDR memory for further processing.
Timing Diagram
After the final rising edge on the wr_valid control signal occurs, the DDR memory sends a pulse on the wr_complete signal as an acknowledgement, and a pulse occurs on the writeDone internal signal. This timing diagram shows the sequence of events for this scenario.

Trigger Conditions in FPGA Data Capture
Configure these settings in the FPGA Data Capture tool:
- Set Number of capture windows to 1 because these handshaking events happen towards the end of the transaction between the Preprocessing DUT and the DDR memory. After these trigger conditions are satisfied, the signal data corresponding to the entire sample depth can be captured in a single window.
- Set Number of trigger stages to 2 because this handshake comprises three events, two of which occur simultaneously.
- Set the Trigger position option close to the end of the handshake to ensure that the Logic Analyzer displays the complete handshake.
- Set Capture mode to On Trigger.
Trigger Stage 1 corresponds to a rising edge on the wr_valid signal that the Preprocessing DUT drives to the DDR memory.
Trigger Stage 2 captures the expected pulse on the wr_complete and writeDone signals. This stage uses logical and comparison operators.
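To illustrate how the trigger position determines which samples end up in the capture window, this sketch locates a two-stage trigger in synthetic waveforms and computes the window boundaries. It assumes that the trigger position equals the number of samples stored before the trigger cycle and that stage 2 combines wr_complete and writeDone with a logical AND. The waveform timing and the triggerPosition value are hypothetical.

```matlab
sampleDepth     = 16384;   % buffer size used in this example
triggerPosition = 16000;   % hypothetical value near the end of the buffer

% Synthetic waveforms: periodic write bursts, then a completion pulse.
N = 50000;
wr_valid    = false(1,N);  wr_valid(100:100:40000) = true;
wr_complete = false(1,N);  wr_complete(40010)      = true;
writeDone   = false(1,N);  writeDone(40010)        = true;

% Rising edge: high now and low on the previous cycle.
rise = @(s) [false, s(2:end) & ~s(1:end-1)];

% Stage 1: first rising edge on wr_valid.
stage1Cycle = find(rise(wr_valid),1,'first');

% Stage 2: first later cycle where both completion signals are high.
bothHigh    = wr_complete & writeDone;
stage2Cycle = stage1Cycle + find(bothHigh(stage1Cycle+1:end),1,'first');

% Capture window: triggerPosition samples before the trigger, rest after.
firstSample = stage2Cycle - triggerPosition;
lastSample  = firstSample + sampleDepth - 1;

lastBurst = find(rise(wr_valid),1,'last');
fprintf('Trigger at cycle %d, window [%d, %d]\n', ...
    stage2Cycle,firstSample,lastSample);
fprintf('Final wr_valid burst inside window: %d\n', ...
    lastBurst >= firstSample && lastBurst <= lastSample);
```

A large trigger position keeps most of the buffer for samples that precede the completion pulse, which is why the final write bursts remain visible in the Logic Analyzer.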

Visualize Captured Data in Logic Analyzer
This timing diagram confirms that the handshaking between Preprocessing DUT and DDR memory happens as expected.

Use FPGA Data Capture and AXI Manager Features Simultaneously
As described in Design Considerations for Data Capture (HDL Verifier), to use AXI manager and FPGA data capture features simultaneously, set the capture mode of FPGA data capture to nonblocking. Create an FPGADataCapture object in non-blocking mode and launch the FPGA Data Capture tool.
cd(fullfile('hdl_prj','ipcore','YOLOV2Pre_cs_ipv4_v1_0','fpga_data_capture'))
fpgadc = FPGADataCapture;
fpgadc.CaptureMode = 'nonblocking';
launchApp(fpgadc);
You must configure a few registers before sending a video frame as an input to the model. Set the DUTProcStart register of the Preprocessing DUT to 1. You can use AXI manager for this task. The YOLOv2DeployAndVerifyDetector function, which is attached to the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example, performs all the steps in the Verify Deployed YOLO v2 Vehicle Detector Using MATLAB section of that example. The YOLOv2DeployAndVerifyDetector function uses the writePort function to configure all the control registers. To use AXI manager instead of writePort to configure the DUTProcStart register, use the helperUpdateYOLOv2DeployAndVerifyDetector function.
The helperUpdateYOLOv2DeployAndVerifyDetector function creates the DebugYOLOv2VehicleDetector function, which is a modified version of the YOLOv2DeployAndVerifyDetector function and contains an AXI manager object. The helper function adds this code to the DebugYOLOv2VehicleDetector function, which you can use to access the AXI manager feature.
Create an AXI manager object.
h = aximanager('AMD');
Use the writememory function to write 1 into the DUTProcStart register. You can find the address for this register in the IP Core Generation report.
writememory(h, '0xA0040100',1);
Release the JTAG cable resource after writing into the DUTProcStart register to ensure that FPGA data capture can use the same JTAG interface to capture the data.
release(h)
To capture the required data corresponding to different scenarios, configure the FPGA Data Capture tool with the appropriate trigger conditions. This diagram shows the data capture process:

- Configure the FPGA Data Capture tool with the trigger conditions and then click the Capture Data button to start the data capture process. The tool captures the data when it observes triggers. 
- Enter the command DebugYOLOv2VehicleDetector(hSOC) to start the workflow comprising all the steps from configuring the registers to reading back the processed data to MATLAB. Because you start the FPGA Data Capture tool before this step, the FPGA Data Capture tool detects all the events.
The AXI manager configures the DUTProcStart control register while the FPGA Data Capture tool waits for the trigger condition to be satisfied. You can simultaneously use both of these tools to capture all the required data.
See Also
Deploy and Verify YOLO v2 Vehicle Detector on FPGA
Topics
- Design Considerations for Data Capture (HDL Verifier)
- Set Up AXI Manager (HDL Verifier)
- Target Deep Learning Processor and Image Preprocessing to FPGA (SoC Blockset)