Integrate YOLO v2 Vehicle Detector System on SoC
This example shows how to simulate a you-only-look-once (YOLO) v2 vehicle detector and verify the functionality of the end-to-end application using MATLAB.
The end-to-end application includes preprocessing of the images, a YOLO v2 vehicle detection network, and postprocessing of the images to overlay results.
Load Camera Data and Network File
This example uses the PandasetCameraData.mp4 file, which contains a subset of the video from the PandaSet data set. Download the video file and the network .mat files.
supportFileDir = matlab.internal.examples.utils.getSupportFileDir();
pathToDataset = fullfile(supportFileDir, 'visionhdl', 'PandasetCameraData');
if(~isfile(fullfile(pathToDataset, 'PandasetCameraData.mp4')) ...
        || ~isfile(fullfile(pathToDataset, 'yolov2VehicleDetector32Layer.mat')) ...
        || ~isfile(fullfile(pathToDataset, 'yolov2VehicleDetector60Layer.mat')))
    PandasetZipFile = matlab.internal.examples.downloadSupportFile('visionhdl','PandasetCameraData.zip');
    [outputFolder,~,~] = fileparts(PandasetZipFile);
    unzip(PandasetZipFile,outputFolder);
end
addpath(pathToDataset);
A YOLO v2 vehicle detection application has three main modules. The preprocessing module accepts the input frame and performs image resizing and normalization. The preprocessed data is then consumed by the YOLO v2 vehicle detection network, which is a feature extraction network followed by a detection network. The network output is postprocessed to identify the strongest bounding boxes, and the resulting bounding boxes are overlaid on the input image.
The preprocessing subsystem and the DLIP are deployed on the FPGA (programmable logic, PL), and the postprocessing is deployed on the ARM processor (processing system, PS). For deploying the vehicle detector, see YOLO v2 Vehicle Detector with Live Camera Input on Zynq-Based Hardware. This example shows how to model the preprocessing module (resize and normalization) and the postprocessing module, along with the DL handshaking logic and network execution.
Explore Vehicle Detector
The vehicle detector contains these modules:
Source - Selects the input image from the PandaSet data set.
Conversion - Converts the input frame into an RGB pixel stream.
Pixel-stream-based preprocessing (to FPGA) - Preprocesses the input frame and writes it into DDR.
Deep learning IP core simulation logic - Models the DL processor, which calculates activations on the input frame and writes the output to DDR.
Conversion - Converts the RGB pixel stream back into a frame for overlaying bounding boxes.
Postprocessing and overlay (to ARM) - Applies postprocessing to the network output and overlays the bounding boxes on the input frame.
Display - Displays the input frame with detections.
The model processes numFrames images from the PandaSet data set. Each frame is first resized and normalized in the YOLOv2PreprocessDUT, and the preprocessed output is written into DDR at the address location read from the DL input handshaking registers (InputValid, InputAddr, InputSize). The DLIP calculates activations on the preprocessed image, writes the activations to DDR, and updates the DL output handshaking registers (OutputValid, OutputAddr, OutputSize). This handshaking triggers the YOLOv2PostprocessDUT, which reads the DL output using the address information obtained from the DL registers, performs postprocessing, and calculates the bounding boxes that are displayed in the Video Viewer block.
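The register handshake described above can be sketched in code. This is an illustrative Python stand-in, not the Simulink model: the register names follow the text, but the "DLIP" here just doubles each value as placeholder compute.

```python
# Illustrative sketch of the DL input/output register handshake described
# above. Register names follow the text; the DLIP behavior is a stand-in.

class DLRegisters:
    """Models the handshaking registers shared between the FPGA logic and the DLIP."""
    def __init__(self):
        self.InputValid = 0
        self.InputAddr = 0x0
        self.InputSize = 0
        self.OutputValid = 0
        self.OutputAddr = 0x0
        self.OutputSize = 0

def preprocess_writes_frame(regs, ddr, frame, addr=0x1000):
    # Preprocessing writes the frame to DDR at InputAddr, then asserts InputValid.
    ddr[addr] = frame
    regs.InputAddr = addr
    regs.InputSize = len(frame)
    regs.InputValid = 1

def dlip_runs(regs, ddr, out_addr=0x8000):
    # The DLIP consumes the input, writes activations to DDR,
    # and updates the output handshaking registers.
    if not regs.InputValid:
        return
    activations = [2 * x for x in ddr[regs.InputAddr]]  # placeholder compute
    ddr[out_addr] = activations
    regs.InputValid = 0          # input consumed
    regs.OutputAddr = out_addr
    regs.OutputSize = len(activations)
    regs.OutputValid = 1         # this assertion triggers postprocessing

regs, ddr = DLRegisters(), {}
preprocess_writes_frame(regs, ddr, frame=[1, 2, 3])
dlip_runs(regs, ddr)
print(regs.OutputValid, ddr[regs.OutputAddr])  # -> 1 [2, 4, 6]
```

In the actual model these registers live in the DLIP's AXI register space and the transfers go over AXI4; the sketch only shows the ordering of the handshake.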
The selectImage subsystem selects the input frame from the inputImages block. A Frame To Pixels block converts the input image from selectImage into a pixel stream and a pixelcontrol bus. The Unpack subsystem divides the pixel stream into R, G, and B components. The RGB data (RIn, GIn, BIn), along with the ctrl bus, is fed to preprocessing. The input image is also streamed out as (ROut, GOut, BOut) to be written into the PS DDR for overlaying the bounding boxes.
The YOLOv2PreprocessDUT contains subsystems for frame dropping, selecting a region of interest (ROI) from the input frame, preprocessing (resize and normalization), and handshaking logic.
The Frame Drop subsystem synchronizes data with the DLIP by dropping input frames when the DLIP is not available for processing. It contains finite state machine (FSM) logic for reading the DLIP registers and a pixel bus creator to concatenate the output control signals of the frame drop logic into a pixelcontrol bus. The readInputRegisters subsystem reads the inputAddrReg register, forwards the first frame to preprocessing, and resets the control signals for the remaining frames until inputAddr is updated by the DLIP. This frame drop logic lets the DLIP process one frame at a time.
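The frame-drop behavior can be sketched as a small filter. This is a hypothetical Python stand-in for the Simulink FSM: a frame is forwarded only when the DLIP has updated inputAddr since the last forwarded frame, and all other frames are dropped.

```python
# Sketch of the frame-drop logic described above: forward a frame only when
# the DLIP has signaled readiness by updating inputAddr; drop it otherwise.
# Hypothetical stand-in for the Simulink FSM, not the real implementation.

def frame_drop_filter(frames, input_addr_updates):
    """frames: incoming frames, in order.
    input_addr_updates[i]: True if the DLIP updated inputAddr before
    frame i arrived (that is, the DLIP is ready for a new frame)."""
    forwarded, dropped = [], []
    dlip_ready = True  # the first frame is always forwarded
    for frame, updated in zip(frames, input_addr_updates):
        if updated:
            dlip_ready = True
        if dlip_ready:
            forwarded.append(frame)
            dlip_ready = False  # wait for the next inputAddr update
        else:
            dropped.append(frame)
    return forwarded, dropped

fwd, drp = frame_drop_filter(["f0", "f1", "f2", "f3"],
                             [True, False, True, False])
print(fwd, drp)  # -> ['f0', 'f2'] ['f1', 'f3']
```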
The output of the Frame Drop subsystem is sent to the ROI Selector block, which selects the ROI from the input image and forwards it for preprocessing. The ROI is selected from the 1920x1080 PandaSet input image and is scaled down by a factor of 4 for faster simulation. The ROI is configured as:

hPos = 350;
vPos = 400;
hSize = 1000;
vSize = 600;
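The crop that these parameters describe can be sketched as array indexing. This is an illustrative Python/NumPy equivalent of the ROI Selector block (ignoring MATLAB's one-based indexing), not the streaming hardware implementation.

```python
import numpy as np

# Sketch of the ROI selection: crop a vSize-by-hSize window starting at
# column hPos, row vPos out of a 1920x1080 frame, using the values above.
hPos, vPos, hSize, vSize = 350, 400, 1000, 600

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # rows x columns x RGB
roi = frame[vPos:vPos + vSize, hPos:hPos + hSize, :]
print(roi.shape)  # -> (600, 1000, 3)
```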
The YOLO v2 Preprocess Algorithm subsystem contains subsystems that perform the resizing and normalization operations. The pixel stream from the Frame Drop subsystem is passed to the Resize subsystem, which resizes the input image to the input size expected by the deep learning network, (128, 128, 3). The resized output is passed to the Normalization subsystem, which rescales the pixel values to the [0, 1] range. The preprocessed frame is then passed to the DL Handshake Logic Ext Mem subsystem to be written into the PL DDR.
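The frame-based math behind these two steps can be sketched as follows. This assumes nearest-neighbor resampling and uint8 inputs for illustration; the hardware implements resizing with streaming pixel logic, so this is only the equivalent whole-frame computation.

```python
import numpy as np

# Sketch of the preprocessing math: nearest-neighbor resize of the ROI to the
# 128x128x3 network input size, then normalization of pixel values to [0, 1].

def preprocess(roi, out_size=(128, 128)):
    in_h, in_w, _ = roi.shape
    out_h, out_w = out_size
    rows = (np.arange(out_h) * in_h) // out_h   # nearest-neighbor row map
    cols = (np.arange(out_w) * in_w) // out_w   # nearest-neighbor column map
    resized = roi[rows][:, cols]
    return resized.astype(np.float64) / 255.0   # rescale uint8 into [0, 1]

roi = np.full((600, 1000, 3), 255, dtype=np.uint8)
pre = preprocess(roi)
print(pre.shape, pre.max())  # -> (128, 128, 3) 1.0
```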
The DL Handshake Logic Ext Mem subsystem contains FSM logic for handshaking with the DLIP and a subsystem that writes the frame to DDR. The Read DL Registers subsystem contains the FSM logic that reads the handshaking signals (InputValid, InputAddr, and InputSize) from the DLIP for multiple frames. The Write to DDR subsystem uses these handshaking signals to write the preprocessed frame to memory using the AXI4 Master protocol. For more information on the YOLOv2PreprocessDUT, see the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example.
The DLIP contains subsystems for the prediction logic, the DL input and output register handshaking logic, and an AXI write controller that writes the DL output to DDR. The FetchPreprocessedImage subsystem reads and rearranges the output from the YOLOv2PreprocessDUT to the networkInputSize required by the deep learning network. The network and the activation layer of the DLIP are set up from the pretrained network .mat file.
This example uses a pretrained YOLO v2 network that was trained on the PandaSet data set. The network output is rearranged into the external memory data format of the DL processor by concatenating the elements along the third dimension. For more information, see External Memory Data Format (Deep Learning HDL Toolbox).
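The rearrangement can be sketched as serializing the H-by-W-by-C activation volume so that the channel index varies fastest. Padding the 12 channels up to 16 is an assumption read off the paddedOutputSize = H*W*C*4/3 formula that appears later in this example, not a documented property of the format.

```python
import numpy as np

# Sketch of packing an H-by-W-by-C activation volume for external memory:
# elements are concatenated along the third (channel) dimension, so the
# channel index varies fastest in memory. The pad-to-16-channels step is an
# assumption inferred from the paddedOutputSize = H*W*C*4/3 formula.

H, W, C = 16, 16, 12
acts = np.arange(H * W * C, dtype=np.float32).reshape(H, W, C)

padded = np.zeros((H, W, 16), dtype=np.float32)
padded[:, :, :C] = acts          # channels 12..15 remain zero padding
packed = padded.reshape(-1)      # channel-fastest serialization

print(packed.size)  # -> 4096, which matches H*W*C*4/3
```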
The DL output is written to memory using the AXIM Write Controller subsystem. The write operations from the DLIP are multiplexed using the DDR Write Arbitrator subsystem.
The YOLOv2PostprocessDUT subsystem contains subsystems for DL handshaking, reading the DL output, and transforming and postprocessing the DL output. The DL handshaking subsystems have variant behavior depending on whether the model is configured for simulation or deployment, based on the simulationFlag. Because this example demonstrates the simulation workflow, the simulationFlag is set to true.
The Set Control Registers subsystem sets the control registers for the DLIP. The DL Handshaking subsystem reads the DL output handshaking registers (OutputValid, OutputAddr, OutputSize), which indicate the validity, address, and size of the output. The model abstracts these registers as data store blocks for simulation. The readDLOutput subsystem uses these handshaking signals to read the DL output from the PL DDR.
The readDLOutput subsystem contains subsystems for polling OutputValid, generating read requests, and reading the DL output from the PL DDR. The pollOutputValid function polls the OutputValid signal from the DLIP and triggers postprocessing when OutputValid is asserted. The read DL Output from PL DDR subsystem contains a signal, rdDone, which indicates that the DL output read operation has completed successfully. The TriggerDLOutputNext subsystem pulses the OutputNext signal when rdDone is asserted to indicate to the DLIP that the output of the current frame has been read.
The DL output data is then sent to the yolov2TransformlayerandPostprocess function for postprocessing. This function transforms the DL output read from DDR by rearranging and normalizing the data, and thresholds the bounding boxes at a confidence score of 0.4. It returns the bounding boxes and pulses the postProcDone signal to indicate that postprocessing has completed successfully.
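The thresholding step can be sketched as follows. This is an illustrative Python stand-in, not the actual function: it assumes each grid cell holds two anchors with six values (tx, ty, tw, th, conf, class) each, which is one way to account for the 16x16x12 output, and only decodes the box center; the real anchor count and layout come from detector.AnchorBoxes.

```python
import math

# Sketch of the YOLO v2 postprocessing step: apply a sigmoid to the
# confidence channel and keep only predictions whose score clears the 0.4
# threshold. The per-anchor layout (tx, ty, tw, th, conf, class) is an
# assumption for illustration.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def threshold_detections(grid, conf_threshold=0.4):
    """grid[row][col] -> list of per-anchor 6-element predictions."""
    kept = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            for anchor_id, (tx, ty, tw, th, conf, cls) in enumerate(cell):
                score = sigmoid(conf)
                if score >= conf_threshold:
                    # box center in grid-cell units, per YOLO v2 decoding
                    bx, by = c + sigmoid(tx), r + sigmoid(ty)
                    kept.append((bx, by, score, anchor_id))
    return kept

# one 2x2 grid with two anchors per cell; only one anchor is confident
grid = [[[(0, 0, 0, 0, -5.0, 0), (0, 0, 0, 0, -5.0, 0)],
         [(0, 0, 0, 0,  2.0, 0), (0, 0, 0, 0, -5.0, 0)]],
        [[(0, 0, 0, 0, -5.0, 0), (0, 0, 0, 0, -5.0, 0)],
         [(0, 0, 0, 0, -5.0, 0), (0, 0, 0, 0, -5.0, 0)]]]
print(len(threshold_detections(grid)))  # -> 1
```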
The YOLOv2PostprocessDUT is configured with these DL network parameters: networkInputSize, networkOutputSize, anchorBoxes, inputImageROI, inputROISize, and confidenceThreshold.

vehicleDetector = load(networkmatfile);
detector = vehicleDetector.detector;
net = detector.Network;
anchorBoxes = detector.AnchorBoxes;
networkInputSize = net.Layers(1, 1).InputSize;
networkOutputSize = [16,16,12];
paddedOutputSize = (networkOutputSize(1)*networkOutputSize(2)*networkOutputSize(3)*4)/3;
inputImageROI = [hPos, vPos, hSize, vSize];
inputROISize = [vSize, hSize, numComponents];
confidenceThreshold = 0.4;
Simulate Vehicle Detector
Configure the network for the vehicle detector by setting the networkConfig variable. The script supports two networks: a 32-layer network (the default) and a 60-layer network. To run the 60-layer network, set networkConfig to '60layer'.
The model takes a couple of minutes to update the diagram when you compile it for the first time. Update the model before running the simulation.
set_param("YOLOv2VehicleDetectorOnSoC", SimulationCommand="update");
out = sim("YOLOv2VehicleDetectorOnSoC");
### Starting serial model reference simulation build.
### Model reference simulation target for DLHandshakeLogicExtMem is up to date.
### Model reference simulation target for YOLOv2PreprocessAlgorithm is up to date.

Build Summary

0 of 2 models built (2 models already up to date)
Build duration: 0h 0m 32.939s
Verify YOLOv2PreprocessDUT and YOLOv2PostprocessDUT using MATLAB
The example includes subsystems for verifying the outputs of the YOLOv2PreprocessDUT and YOLOv2PostprocessDUT. The Verify Preprocess Output and Verify Postprocess Output subsystems log the signals required for the verification of the preprocessed image and the bounding boxes, respectively.
Close the figures.
The helperVerifyVehicleDetector script verifies all the outputs logged during simulation. It compares the preprocessed image obtained in simulation with a reference image obtained by applying the resize and normalize operations in MATLAB, and it overlays both the bounding boxes obtained from simulation and those from the detect (Computer Vision Toolbox) function on the input images from the data set.
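The comparison idea behind this verification can be sketched as follows. The array names here are placeholders for the logged data, and the tolerance is an illustrative choice, not the value the helper script uses.

```python
import numpy as np

# Sketch of the verification idea: compare the preprocessed image logged
# from simulation against a reference computed directly, and report the
# maximum absolute pixel difference.

def verify_preprocess(sim_output, reference, tol=1e-3):
    max_err = float(np.max(np.abs(sim_output - reference)))
    return max_err <= tol, max_err

reference = np.linspace(0.0, 1.0, 128 * 128 * 3).reshape(128, 128, 3)
sim_output = reference + 1e-5   # pretend the simulation matches closely
ok, err = verify_preprocess(sim_output, reference)
print(ok)  # -> True
```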
This example demonstrated a YOLO v2 vehicle detector application that comprises the preprocessing steps (image resize and normalization) and handshaking logic on the FPGA and vehicle detection using the DLIP, followed by postprocessing on the ARM processor, and verified the results using MATLAB.
Copyright 2022-2023 The MathWorks, Inc.
- Deep Learning Processing of Live Video (Vision HDL Toolbox Support Package for Xilinx Zynq-Based Hardware)