Processor-In-The-Loop Execution from Command Line
Use processor-in-the-loop (PIL) execution to check the numerical behavior of the CUDA® code that you generate from MATLAB® functions. A PIL simulation, which requires target connectivity, compiles the generated source code, and then downloads and runs the object code on NVIDIA® GPU platforms. The results of the PIL simulation are transferred to MATLAB so that you can verify the numerical equivalence of the simulation and the code generation results.
The PIL verification process is a crucial part of the design cycle to check that the behavior of the generated code matches the design. PIL verification requires an Embedded Coder® license.
Note
When using PIL execution, make sure that the Benchmarking option in GPU Coder™ settings is false. Executing PIL with benchmarking enabled results in compilation errors.
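From the command line, the same setting can be applied on the GPU code configuration object. The following is a minimal sketch, assuming the Benchmarking option corresponds to the Benchmarking property of cfg.GpuConfig; the variable name cfg is only illustrative.

```matlab
% Create a GPU code configuration object for a static library.
cfg = coder.gpuConfig('lib');

% Assumption: the Benchmarking setting maps to this property.
% Keep it disabled before running PIL to avoid compilation errors.
cfg.GpuConfig.Benchmarking = false;
```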
Note
GPU Coder does not support collecting code coverage metrics during software-in-the-loop (SIL) and processor-in-the-loop (PIL) simulations.
Prerequisites
Target Board Requirements
NVIDIA DRIVE® or Jetson™ embedded platform.
Ethernet crossover cable to connect the target board and host PC (if the target board cannot be connected to a local network).
NVIDIA CUDA toolkit installed on the board.
Environment variables on the target for the compilers and libraries. For information on the supported versions of the compilers and libraries and their setup, see Install and Setup Prerequisites for NVIDIA Boards.
Development Host Requirements
GPU Coder for CUDA code generation. For help on getting started with GPU Coder, see Get Started with GPU Coder (GPU Coder).
NVIDIA CUDA toolkit on the host.
Environment variables on the host for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware (GPU Coder). For setting up the environment variables, see Environment Variables (GPU Coder).
Example: The Mandelbrot Set
Description
You do not have to be familiar with the algorithm in the example to complete the tutorial.
The Mandelbrot set is the region in the complex plane consisting of the values z_0 for which the trajectories defined by
z_{k+1} = z_k^2 + z_0, k = 0, 1, ...
remain bounded as k→∞. The overall geometry of the Mandelbrot set is shown in the figure. This view does not have the resolution to show the richly detailed structure of the fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot set exhibits an elaborate boundary that reveals progressively finer recursive detail.
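As a rough illustration of the definition, the sketch below iterates the recurrence z_{k+1} = z_k^2 + z_0 (the same update used by the code in this example) for a few starting points and reports whether each trajectory stays within the escape radius of 2. The sample points, the iteration limit of 50, and the variable names are assumptions chosen for illustration. A finite iteration count can only approximate boundedness.

```matlab
z0 = [0, -1, 1];          % sample starting points (illustrative choices)
maxIterations = 50;       % illustrative iteration limit
isBounded = true(size(z0));

for idx = 1:numel(z0)
    z = z0(idx);
    for k = 1:maxIterations
        z = z*z + z0(idx);        % z_{k+1} = z_k^2 + z_0
        if abs(z) > 2             % once |z| exceeds 2, the trajectory diverges
            isBounded(idx) = false;
            break
        end
    end
    fprintf('z0 = %g: bounded = %d\n', z0(idx), isBounded(idx));
end
```

Here 0 and -1 stay bounded (the trajectory of -1 alternates between 0 and -1), while 1 escapes after two updates (1, 2, 5).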
Algorithm
Create a MATLAB function file called mandelbrot_count.m with the following lines of code. This code is a baseline vectorized MATLAB implementation of the Mandelbrot set.
function count = mandelbrot_count(maxIterations, xGrid, yGrid) %#codegen
% mandelbrot computation

z0 = xGrid + 1i*yGrid;
count = ones(size(z0));

% Add Kernelfun pragma to trigger kernel creation
coder.gpu.kernelfun;

z = z0;
for n = 0:maxIterations
    z = z.*z + z0;
    inside = abs(z)<=2;
    count = count + inside;
end
count = log(count);
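To see what the returned count means, the following sketch repeats the same loop on two real starting points without the coder.gpu.kernelfun pragma, so it runs in plain MATLAB. The two sample points and the small iteration limit are illustrative assumptions.

```matlab
maxIterations = 3;        % small limit so the result is easy to check by hand
z0 = [0, 2];              % 0 never leaves the disk |z| <= 2; 2 leaves at once
count = ones(size(z0));

z = z0;
for n = 0:maxIterations   % the loop body runs maxIterations + 1 times
    z = z.*z + z0;
    count = count + (abs(z) <= 2);
end
count = log(count);

disp(count)
```

The first point is inside on all four passes, so its count is log(1 + 4) = log(5). The second point is never inside after the first update, so its count is log(1) = 0.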
For this tutorial, pick a set of limits that specify a highly zoomed part of the Mandelbrot set in the valley between the main cardioid and the p/q bulb to its left. A 1000x1000 grid of real parts (x) and imaginary parts (y) is created between these two limits. The Mandelbrot algorithm is then iterated at each grid location. A maximum of 500 iterations is enough to render the image in full resolution. Create a MATLAB script called mandelbrot_test.m with the following lines of code. The script calls the mandelbrot_count function and plots the resulting Mandelbrot set.
maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161, -0.748766707771757];
ylim = [ 0.123640844894862,  0.123640851045266];

x = linspace( xlim(1), xlim(2), gridSize );
y = linspace( ylim(1), ylim(2), gridSize );
[xGrid,yGrid] = meshgrid( x, y );

count = mandelbrot_count(maxIterations, xGrid, yGrid);

figure(1)
imagesc( x, y, count );
colormap([jet();flipud( jet() );0 0 0]);
axis off
title('Mandelbrot set');
Create a Live Hardware Connection Object
To communicate with the NVIDIA hardware, you must create a live hardware connection object by using the jetson or drive function.
To create a live hardware connection object, provide the host name or IP address, user name, and password of the target board. For example, to create a live hardware connection object for Jetson hardware:
hwobj = jetson('jetson-board-name','ubuntu','ubuntu');
The software performs a check of the hardware, compiler tools and libraries, IO server installation, and gathers information on the peripherals connected to the target. This information is displayed in the MATLAB Command Window.
Checking for CUDA availability on the Target...
Checking for 'nvcc' in the target system path...
Checking for cuDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for prerequisite libraries is complete.
Gathering hardware details...
Checking for third-party library availability on the Target...
Gathering hardware details is complete.
 Board name         : NVIDIA Jetson TX2
 CUDA Version       : 10.0
 cuDNN Version      : 7.6
 TensorRT Version   : 6.0
 GStreamer Version  : 1.14.5
 V4L2 Version       : 1.14.2-1
 SDL Version        : 1.2
 OpenCV Version     : 4.1.1
 Available Webcams  : Microsoft® LifeCam Cinema(TM)
 Available GPUs     : NVIDIA Tegra X2
Alternatively, to create a live hardware connection object for DRIVE hardware:
hwobj = drive('drive-board-name','nvidia','nvidia');
Note
If there is a connection failure, a diagnostic error message is reported in the MATLAB Command Window. The most likely cause of a failed connection is an incorrect IP address or host name.
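In a script, it can be convenient to catch a failed connection and stop before code generation. The following is a minimal sketch, assuming that the jetson function raises a MATLAB error when the connection fails; the board name and credentials are the placeholder values used above.

```matlab
try
    % Placeholder host name and credentials; replace with your own.
    hwobj = jetson('jetson-board-name','ubuntu','ubuntu');
catch connErr
    % Assumption: a failed connection surfaces as a catchable error.
    fprintf('Could not connect to the board: %s\n', connErr.message);
    fprintf('Check the host name or IP address, user name, and password.\n');
    rethrow(connErr);
end
```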
Configure the PIL Execution
Create a GPU code configuration object for generating a library and configure the object for PIL. Use the coder.hardware function to create a configuration object for the NVIDIA DRIVE or Jetson platform and assign it to the Hardware property of the code configuration object cfg. Use 'NVIDIA Jetson' for the Jetson boards and 'NVIDIA Drive' for the DRIVE boards.
cfg = coder.gpuConfig('lib','ecoder',true);
cfg.GpuConfig.CompilerFlags = '--fmad=false';
cfg.VerificationMode = 'PIL';
cfg.GenerateReport = true;
cfg.Hardware = coder.hardware('NVIDIA Jetson');
The --fmad=false flag, when passed to nvcc, instructs the compiler to disable the floating-point multiply-add (FMAD) optimization. This option is set to prevent numerical mismatch in the generated code because of architectural differences between the CPU and the GPU. For more information, see Numerical Differences Between CPU and GPU (GPU Coder).
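The effect of a fused multiply-add can be illustrated on the host without a GPU. The sketch below is an emulation, not the GPU behavior itself: it compares a single-precision multiply followed by an add (two roundings) with the same expression evaluated in double precision and rounded once, which is what a fused operation effectively does for these operands. The operand values are illustrative choices picked so that the two results differ.

```matlab
a = single(1 + 2^-12);
b = a;
c = -single(1 + 2^-11);

% Unfused: the product is rounded to single before the addition.
p = a*b;                  % exact value 1 + 2^-11 + 2^-24 rounds to 1 + 2^-11
unfused = p + c;          % 0

% Emulated fused operation: exact product and sum, one final rounding.
fused = single(double(a)*double(b) + double(c));   % 2^-24

fprintf('unfused: %g\nfused:   %g\n', unfused, fused);
```

A mismatch of this kind between MATLAB results and GPU results is what disabling FMAD is meant to avoid during verification.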
Generate Code and Run PIL Execution
To generate the CUDA library and the PIL interface, use the codegen command and pass the GPU code configuration object along with the size of the inputs for the mandelbrot_count entry-point function. The -test option runs the MATLAB test file, mandelbrot_test. The test file uses mandelbrot_count_pil, the generated PIL interface for mandelbrot_count.
codegen -config cfg -args {0,zeros(1000),zeros(1000)} mandelbrot_count -test mandelbrot_test
### Connectivity configuration for function 'mandelbrot_count': 'NVIDIA Jetson'
Code generation successful: View report

Running test file: 'mandelbrot_test' with MEX function 'mandelbrot_count_pil.mexa64'.
### Starting application: 'codegen/lib/mandelbrot_count/pil/mandelbrot_count.elf'
    To terminate execution: clear mandelbrot_count_pil
### Launching application mandelbrot_count.elf...
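The input sizes passed to -args must match what the test file passes to the entry-point function. One way to keep them in sync is to build the input types from the grid size. This is a sketch, assuming cfg is the configuration object created earlier; the variable names gridSize and gridType are illustrative.

```matlab
gridSize = 1000;                                   % same value as in mandelbrot_test
gridType = coder.typeof(0, [gridSize gridSize]);   % fixed-size double matrix type

% Scalar double for maxIterations, then the two grid inputs.
codegen -config cfg -args {0, gridType, gridType} mandelbrot_count -test mandelbrot_test
```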
The software creates the following output folders:
- codegen\lib\mandelbrot_count: Standalone code for mandelbrot_count.
- codegen\lib\mandelbrot_count\pil: PIL interface code for mandelbrot_count.
Verify that the output of this run matches the output from the original mandelbrot_count.m function.
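One way to make this check explicit is to call both the original function and the generated PIL interface on the same inputs and compare the results numerically. The following is a sketch, assuming code generation has completed, mandelbrot_count_pil is on the MATLAB path, and the target is reachable. The tolerance is a hypothetical value, not one prescribed by the product.

```matlab
% Rebuild the inputs used by mandelbrot_test.
maxIterations = 500;
gridSize = 1000;
xLimits = [-0.748766713922161, -0.748766707771757];
yLimits = [ 0.123640844894862,  0.123640851045266];
x = linspace(xLimits(1), xLimits(2), gridSize);
y = linspace(yLimits(1), yLimits(2), gridSize);
[xGrid, yGrid] = meshgrid(x, y);

% Reference result in MATLAB and result from the code running on the target.
ref = mandelbrot_count(maxIterations, xGrid, yGrid);
out = mandelbrot_count_pil(maxIterations, xGrid, yGrid);

% Compare element by element.
maxAbsDiff = max(abs(ref(:) - out(:)));
tol = 1e-10;   % hypothetical tolerance; adjust for your application
if maxAbsDiff <= tol
    fprintf('PIL output matches MATLAB (max abs diff = %g).\n', maxAbsDiff);
else
    fprintf('PIL output differs from MATLAB (max abs diff = %g).\n', maxAbsDiff);
end
```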
Note
On a Microsoft® Windows® system, the Windows Firewall can potentially block a PIL execution. Change the Windows Firewall settings to allow access.
Terminate the PIL Execution Process
To terminate the PIL execution process, clear the generated PIL interface:
clear mandelbrot_count_pil;
See Also
Functions
jetson | drive | webcam | getPILPort | getPILTimeout | setPILPort | setPILTimeout
Objects
Related Examples
- Sobel Edge Detection on NVIDIA Jetson Nano Using Raspberry Pi Camera Module V2
- Processor-in-the-Loop Execution on NVIDIA Targets Using GPU Coder