Speed and Area Optimizations in HDL Coder
Use area and speed optimizations in HDL Coder™ to save resources and improve the timing of your design on the target FPGA device. The optimizations do not change the functional behavior of your algorithm, but they can optimize certain resources in your design, introduce latency, or cause differences in sample rates.
You can initially generate HDL code and synthesize your design on your FPGA platform without enabling optimizations. If the design does not meet the timing requirements, you can enable the optimizations and rerun the workflow until your design meets the area and speed requirements. See Basic HDL Code Generation Workflow.
Optimizations in MATLAB HDL Code Generation
To enable optimizations on your MATLAB® code, open the Workflow Advisor from MATLAB. In the Advisor, on the HDL Code Generation task, enable the settings in the Optimization tab.
Optimizations in Simulink HDL Code Generation
You can enable optimizations at the model level and at the block level. Specify model-level optimizations:
In the Configuration Parameters dialog box, on the HDL Code Generation > Optimization pane. See HDL Code Generation Pane: Optimization.
In the Simulink® HDL Workflow Advisor, on the Set HDL Options task, the HDL Code Generation Settings button opens the HDL code generation settings in the Configuration Parameters dialog box. You can then navigate to the HDL Code Generation > Optimization pane.
Subsystems in your model inherit the model-level optimization settings. You can change the subsystem-level settings in the HDL Block Properties dialog box for the subsystems or by using the hdlset_param function. You can also specify additional settings for certain blocks in your model, such as adding pipelines at the input and output. This table lists the optimizations that are available at the block level and model level.
| Optimization | Model Level? | Subsystem Level? | Comments |
|---|---|---|---|
| Clock rate pipelining | Yes | Yes | – |
| Distributed pipelining | Yes | Yes | At the model level, you use hierarchical distributed pipelining. To apply the optimization across subsystem hierarchies, enable distributed pipelining at each subsystem level. |
| Resource sharing | Yes | Yes | At the model level, you specify the type of resources you want to share such as adders and multipliers. At the block level, you specify the SharingFactor. |
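As a minimal sketch of the model-level and subsystem-level split for resource sharing, the commands below use hdlset_param at both levels. The sketch assumes that HDL Coder is installed and that the example model sfir_fixed, with its subsystem symmetric_fir, is on the MATLAB path. Substitute your own model and subsystem names. The sharing factor of 4 is an arbitrary illustrative value.

```matlab
% Sketch only: assumes HDL Coder and the example model sfir_fixed.
model = 'sfir_fixed';
dut   = [model '/symmetric_fir'];
load_system(model);

% Model level: choose which resource types are candidates for sharing.
hdlset_param(model, 'ShareAdders', 'on');
hdlset_param(model, 'ShareMultipliers', 'on');

% Subsystem level: set the sharing factor for this subsystem
% (4 is an arbitrary value chosen for illustration).
hdlset_param(dut, 'SharingFactor', 4);

% Read one setting back, then list the non-default HDL settings.
sf = hdlget_param(dut, 'SharingFactor');
disp(sf);
hdldispmdlparams(model);
hdldispblkparams(dut);
```

Settings applied with hdlset_param are stored with the model, so they are equivalent to making the same choices in the Configuration Parameters and HDL Block Properties dialog boxes.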
To see the effect of the optimizations:
You can generate an optimization report with the HDL code. To learn how to enable this report, see Create and Use Code Generation Reports.
Open the generated model or generate the validation model. The generated model is a behavioral model of the HDL code that shows the effect of block implementations and optimizations that you enabled. To verify the numerics of the generated model with the original model, you can generate a validation model. See Generated Model and Validation Model.
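The report and validation model described above can also be requested from the command line. This is a sketch under the same assumptions as a typical HDL Coder setup: the example model sfir_fixed and its subsystem symmetric_fir stand in for your own design.

```matlab
% Sketch only: assumes HDL Coder and the example model sfir_fixed.
model = 'sfir_fixed';
dut   = [model '/symmetric_fir'];
load_system(model);

% Request the optimization report together with the generated HDL code.
hdlset_param(model, 'OptimizationReport', 'on');

% Request a validation model that compares the generated model
% against the original model.
hdlset_param(model, 'GenerateValidationModel', 'on');

% Generate HDL code for the subsystem with the settings above.
makehdl(dut);
```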
To effectively use optimizations, change the sample time setting for
Constant blocks from
Your model can have design delays and pipeline delays. Design delays are delays that you manually add to your model. Pipeline delays are delays that are introduced by pipelining settings specified on the blocks, by block implementations such as the Newton-Raphson method or native floating-point operators, or by speed optimizations. You see these delays in the generated HDL code, generated model, and validation model.
General optimization parameters include:
RAM mapping: Use RAM mapping parameters to map large delays, persistent variables in MATLAB code, and pipeline delays to RAM based on a threshold bit width. See also RAM Mapping for MATLAB Code and RAM Mapping Parameters.
Delay balancing: Enabled by default, this optimization balances pipeline delays by inserting matching delays in parallel paths. The optimization matches numerics of the generated model with the original model. You see the effect of this optimization in the Delay Balancing section of the optimization report. See Delay Balancing.
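For a Simulink model, the general optimization parameters can be set as in the following sketch. It assumes HDL Coder and the example model sfir_fixed. The threshold of 256 bits is an illustrative value, not a recommendation.

```matlab
% Sketch only: assumes HDL Coder and the example model sfir_fixed.
model = 'sfir_fixed';
load_system(model);

% Delay balancing is enabled by default; set explicitly here for clarity.
hdlset_param(model, 'BalanceDelays', 'on');

% Map delays at or above this size (in bits) to RAM.
% 256 is an illustrative value.
hdlset_param(model, 'RAMMappingThreshold', 256);

% Also consider pipeline delays for RAM mapping.
hdlset_param(model, 'MapPipelineDelaysToRAM', 'on');
```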
Speed optimizations improve the timing of your design on the target FPGA by optimizing the critical path. To identify the critical path, you can run the Generic ASIC/FPGA workflow for your FPGA device and then annotate the critical path or use the timing reports.
To identify the critical path more quickly and speed up the iterative process of finding and optimizing the critical path, use critical path estimation. You do not have to run synthesis or generate HDL code. Critical path estimation uses static timing analysis with timing data from target-specific timing databases. You see the effect of this optimization in the Critical Path Estimation section of the optimization report. See Critical Path Estimation Without Running Synthesis.
Speed optimizations include:
Clock rate pipelining: A Simulink optimization that is enabled by default and runs pipeline registers at a faster clock rate when you specify an Oversampling factor greater than one. Use clock-rate pipelining with hierarchy flattening to remove hierarchical boundaries in a subsystem, thereby improving retiming. See Clock-Rate Pipelining.
Distributed pipelining: An optimization that retimes registers. These registers can be existing delays in your design or registers that you specify by using the InputPipeline and OutputPipeline block settings. To preserve existing delays, enable the Preserve design delays setting. To distribute pipelines more effectively and increase the clock speed for your target device, use synthesis timing estimates for distributed pipelining, which more accurately reflect how components function on hardware. See Use synthesis estimates for distributed pipelining. To retime registers across hierarchies, enable hierarchical distributed pipelining on the model and distributed pipelining on the subsystems. You see the effect of this optimization in the Distributed Pipelining section of the optimization report. See Distributed Pipelining and Hierarchical Distributed Pipelining.
Adaptive pipelining: A Simulink optimization that inserts pipeline registers at the input ports, the output ports, or both for certain blocks to create patterns that efficiently map blocks to DSP units on the target FPGA device. The optimization considers the target device, target frequency, multiplier word lengths, and the HDL Block Property settings. You see the effect of this optimization in the Adaptive Pipelining section of the optimization report. See Adaptive Pipelining.
Loop unrolling: A MATLAB optimization that unrolls a loop by instantiating multiple instances of the loop body in the generated code. You can also partially unroll a loop. See Optimize MATLAB Loops.
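The Simulink speed optimizations in this list can be combined from the command line, as in the following sketch. It assumes HDL Coder and the example model sfir_fixed with its subsystem symmetric_fir. The oversampling factor and pipeline depths are arbitrary illustrative values.

```matlab
% Sketch only: assumes HDL Coder and the example model sfir_fixed.
model = 'sfir_fixed';
dut   = [model '/symmetric_fir'];
load_system(model);

% Clock-rate pipelining takes effect with an oversampling factor > 1.
hdlset_param(model, 'Oversampling', 4);          % illustrative value
hdlset_param(model, 'ClockRatePipelining', 'on');

% Add pipeline registers at the subsystem ports (illustrative depths)
% and let distributed pipelining retime them.
hdlset_param(dut, 'InputPipeline', 1);
hdlset_param(dut, 'OutputPipeline', 2);
hdlset_param(dut, 'DistributedPipelining', 'on');

% Retime across subsystem hierarchies, but keep design delays in place.
hdlset_param(model, 'HierarchicalDistPipelining', 'on');
hdlset_param(model, 'PreserveDesignDelays', 'on');

% Let the code generator insert registers that help DSP mapping.
hdlset_param(model, 'AdaptivePipelining', 'on');
```

Because these optimizations can add latency, it is worth checking the optimization report after each change rather than enabling every setting at once.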
Area optimizations reduce resource usage of your design. Optimizing your design for area can reduce the speed at which your design runs on the FPGA.
Area optimizations include:
Resource sharing: An optimization that identifies multiple functionally equivalent resources and replaces them with a single resource. At the model level, you specify resources you want to share such as adders and multipliers. At the subsystem level, you specify a SharingFactor depending on the number of shareable resources in your design. By using the optimization with clock-rate pipelining, you can specify how to overclock the shared resources. See Resource Sharing.
Streaming: A Simulink optimization that splits a vector data path into multiple smaller vector data paths based on the StreamingFactor that you specify on the subsystems, thereby reducing hardware resource consumption. See Streaming.
Loop streaming: A MATLAB optimization that streams a loop by instantiating the loop body once and using that instance for each loop iteration. The code generator oversamples the loop body instance to keep the generated loop functionally equivalent to the original loop. See Optimize MATLAB Loops.
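In MATLAB code, one way to request a loop optimization for a specific loop is the coder.hdl.loopspec pragma, placed immediately before the for loop. The sketch below contrasts an unrolled loop (a speed optimization) with a streamed loop (an area optimization). The function names and the 1-by-8 uint8 input are hypothetical, chosen only for illustration, and the script assumes that HDL Coder is installed so that the pragma is available. When run in MATLAB, the pragma is expected to have no effect on the numeric results; it only guides HDL code generation.

```matlab
% Sketch only: function names and data sizes are hypothetical.
% Both functions compute the same result; they differ only in the
% loop optimization requested for HDL code generation.
u  = uint8(1:8);
yU = add_one_unrolled(u);
yS = add_one_streamed(u);
disp(isequal(yU, yS));

function y = add_one_unrolled(u)
    % Request one hardware instance of the loop body per iteration.
    y = zeros(1, 8, 'uint8');
    coder.hdl.loopspec('unrolled');
    for k = 1:8
        y(k) = u(k) + uint8(1);
    end
end

function y = add_one_streamed(u)
    % Request a single loop body instance that is reused, at a
    % faster rate, for every iteration.
    y = zeros(1, 8, 'uint8');
    coder.hdl.loopspec('stream');
    for k = 1:8
        y(k) = u(k) + uint8(1);
    end
end
```

The unrolled version trades area for speed, and the streamed version does the opposite, which mirrors the split between the speed and area optimizations described in this topic.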