Perform Multicore Analysis for Dataflow

When a system is configured to use a dataflow execution domain, the Multicore tab is activated on the Simulink® toolstrip. This tab consolidates multicore analysis techniques leveraged in dataflow into an incremental and iterative workflow.

Using the controls on the Multicore tab, you can:

Estimate the relative cost of blocks using internal Simulink heuristics.
Profile dataflow multicore simulations.
Measure average execution times (cost) of blocks inside the dataflow subsystems by simulating the model with software-in-the-loop (SIL) or processor-in-the-loop (PIL) profiling. This functionality requires an Embedded Coder® license.
Manually override the block cost values.
Provide analysis constraints, such as maximum number of threads and threading threshold.
Run analysis to generate a block-to-threads allocation and visualize analysis results.

Multicore tab

This chart illustrates the steps of multicore analysis. After you specify the dataflow execution domain for a system, you can select a cost calculation method, override block costs, specify analysis constraints, run analysis, and review results.

Multicore analysis workflow

Select the Cost Calculation Method

On the Multicore tab, in the Mode section, you can select the method of cost calculation as Simulation Profiling, Cost Estimation, or SIL/PIL Profiling. The tool determines the cost of each block automatically and uses these costs in the multicore analysis to distribute the computational load evenly across multiple CPU cores.
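To make the idea of cost-driven load distribution concrete, here is a minimal Python sketch of a greedy "most expensive block first" assignment. It is an illustration only and is not the algorithm Simulink uses: it ignores data dependencies and pipelining, and the block names and cost values are hypothetical.

```python
import heapq


def balance_blocks(block_costs, num_threads):
    """Greedy longest-processing-time-first assignment of blocks to threads.

    Illustrative only: ignores data dependencies between blocks, which a
    real dataflow partitioner has to respect.
    """
    # Min-heap of (accumulated cost, thread index); the lightest thread pops first.
    heap = [(0.0, t) for t in range(num_threads)]
    heapq.heapify(heap)
    allocation = {t: [] for t in range(num_threads)}

    # Place the most expensive blocks first so small blocks can even out the loads.
    for name, cost in sorted(block_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, t = heapq.heappop(heap)
        allocation[t].append(name)
        heapq.heappush(heap, (load + cost, t))

    loads = {t: sum(block_costs[b] for b in blocks) for t, blocks in allocation.items()}
    return allocation, loads


# Hypothetical block names and costs (arbitrary units).
costs = {"FIR": 40.0, "FFT": 35.0, "Filter2": 30.0, "Sum": 10.0, "Gain": 5.0}
allocation, loads = balance_blocks(costs, 2)
print(allocation)
print(loads)
```

With these hypothetical costs, the two threads end up with loads of 55 and 65 out of a total of 120, which shows why accurate per-block costs matter: a mis-estimated block shifts the balance of the whole allocation.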

Cost estimation in the toolstrip

Simulation Profiling

Use Simulation Profiling to:

Profile dataflow multicore simulations.
Display simulation multicore analysis data including cost data, latency suggestions, number of threads, thread highlighting, and pipeline delays annotations.

When Simulation Profiling is selected, the Profile Simulation button is disabled and the Run Analysis button is enabled.

Profile Simulation button disabled and Run Analysis button enabled in Simulation Profiling mode

When you perform simulation profiling, use the Optimize button to optimize settings for simulation performance. This button is enabled only in Simulation Profiling mode.

Optimize button

Cost Estimation

Use Cost Estimation for:

Quick analysis without running the simulation or generating code.
Preliminary analysis when the model is not fully implemented. In this case, you can modify the results of the estimation to match the anticipated cost values for the final implementation.

When you click Estimate Cost, the Cost Editor displays the estimated execution cost of each block in your model without simulating it.

Cost Editor window

SIL/PIL Profiling

Use the software-in-the-loop (SIL) or processor-in-the-loop (PIL) profiling method (requires Embedded Coder license) to:

Acquire accurate cost values measured on the host computer using the generated code. The generated code is the closest to the code that will be deployed on the hardware.
Measure cost values on the actual target hardware in order to maximize the utilization of cores when the final code is deployed.

SIL/PIL profiling measures average execution times (cost) of blocks inside the dataflow system by simulating the model with SIL/PIL.

SIL and PIL profiling settings in the toolstrip

Use Settings to configure C/C++ code generation and hardware implementation settings.
Use Stop Time to specify the time to measure the cost.
Use the list to select the Software-in-the-Loop (SIL) or Processor-in-the-Loop (PIL) setting.
Use Profile to measure the costs associated with blocks with the specified settings.

This example shows the highlighted block in the model and its cost. Observe that the Cost Editor displays the units of the profiled cost values when you perform SIL/PIL profiling.

Results from SIL/PIL profiling

Manually Change Block Costs

In Cost Estimation and SIL/PIL Profiling modes, you can manually change the block cost values to understand their impact on the multicore behavior. To override a block cost, clear the Auto column for the corresponding block and edit the value in the Cost column.

Overriding block cost values lets you run the analysis with custom costs.

Cost column

The costs are not editable in Simulation Profiling mode.

Specify Analysis Constraints and Run Analysis

Next, set constraints and run multicore analysis. In the Analyze section:

Number of cores and multithreading threshold

Use Maximum Number of Threads to specify the maximum number of threads produced by the analysis. By default, the tool tries to determine the number of cores of the target processor from the hardware settings and uses that as the maximum number of threads. If the tool cannot determine the exact value, it uses the number of cores on the host platform as the maximum number of threads.
Use Multithreading Threshold to set the minimum total cost of the system (in microseconds) at which the tool applies multithreading. If the total cost falls below the threshold, the tool does not partition the system. By default, the tool uses a nominal value of 25 microseconds as the threshold.
Click Run Analysis to perform the analysis based on your configuration.
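The two constraints above can be summarized as simple decision rules. The following Python sketch restates them under stated assumptions: the function names are hypothetical, the core counts are passed in by hand rather than read from any model, and treating a total cost exactly equal to the threshold as "partition" is an assumption, since the text only says that costs below the threshold are not partitioned.

```python
import os

DEFAULT_THRESHOLD_US = 25.0  # nominal default threshold stated in the text


def resolve_max_threads(target_cores=None, host_cores=None):
    """Prefer the target core count; otherwise fall back to the host core count."""
    if target_cores is not None:
        return target_cores
    if host_cores is None:
        # Assumption: use the machine running this script as the "host".
        host_cores = os.cpu_count() or 1
    return host_cores


def should_partition(block_costs_us, threshold_us=DEFAULT_THRESHOLD_US):
    """Return True when the total cost reaches the multithreading threshold."""
    # Assumption: equality counts as "not below" the threshold.
    return sum(block_costs_us) >= threshold_us


print(resolve_max_threads(target_cores=4, host_cores=8))  # target known
print(resolve_max_threads(host_cores=8))                  # fall back to host
print(should_partition([10.0, 5.0]))                      # 15 us total
print(should_partition([20.0, 10.0]))                     # 30 us total
```

Raising the threshold keeps small subsystems single-threaded, which avoids paying synchronization overhead for very little parallel work.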

Review Results

Use the tools provided in the Review Results section to visualize and understand the multicore behavior of your model.

Tools to review results

Highlight and View Threads

Select Highlight threads to highlight and visualize the threads and the assignment of blocks to the threads based on the block execution cost values.

Thread highlighting

Select Thread Viewer to visualize the allocation of blocks to threads.

Allocation of the blocks to threads

Dataflow Analysis Report

Analyze the Suggestions for Increasing Concurrency section to see if there are suggested latencies for pipelining delays. By pipelining the data-dependent blocks, the Dataflow Subsystem block can increase concurrency for higher data throughput. For more information about pipelining delays, see Multicore Simulation and Code Generation of Dataflow Domains. The speedup analysis is not supported in Simulation Profiling mode.

Suggested latency

After accepting suggested latencies for pipelining delays, you can use Show pipeline delays to visualize the delays in your model.

Pipeline delays in the model

Use the analysis report to investigate the relative weight of dataflow systems and the maximum theoretical speedup for the entire model. This speedup can be achieved as a result of the partitioning performed during the analysis. The amount of speedup is proportional to the relative weight of dataflow systems with respect to the entire model.

The analysis report displays total cost and number of threads values for each Dataflow Subsystem block.

Theoretical speedup chart

The speedup is calculated using this formula, where n is the total number of Dataflow Subsystem blocks, pctPar is the percentage of the parallel execution of a subsystem, criticalPathCost is the cost of the most costly thread in a subsystem, and totalCostInSubsystem is the total cost of that subsystem.

$\mathit{Speedup} \leq \frac{1}{\left(1 - \sum_{i=0}^{n} \mathit{pctPar}_{i}\right) + \sum_{i=0}^{n} \frac{\mathit{pctPar}_{i} \times \mathit{criticalPathCost}_{i}}{\mathit{totalCostInSubsystem}_{i}}}$
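As a worked example of the bound above, this Python sketch evaluates the right-hand side for a model with two dataflow subsystems. It assumes that pctPar is expressed as a fraction of the whole model's cost (0 to 1) rather than as a percentage. The function name and all numbers are hypothetical.

```python
def speedup_bound(subsystems):
    """Evaluate the upper bound on speedup.

    subsystems: list of (pct_par, critical_path_cost, total_cost_in_subsystem),
    with pct_par given as a fraction of the whole model's cost (assumption).
    """
    # Share of the model that stays outside the dataflow subsystems.
    serial = 1.0 - sum(pct_par for pct_par, _, _ in subsystems)
    # Each subsystem shrinks to the relative cost of its most costly thread.
    parallel = sum(
        pct_par * critical / total for pct_par, critical, total in subsystems
    )
    return 1.0 / (serial + parallel)


# Hypothetical model: two dataflow subsystems covering 50% and 25% of the cost.
example = [
    (0.50, 30.0, 60.0),  # critical thread is half of the subsystem cost
    (0.25, 10.0, 40.0),  # critical thread is a quarter of the subsystem cost
]
bound = speedup_bound(example)
print(f"Speedup <= {bound:.2f}")
```

Here the denominator is 0.25 + 0.25 + 0.0625 = 0.5625, so the bound is 16/9, or about 1.78. If the critical path of every subsystem equals its total cost (no partitioning), the bound reduces to 1.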

The Dataflow Analysis Report also displays multithreading exceptions as a table when applicable.

Table displaying all possible dataflow exceptions

The Multithreading Exceptions table displays three types of messages:

No multithreading in any subsystems due to model configuration: This message appears when the model configuration disables all multicore dataflow execution, for example, because multithreading is not supported for protected models.
Subsystems not multithreading: This message appears when a subsystem executes on a single thread because of an unsupported feature or block, or because of insufficient work. For example, the Scope block is not supported for multithreading. For subsystems with insufficient work, the message displays the computational load in microseconds.
Subsystems with performance limitations: This message appears when one block is significantly more computationally intensive than the other blocks and limits the throughput. It indicates that the subsystem cannot execute faster than the specified block. If a single block accounts for more than 90% of the computational load, the subsystem is executed using a single thread and is listed with the subsystems that are not multithreading.
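The 90% rule in the last message type can be checked by hand from the block costs. The Python sketch below finds the most costly block in a subsystem and tests whether its share of the load exceeds 90%. The function names, block names, and costs are hypothetical, and the sketch does not reproduce how the report itself decides when a block is "significantly" more expensive.

```python
def bottleneck(block_costs_us):
    """Return (block name, cost, share of total load) for the costliest block."""
    total = sum(block_costs_us.values())
    name, cost = max(block_costs_us.items(), key=lambda kv: kv[1])
    return name, cost, cost / total


def runs_single_threaded(block_costs_us, limit=0.90):
    """True when one block accounts for more than `limit` of the load."""
    return bottleneck(block_costs_us)[2] > limit


# Hypothetical subsystems with costs in microseconds.
dominated = {"Decoder": 190.0, "Gain": 5.0, "Sum": 5.0}
limited = {"Decoder": 120.0, "FIR": 50.0, "Sum": 30.0}

print(bottleneck(dominated), runs_single_threaded(dominated))
print(bottleneck(limited), runs_single_threaded(limited))
```

In the second subsystem the Decoder block holds 60% of the load, so the subsystem can still be partitioned, but no allocation can bring the cost of one step below the 120 microseconds that block needs.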