Reduction Operations Supported for Automatic Parallelization of for-loops

The code generator automatically parallelizes for-loops by converting implicit and explicit sequential for-loop code blocks into parallelized code blocks. Parallelization of a section of code might significantly improve the execution speed of the generated code. See How parfor-Loops Improve Execution Speed.

Parallelize for-loops Performing Reduction Operations

You can parallelize for-loops performing reduction operations by using the configuration option Optimize reductions.

To enable automatic parallelization of these for-loops:

  1. Open the MATLAB® Coder™ app.

  2. On the Generate Code page, click More Settings.

  3. On the Speed tab, select the Enable automatic parallelization and Optimize reductions check boxes.

Optimize reductions is also enabled if you set the Leverage target hardware instruction set extensions parameter to an instruction set that your processor supports.

To enable the configuration option OptimizeReductions by using the command-line interface, run these commands.

cfg = coder.config('lib');
cfg.EnableAutoParallelization = true;
cfg.OptimizeReductions = true;

For example, write a MATLAB function arraySum that accumulates elements of the array in1 in the reduction variable sum, and returns out, the total of sum and the mean of c.

function out = arraySum(in1,a,b)
sum = 0;
c = zeros(numel(in1),1);
for i2 = 1:numel(in1)
    if i2 > in1(i2)
        sum = sum + in1(i2);
        c(i2) = a(i2) + b(i2);
    end
end
out = sum + mean(c);
end

At the MATLAB command line, run this codegen command.

arr = 1:1000;
codegen arraySum -config cfg -args {arr,arr,arr} -report
Code generation successful: View report

Open the code generation report to see the parallelized for-loop that performs the addition operation.

sum = 0.0;
#pragma omp parallel num_threads(omp_get_max_threads()) private(sumPrime, d)
{
    sumPrime = 0.0;
    #pragma omp for nowait
    for (i2 = 0; i2 < 1000; i2++) {
      c[i2] = 0.0;
      d = in1[i2];
      if ((double)i2 + 1.0 > d) {
        sumPrime += d;
        c[i2] = a[i2] + b[i2];
      }
    }

      sum += sumPrime;
}
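The private-accumulator pattern in the generated code can be sketched in plain C without OpenMP. This is an illustrative sketch only, not output of the code generator: the names sequential_sum, chunked_sum, and NUM_CHUNKS are hypothetical, and the chunks run one after another here instead of on separate threads. Each chunk adds into its own sumPrime, and the partial sums are folded into sum afterwards.

```c
#include <stddef.h>

/* Hypothetical stand-in for the number of threads. */
#define NUM_CHUNKS 4

/* Sequential reference: add in1[i] whenever the 1-based index exceeds it,
   mirroring the condition used in arraySum. */
double sequential_sum(const double *in1, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if ((double)i + 1.0 > in1[i]) {
            sum += in1[i];
        }
    }
    return sum;
}

/* Chunked version: every chunk accumulates into its own partial sum
   (the role sumPrime plays in the generated code), and the partial
   sums are folded into the shared total afterwards. */
double chunked_sum(const double *in1, size_t n)
{
    double sum = 0.0;
    size_t chunk = (n + NUM_CHUNKS - 1) / NUM_CHUNKS;
    for (size_t t = 0; t < NUM_CHUNKS; t++) {
        size_t lo = t * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        double sumPrime = 0.0;
        for (size_t i = lo; i < hi; i++) {
            if ((double)i + 1.0 > in1[i]) {
                sumPrime += in1[i];
            }
        }
        sum += sumPrime;
    }
    return sum;
}
```

For inputs whose partial sums are exactly representable, both functions return the same value. In a real multithreaded run, the final fold into the shared total must also be protected against concurrent updates.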

MATLAB Functions Supported for Reduction Operations

A reduction operation reduces specific dimensions of an input to a scalar value. A reduction operation must be associative and commutative. This table lists the MATLAB functions that are supported as reduction operations and are parallelized in generated code, where X is the reduction variable and expr is a MATLAB expression. The reduction variable X can appear on both sides of an assignment statement.

MATLAB Function and Usage Notes

plus

  • For integer data types, the Saturate on integer overflow (SaturateOnIntegerOverflow) property must be disabled.

  • Example: X = X + expr

minus

  • For integer data types, the Saturate on integer overflow (SaturateOnIntegerOverflow) property must be disabled.

  • Example: X = X - expr

times

  • For integer data types, the Saturate on integer overflow (SaturateOnIntegerOverflow) property must be disabled.

  • Example: X = X .* expr

max

  • Example: X = max(X,expr)

min

  • Example: X = min(X,expr)

sum

  • For integer data types, the Saturate on integer overflow (SaturateOnIntegerOverflow) property must be disabled.

  • Example: X = sum(X)

prod

  • For integer data types, the Saturate on integer overflow (SaturateOnIntegerOverflow) property must be disabled.

  • Example: X = prod(X)

or

  • Example: X = X | expr

and

  • Example: X = X & expr

bitand

  • Example: X = bitand(X,expr)

bitor

  • Example: X = bitor(X,expr)

bitxor

  • Example: X = bitxor(X,expr)


The Support nonfinite numbers (SupportNonFinite) property supports code generation only for standalone libraries (lib, dll) and executables.

The following example shows a typical usage of a reduction variable X.

X = 0;            % Initialize X
for i = 1:n
    X = X + d(i);
end

This loop is equivalent to the following, where you calculate each d(i) in a different iteration.

X = X + d(1) + ... + d(n)
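Because a supported reduction is associative and commutative, the terms can be regrouped without changing the result. The following C sketch illustrates this for a max reduction and a bitwise OR reduction. It is an illustration under stated assumptions, not generated code: the function names are hypothetical, and the two halves run sequentially where a parallel run would assign them to different threads.

```c
#include <stddef.h>
#include <stdint.h>

/* X = max(X, d(i)) evaluated in index order, starting from init. */
double max_sequential(const double *d, size_t n, double init)
{
    double x = init;
    for (size_t i = 0; i < n; i++) {
        x = (d[i] > x) ? d[i] : x;
    }
    return x;
}

/* Same reduction split into two halves whose partial results are
   combined with one more max at the end. */
double max_split(const double *d, size_t n, double init)
{
    size_t mid = n / 2;
    double left = init;
    double right = init;
    for (size_t i = 0; i < mid; i++) {
        left = (d[i] > left) ? d[i] : left;
    }
    for (size_t i = mid; i < n; i++) {
        right = (d[i] > right) ? d[i] : right;
    }
    return (left > right) ? left : right;
}

/* X = bitor(X, d(i)) evaluated in index order. */
uint32_t or_sequential(const uint32_t *d, size_t n)
{
    uint32_t x = 0;
    for (size_t i = 0; i < n; i++) {
        x |= d[i];
    }
    return x;
}

/* Same reduction split into two halves and combined with one more OR. */
uint32_t or_split(const uint32_t *d, size_t n)
{
    size_t mid = n / 2;
    uint32_t left = 0;
    uint32_t right = 0;
    for (size_t i = 0; i < mid; i++) {
        left |= d[i];
    }
    for (size_t i = mid; i < n; i++) {
        right |= d[i];
    }
    return left | right;
}
```

For these operations, the split versions return the same value as the sequential versions for any split point, which is what makes the loop safe to distribute.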

Handling Overflow in Automatic Parallelization of for-loops

When automatic parallelization of for-loops and reduction optimization are enabled, overflow might cause the output of the generated parallel C/C++ code to differ from the output of the sequential MATLAB code. Therefore, when such overflow is possible, the code generator does not parallelize the loop.

The table shows the MATLAB functions where significant overflow can occur, along with their corresponding workarounds.

MATLAB Function, Description, and Workaround

Integer overflow

MATLAB function:

function out = integerOverflow(in)
    out = int8(0);
    for i = 1:numel(in)
        out = out + in(i);
    end
end

Description: Automatic parallelization of reduction-based for-loops that perform arithmetic operations on integers is not supported when the SaturateOnIntegerOverflow parameter is enabled.

During parallel execution, the reduction operations are distributed among multiple threads, so the order in which values are combined can vary. Because saturating arithmetic is not associative, the accumulated result might be non-deterministic. Therefore, the code generator does not automatically parallelize the for-loop. For example, for int8 values:

(126 - 125) + 122 = 1 + 122 = 123

(126 + 122) - 125 = 127 (saturation) - 125 = 2

Workaround: If appropriate for your application, disable the Saturate on integer overflow (SaturateOnIntegerOverflow) property to automatically parallelize for-loops.
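The int8 arithmetic above can be reproduced with a small C sketch. The helpers sat_add_i8 and sat_reduce are hypothetical names introduced for illustration. They model saturating int8 addition by clamping to the range [-128, 127], and are not the routines that the code generator emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating int8 addition: compute in a wider type, then clamp the
   result to the int8 range instead of letting it wrap. */
int8_t sat_add_i8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) {
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/* Fold the values in index order, starting from int8 zero, as the
   integerOverflow loop does. */
int8_t sat_reduce(const int8_t *v, size_t n)
{
    int8_t out = 0;
    for (size_t i = 0; i < n; i++) {
        out = sat_add_i8(out, v[i]);
    }
    return out;
}
```

Reducing {126, -125, 122} in that order gives 123, while reducing the same values as {126, 122, -125} gives 2, because the intermediate sum 248 saturates to 127. A parallel run that changes the combination order can therefore change the result.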

Usage Notes and Limitations

  • for-loops containing calls to C/C++ functions using coder.ceval are not automatically parallelized.

  • Bitwise reduction operations (bitand, bitor, and bitxor) are only supported for integer data types.

  • Custom reduction operations such as a = foo(a,b) are not supported for automatic parallelization of for-loops.

  • For MEX targets, the ResponsivenessChecks configuration parameter must be set to false to automatically parallelize for-loops. Consider disabling ResponsivenessChecks only if you are sure that you will not need to stop execution of your application by using Ctrl+C. For more information, see Control Run-Time Checks. You can disable ResponsivenessChecks by using these commands.

    mexcfg = coder.config('mex');
    mexcfg.ResponsivenessChecks = false;

  • Reduction operations on floating-point numbers are only approximately associative. To get deterministic behavior of a parallel execution, the reduction operations involved must be associative. To be associative, a function f must satisfy the following for all a, b, and c.

    f(a,f(b,c)) = f(f(a,b),c)

    When working with floating-point numbers, different parallel executions of a loop might produce results with different round-off errors. If such round-off errors are unacceptable to your application, use the pragma coder.loop.parallelize('never') to instruct the code generator to not automatically parallelize specific for-loops. For more information on potential differences during code generation, see Differences Between Generated Code and MATLAB Code.
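The effect of grouping on floating-point addition can be seen in a short C sketch. The function names are hypothetical, and the sketch assumes IEEE 754 double-precision arithmetic and a compiler that does not reorder floating-point operations (for example, no fast-math options).

```c
/* Evaluate (a + b) + c: the first two operands are combined first. */
double sum_left_grouped(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluate a + (b + c): the last two operands are combined first. */
double sum_right_grouped(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, and c = 1.0, the left grouping returns 1.0 because a and b cancel exactly before c is added. The right grouping returns 0.0 because adding 1.0 to -1e20 is lost to rounding. A parallel reduction can regroup terms in this way from one run to the next.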
