## Quick Start Parallel Computing in MATLAB

You can use parallel computing to carry out many calculations simultaneously. Split large problems into smaller ones, which you can process at the same time.

With parallel computing, you can:

- Save time by distributing tasks and executing them simultaneously
- Solve big data problems by partitioning data
- Take advantage of your desktop computer resources and scale up to clusters and cloud computing

This table lists some essential parallel computing terms and their definitions.

Term | Definition |
---|---|
Thread | Smallest set of instructions that a CPU can schedule and execute independently. A GPU, multiprocessor, or multicore computer can execute multiple threads simultaneously. |
Process | Execution of an instance of a computer program by one or many threads. Each process has its own blocks of memory. |
Node | Standalone computer containing one or more CPUs or GPUs. Nodes can be networked to form a cluster or supercomputer. |
Cluster | Collection of interconnected computers that work together as a unified system to provide high-performance computing power for processing complex and data-intensive tasks. |
Scalability | Increase in parallel speedup with the addition of more resources. |

### Prerequisites

To run the examples on this page, you must have a Parallel Computing Toolbox™ license. To determine whether you have Parallel Computing Toolbox installed, and whether your machine can create a default parallel pool, enter this code in the MATLAB® Command Window.

    if canUseParallelPool
        disp("Parallel Computing Toolbox is installed")
    else
        disp("Parallel Computing Toolbox is not installed")
    end

Alternatively, to see which MathWorks® products you have installed, enter `ver` in the Command Window.

### Accelerate MATLAB Code

Before you parallelize your code, you can use techniques such as vectorization and preallocation to improve the sequential performance of your MATLAB code. Sequential acceleration and parallelization can often work together to give cumulative performance improvements.

#### Vectorization

MATLAB is optimized for operations involving matrices and vectors. The process of revising loop-based, scalar-oriented code to use MATLAB matrix and vector operations is called vectorization. Using vectorized code instead of loop-based operations often improves your code performance.

These code snippets compare the amount of time the software needs to calculate the square root of 1,000,000 values with loop-based code against vectorized code.

Without Vectorization | With Vectorization |
---|---|
`tic; for k = 1:1000000; x(k) = sqrt(k); end; toc` | `tic; k = 1:1000000; x = sqrt(k); toc` |
Elapsed time is 0.112298 seconds. | Elapsed time is 0.006783 seconds. |
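When you vectorize a loop, it can help to confirm that the two versions agree before you compare their speed. The following is a minimal sketch of that check for the square-root example. The variable names `xLoop`, `xVec`, `tLoop`, and `tVec` are illustrative assumptions, and the timings you see will differ from the ones in the table.

```matlab
n = 1e6;

% Loop-based version. The output is preallocated so that the
% comparison isolates the cost of the scalar loop itself.
xLoop = zeros(1,n);
tic
for k = 1:n
    xLoop(k) = sqrt(k);
end
tLoop = toc;

% Vectorized version: one call operates on the whole vector.
tic
xVec = sqrt(1:n);
tVec = toc;

% Check that both versions produce the same values (within rounding).
maxDifference = max(abs(xLoop - xVec));
fprintf("Loop: %.4f s, vectorized: %.4f s, max difference: %g\n", ...
    tLoop, tVec, maxDifference);
```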

#### Preallocation

In some cases, `while`-loops and `for`-loops that incrementally increase the size of an array each time through the loop can adversely affect performance and memory use. You can preallocate the maximum amount of space required for an array instead of continuously resizing arrays when you run loop-based code.

These code snippets compare the amount of time the software needs to fill a 1-by-1,000,000 vector `x` when you gradually increase the size of `x` in a `for`-loop against when you preallocate a 1-by-1,000,000 block of memory for `x`.

Without Preallocation | With Preallocation |
---|---|
`tic; x = 0; for k = 2:1000000; x(k) = x(k-1) + 5; end; toc` | `tic; x = zeros(1,1000000); for k = 2:1000000; x(k) = x(k-1) + 5; end; toc` |
Elapsed time is 0.103415 seconds. | Elapsed time is 0.018758 seconds. |

This table shows the appropriate preallocation function for the type of array you want to initialize.

Array Type to Initialize | Preallocation Function |
---|---|
Numeric | `zeros` |
String | `strings` |
Cell | `cell` |
Table | `table` |
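As a rough illustration of the table above, this sketch preallocates one array of each type and then fills it in a loop. The sizes, variable names, and table column names (`Value`, `Label`) are assumptions chosen for the example, not part of the original page.

```matlab
n = 100;

% Preallocate one container of each type before the loop.
numericArray = zeros(1,n);                 % numeric
stringArray  = strings(1,n);               % string
cellArray    = cell(1,n);                  % cell
resultsTable = table('Size',[n 2], ...     % table with typed columns
    'VariableTypes',{'double','string'}, ...
    'VariableNames',{'Value','Label'});

% Fill the preallocated containers. No array is resized in the loop.
for k = 1:n
    numericArray(k) = k^2;
    stringArray(k)  = "item" + k;
    cellArray{k}    = 1:k;
    resultsTable.Value(k) = k^2;
    resultsTable.Label(k) = "item" + k;
end
```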

### Run MATLAB on Multicore and Multiprocessor Nodes

MATLAB supports two ways to parallelize your code on multicore and multiprocessor nodes.

#### Implicit Parallelization with Built-in Multithreading

Some MATLAB functions implicitly use multithreading to parallelize their execution. These functions automatically execute on multiple computational threads in a single MATLAB session, which means they run faster on multicore-enabled machines. Some examples are linear algebra and numerical functions such as `fft`, `mldivide`, `eig`, `svd`, and `sort`. Therefore, if you use these functions on a machine with many cores, you can observe an increase in performance.
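One way to see the effect of built-in multithreading is to time the same call with the default thread count and again with computation restricted to one thread. This is a sketch under stated assumptions: the matrix size is arbitrary, the variable names are illustrative, and the difference you observe depends on your machine and can be small.

```matlab
A = rand(1000);

% Time svd with the current (default) number of computational threads.
defaultThreads = maxNumCompThreads;
tMulti = timeit(@() svd(A));

% Restrict MATLAB to one computational thread and time the same call.
% maxNumCompThreads(N) returns the previous setting.
previousThreads = maxNumCompThreads(1);
tSingle = timeit(@() svd(A));

% Restore the original setting.
maxNumCompThreads(previousThreads);

fprintf("svd with %d thread(s): %.3f s, with 1 thread: %.3f s\n", ...
    defaultThreads, tMulti, tSingle);
```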

#### Explicit Parallelization with MATLAB Workers

MATLAB and Parallel Computing Toolbox software use MATLAB workers to explicitly parallelize your code. MATLAB workers are MATLAB computational engines that run in the background without a graphical desktop. The MATLAB session you interact with, also called the MATLAB client, instructs the workers with parallel language functions. You use Parallel Computing Toolbox functions to automatically divide tasks and assign them to these workers to execute the computations in parallel.

### Set Up Environment for Explicit Parallelization

If you have Parallel Computing Toolbox installed on your machine, you can start an interactive parallel pool of workers to take advantage of the cores in your multicore computer.

A parallel pool (*parpool*) is a group of MATLAB workers on which you can interactively run code.

You can create a parallel pool of workers using `parpool` or functions with automatic parallel support. By default, parallel language functions such as `parfor`, `parfeval`, and `spmd` automatically create a parallel pool when you need one. When the workers start, your MATLAB session connects to them. For example, this code automatically starts a parallel pool and runs the statement in the `parfor`-loop in parallel on six workers.

    parfor i = 1:100
        c(i) = max(eig(rand(1000)));
    end

    Starting parallel pool (parpool) using the 'Processes' profile ...
    Connected to parallel pool with 6 workers.

You can also use the parallel status indicator in the lower left corner of the MATLAB desktop to start a parallel pool manually. Click the indicator icon, and then select **Start Parallel Pool**.

To stop a parallel pool while it is starting, press **Ctrl+C** or **Ctrl+Break**. On Apple macOS operating systems, you can also use **command** + **+** (the **command** key and the plus key).

Starting a parallel pool often takes a long time, which can impact performance for code that takes only a few seconds to execute. For longer running code, the overhead becomes less significant.
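To keep the startup cost out of a timed section, you can start the pool yourself before the code that needs it. The sketch below assumes a release in which the local profile is named `Processes`, as on this page. It reuses a pool if one is already running and otherwise starts one and reports how long startup took.

```matlab
% gcp("nocreate") returns the current pool, or empty if none is running.
pool = gcp("nocreate");

if isempty(pool)
    tic
    pool = parpool("Processes");   % start a pool of process workers
    fprintf("Pool startup took %.1f seconds.\n", toc);
end

fprintf("The pool has %d workers.\n", pool.NumWorkers);
```

Any `parfor`-loop that you run afterward uses this pool and does not pay the startup cost again.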

Your default parallel environment determines the parallel pool cluster. The default parallel environment of your local machine is called `Processes`. This environment starts a parallel pool of process workers. You can see the selection of available cluster profiles in the **Parallel** menu on the MATLAB **Home** tab.

**Note**

For the default `Processes` profile, the default number of process workers is one per physical CPU core using a single computational thread. This restriction ensures that each worker has exclusive access to a floating-point unit, and generally optimizes performance of computational code. If your code is not computationally intensive, for example, code that is input/output (I/O) intensive, then consider using up to two workers per physical core. Running too many workers on too few resources can impact the performance and stability of your machine.

This table summarizes the different ways you can create interactive parallel pools.

Parallel Environment | Worker Type | Location | Number of Available Cores or Threads |
---|---|---|---|
`Processes` | Process | Local machine | Up to 512 cores |
`Threads` | Thread | Local machine | Up to 512 threads |
`backgroundPool` | Thread | Local machine | Without a Parallel Computing Toolbox license: 1 thread. With a Parallel Computing Toolbox license: Up to the number of threads that the `maxNumCompThreads` function returns |
`Cluster` | Process | Onsite or cloud cluster | Up to the maximum number of workers the cluster can start |

Parallel Computing Toolbox also supports running a parallel pool of workers that are backed by computing threads instead of process workers. This parallel environment is called `Threads`. Thread workers have reduced memory usage, faster scheduling, and lower data transfer costs. However, thread workers support only a subset of the MATLAB functions that are available to process workers.
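As a sketch of how you might try the `Threads` environment, the code below closes any existing pool, starts a thread-based pool, and runs a small `parfor`-loop on it. The loop body and sizes are assumptions for illustration. If your loop calls a function that thread workers do not support, MATLAB reports an error, and you can fall back to `Processes`.

```matlab
% Only one interactive pool can be open at a time, so close any existing one.
delete(gcp("nocreate"));

% Start a pool backed by thread workers.
pool = parpool("Threads");

n = 200;
y = zeros(n,1);
parfor k = 1:n
    % Largest singular value of a small random matrix.
    y(k) = max(svd(randn(50)));
end

delete(pool);
```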

MATLAB also supports an additional local parallel environment called `backgroundPool`. The `backgroundPool` environment is backed by thread workers and supports running code in the background while you run other code in your session at the same time. You can use one thread worker in the `backgroundPool` environment when you do not have a Parallel Computing Toolbox license. If you have a Parallel Computing Toolbox license, the maximum number of thread workers in your `backgroundPool` is the value that the `maxNumCompThreads` function returns.
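A minimal sketch of running code in the background might look like this. It submits one computation to `backgroundPool` with `parfeval`, does unrelated work on the client, and then collects the result. The computation and the variable names are illustrative assumptions.

```matlab
% Submit a computation to the background pool. The third argument is
% the number of outputs to request from the function.
f = parfeval(backgroundPool, @() max(svd(randn(300))), 1);

% The client is free to do other work while the background task runs.
clientResult = sum(1:100);

% fetchOutputs waits for the task to finish and returns its output.
largestSingularValue = fetchOutputs(f);

fprintf("Client result: %d, background result: %.3f\n", ...
    clientResult, largestSingularValue);
```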

If you have access to onsite or cloud clusters, you can discover other clusters
running on your network or on Cloud Center by clicking **Parallel** > **Discover Clusters** and following the prompts. Parallel pools on clusters are backed by
process workers and support the full parallel language.

When you have an interactive parallel pool of workers, you can use parallel language functions to split large problems into smaller tasks that workers can execute in parallel. To accelerate your MATLAB code, use interactive parallel features such as `parfor`.

### Run Explicit Parallelization with `parfor`-loop

This example shows how to convert a `for`-loop into a `parfor`-loop and calculate the scalability of the `parfor`-loop with the number of workers.

You can convert `for`-loops to run in parallel by using a `parfor`-loop. Often, you can simply replace `for` with `parfor`. However, you often need to adjust your code further to run it in parallel.

**Mechanics of parfor-loops**

When you run a `parfor`-loop, MATLAB executes the statements in the loop body in parallel. Each execution of the `parfor`-loop body is an *iteration*. The MATLAB client issues the `parfor` command and coordinates with the workers to execute the loop iterations in parallel on the workers in a parallel pool. A `parfor`-loop can provide significantly better performance than its analogous `for`-loop because several workers compute iterations simultaneously.

When you run a `parfor`-loop, the MATLAB client divides the loop iterations into subranges and assigns them to the workers. If the number of workers is equal to the number of loop iterations, each worker performs one iteration of the loop. If the number of iterations is greater than the number of workers, some workers perform more than one loop iteration. In this case, a worker receives multiple iterations at once to reduce communication time. The client also performs a static analysis of the `parfor`-loop code to determine which data to transfer to each worker and which data to transfer back to the client. The client sends the necessary data to the workers, which execute most of the computation. The workers then send the results back to the client, which assembles those results. MATLAB workers evaluate iterations in no particular order and independently of each other. Because each iteration is independent, the iterations need not be synchronized, and often are not.
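To make the data transfer described above more concrete, the following sketch labels the role each variable plays in a small `parfor`-loop. The terms *broadcast* and *sliced* are the MATLAB names for these roles. The variable names and sizes are assumptions for illustration.

```matlab
n = 100;
scale = 2.5;          % broadcast variable: the same value goes to every worker
data = rand(n,50);    % sliced input: iteration k needs only row k
result = zeros(n,1);  % sliced output: iteration k writes only element k

parfor k = 1:n
    % Each iteration reads its own row of data and writes its own
    % element of result, so no iteration depends on another.
    result(k) = scale * sum(data(k,:));
end

% The client assembles the same values that a serial computation gives.
expected = scale * sum(data,2);
fprintf("Maximum deviation from serial result: %g\n", ...
    max(abs(result - expected)));
```

Because `data` is indexed by the loop variable, the client can send each worker only the rows it needs instead of the whole array.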

A `parfor`-loop must satisfy these basic requirements.

- Loop iterations are independent. When you convert your `for`-loop into a `parfor`-loop, you must ensure that the loop iterations are independent. If your `parfor` code has dependence between the loop iterations, the Code Analyzer in the MATLAB Editor detects the dependence, and executing the `parfor`-loop generates an error.
- Loop iterations do not execute in order. Because `parfor`-loop iterations have no guaranteed order, you must ensure that code that uses a `parfor`-loop does not rely on the output of the `parfor`-loop being produced in order.
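As an illustration of the independence requirement, consider the recurrence `x(k) = x(k-1) + 5` from the preallocation example. Each iteration reads the result of the previous one, so it cannot run as a `parfor`-loop as written. The sketch below shows two common ways to restructure such code, under the assumption that the recurrence has a closed form: compute each element directly from the loop index, and accumulate a total with a reduction variable, which MATLAB allows because the order of the additions does not matter. The variable names are illustrative.

```matlab
n = 1000;

% Not valid inside parfor, because iteration k needs iteration k-1:
%     x(k) = x(k-1) + 5;
% Rewritten so that each element depends only on the loop index.
x = zeros(1,n);
parfor k = 1:n
    x(k) = 5 * (k - 1);
end

% Accumulating a sum is allowed: total is a reduction variable, and
% the result does not depend on the order in which iterations run.
total = 0;
parfor k = 1:n
    total = total + k^2;
end

% Compare against the closed-form sum of squares.
expectedTotal = n * (n + 1) * (2*n + 1) / 6;
fprintf("total = %d, expected = %d\n", total, expectedTotal);
```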

**Convert for-loops to parfor-loops**

Convert a `for`-loop into a `parfor`-loop in code that calculates the largest singular value of each of 5000 200-by-200 random matrices by replacing `for` with `parfor`. Execute the `parfor`-loop on six workers, and compare the execution times of the two loops.

When you use `parfor` and you have Parallel Computing Toolbox software installed, MATLAB automatically starts a parallel pool of workers. The parallel pool can take a long time to start. This example shows a second run with the pool already started. You can observe that the `parfor` code executed on six workers runs much faster than the `for`-loop code.

    tic
    y = zeros(5000,1);
    for n = 1:5000
        y(n) = max(svd(randn(200)));
    end
    toc

    Elapsed time is 21.837346 seconds.

    tic
    y = zeros(5000,1);
    parfor n = 1:5000
        y(n) = max(svd(randn(200)));
    end
    toc

    Elapsed time is 3.908282 seconds.

If the speedup is less than you expect, you can calculate the scalability of your `parfor`-loop code.

**Calculate Scalability**

You can calculate the scalability of converting this `for`-loop into a `parfor`-loop. Use the scalability to determine whether your `parfor`-loop code scales well with the number of workers, and whether a limit exists.

Use a `for`-loop to iterate through different numbers of workers to run the `parfor`-loop. To specify the maximum number of workers, use the second input argument of `parfor`. You can modify the values in the `numWorkers` array to match your available resources.

    numIterations = 5000;
    numWorkers = [1 2 3 4 5 6];
    t = zeros(size(numWorkers));
    for w = 1:numel(numWorkers)
        tic;
        y = zeros(numIterations,1);
        parfor (n = 1:numIterations,numWorkers(w))
            y(n) = max(svd(randn(200)));
        end
        t(w) = toc;
    end
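The `numWorkers` array above hard-codes one to six workers. If you prefer to derive the values from your machine, one possible approach, sketched here, is to query the `Processes` cluster profile for its worker limit. This assumes Parallel Computing Toolbox is installed and that your release names the local profile `Processes`. The name `maxWorkers` and the spacing of the values are illustrative choices.

```matlab
% Query the local cluster profile for the number of workers it can start.
c = parcluster("Processes");
maxWorkers = c.NumWorkers;

% Test 1 worker, every second worker count, and the maximum.
% unique sorts the values and removes duplicates.
numWorkers = unique([1 2:2:maxWorkers maxWorkers]);

disp(numWorkers)
```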

Calculate the speedup by computing the ratio between the computation time with a single worker and the computation time with each maximum number of workers. To calculate the efficiency of parallelizing the tasks, divide the calculated speedup by the ideal speedup, which equals the number of workers, and express the result as a percentage.

    speedup = t(1)./t;
    efficiency = (speedup./numWorkers).*100;
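As a worked example of these two formulas, the sketch below applies them to the timings reported earlier on this page: 21.837346 seconds for the `for`-loop and 3.908282 seconds for the `parfor`-loop on six workers. It uses the `for`-loop time as a stand-in for the single-worker time, which is an assumption. The scalability code above measures a one-worker `parfor` run instead, so its numbers can differ slightly. The variable names are illustrative.

```matlab
tSerial = 21.837346;     % for-loop time from the example above, in seconds
tParallel = 3.908282;    % parfor time on six workers, in seconds
workers = 6;

% Speedup is the serial time divided by the parallel time.
speedupExample = tSerial / tParallel;

% Efficiency is the measured speedup as a percentage of the ideal
% speedup, which equals the number of workers.
efficiencyExample = speedupExample / workers * 100;

fprintf("Speedup: %.2f, efficiency: %.1f%%\n", ...
    speedupExample, efficiencyExample);
% Prints: Speedup: 5.59, efficiency: 93.1%
```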

To visualize how the computations scale up with the number of workers, plot the speedup and efficiency against the number of workers with the `comparePlot` function defined at the end of the example.

The speedup increases as the number of workers increases. Adding more workers shows a reduction in computation time, but the scaling is not perfect because the efficiency decreases as the number of workers increases. This is due to the overhead associated with parallelization. Parallel overhead includes the time the software needs for communication, coordination, and data transfer from the client to the workers and back.

`parfor`-loops that do not have many iterations or computationally demanding tasks generally do not scale well with an increasing number of workers because the time the software needs for data transfer is significant compared with the time the software needs for computation.

    comparePlot(numWorkers,speedup,efficiency);

After you finish your computation, you can delete the current parallel pool. Get the current parallel pool with the `gcp` function.

    delete(gcp)

    Parallel pool using the 'Processes' profile is shutting down.

**Helper Functions**

This function plots the speedup and efficiency of the `parfor`-loop against the number of workers.

    function comparePlot(numWorkers,speedup,efficiency)
    % Plot speedup on the left y-axis and efficiency on the right y-axis.
    yyaxis left
    plot(numWorkers,speedup,'-*')
    grid on
    title('Speedup and Efficiency with Number of Workers');
    xlabel('Number of Workers');
    xticks(numWorkers);
    ylabel('Speedup');
    yyaxis right
    plot(numWorkers,efficiency,'--o');
    ylabel('Efficiency (%)');
    legend('Speedup','Efficiency')
    end

### Discover Other Parallel Language Functions

You can perform these tasks by using Parallel Computing Toolbox with other parallel language functions.

- Perform asynchronous processing with `parfeval`.
- Speed up your calculation on the supported GPUs of your computer by using `gpuArray`.
- Scale up your computation using big data processing tools, such as `distributed` and `tall`, with parallel pools.
- Offload your calculation to computer clusters or cloud computing facilities using `batch`.
- Run Simulink® models in parallel with `parsim` (Simulink) and `batchsim` (Simulink).
- Offload your calculation to a cluster onsite or in the cloud using MATLAB Parallel Server™ software. For more information, see Clusters and Clouds.
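As one small illustration from the list above, this sketch moves an array to a GPU with `gpuArray` when a supported device is available and otherwise falls back to the CPU. The computation, array size, and variable names are assumptions for the example. `canUseGPU` checks for a usable GPU so that the code also runs on machines without one.

```matlab
x = rand(1000,"single");

if canUseGPU
    % Copy the data to the GPU, compute there, and copy the result back.
    xGpu = gpuArray(x);
    yGpu = 2 .* sqrt(xGpu);
    y = gather(yGpu);
else
    % No supported GPU available: run the same computation on the CPU.
    y = 2 .* sqrt(x);
end

fprintf("Result is a %s array of size %d-by-%d.\n", ...
    class(y), size(y,1), size(y,2));
```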

Several MathWorks products now offer built-in support for parallel computing products without requiring extra coding. For the current list of these products and their parallel functionality, see Parallel Computing Support in MATLAB and Simulink Products.

For more information about the parallel language functions and their applications, see Choose a Parallel Computing Solution and Parallel Language Decision Tables.

## See Also

`for` | `parfor` | `parfeval` | `gpuArray` | `distributed` | `tall` | `datastore` | `mapreduce` | `batch` | `parsim` (Simulink) | `batchsim` (Simulink)

## Related Topics

- Vectorization
- Preallocation
- Choose a Parallel Computing Solution
- Parallel Language Decision Tables
- Run Code on Parallel Pools
- Run MATLAB Functions with Automatic Parallel Support
- Decide When to Use parfor
- Evaluate Functions in the Background Using parfeval
- Identify and Select a GPU Device
- Distributing Arrays to Parallel Workers
- Run Single Programs on Multiple Data Sets
- Run Batch Parallel Jobs