Use Experiment Manager to Train Networks in Parallel
By default, Experiment Manager runs one trial of your experiment at a time on a single CPU. If you have Parallel Computing Toolbox™, you can configure your experiment to run multiple trials at the same time or to run a single trial at a time on multiple GPUs, on a cluster, or in the cloud.
Run multiple trials at the same time, using one parallel worker for each trial.
Set up your parallel environment and enable the Use Parallel option before running your experiment. Experiment Manager runs as many simultaneous trials as there are workers in your parallel pool.
If you have multiple GPUs, parallel execution typically increases the speed of your experiment. However, if you have a single GPU, all workers share that GPU, so you do not get the training speed-up and you increase the chances of the GPU running out of memory.
Run a single trial at a time on multiple parallel workers.
Built-In Training Experiments: In the experiment setup function, set the ExecutionEnvironment training option to "multi-gpu" or "parallel".
Custom Training Experiments: In the training function, use an spmd block to distribute the training computations across the parallel workers.
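For example, a built-in training experiment that runs one trial across all of the local GPUs might set the training option in its setup function like this (a minimal sketch; the solver and any other option values are placeholders for your own settings):

options = trainingOptions("sgdm", ...
    ExecutionEnvironment="multi-gpu");   % one trial, multiple GPUs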
To run an experiment in parallel using MATLAB Online, you must have access to a Cloud Center cluster. For more information, see Use Parallel Computing Toolbox with Cloud Center Cluster in MATLAB Online (Parallel Computing Toolbox).
Set Up Parallel Environment
Train on Multiple GPUs
If you have multiple GPUs, parallel execution typically increases the speed of your experiment. Using a GPU for deep learning requires Parallel Computing Toolbox and a supported GPU device. For more information, see GPU Support by Release (Parallel Computing Toolbox).
For built-in training experiments, GPU support is automatic. By default, these experiments use a GPU if one is available.
For custom training experiments, computations occur on a CPU by default. To train on a GPU, convert your data to gpuArray objects. To determine whether a usable GPU is available, call the canUseGPU function.
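As a sketch, a custom training function can guard the conversion so that the same code runs with or without a GPU (XTrain here stands in for your own training data):

if canUseGPU
    % Move the training data to the GPU when one is available.
    XTrain = gpuArray(XTrain);
end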
For best results, before you run your experiment, create a parallel pool with as many workers as GPUs. You can check the number of available GPUs by using the gpuDeviceCount (Parallel Computing Toolbox) function.

numGPUs = gpuDeviceCount("available");
parpool(numGPUs);
Train on Cluster or in Cloud
If your experiments take a long time to run on your local machine, you can accelerate training by using a computer cluster on your onsite network or by renting high-performance GPUs in the cloud. After you complete the initial setup, you can run your experiments with minimal changes to your code. Working on a cluster or in the cloud requires MATLAB® Parallel Server™. For more information, see Deep Learning in the Cloud.
Run Multiple Trials in Parallel
To run multiple trials of your experiment in parallel, on the Experiment Manager toolstrip, click Use Parallel, and then click Run. If there is no current parallel pool, Experiment Manager starts one using the default cluster profile.
Experiment Manager runs as many simultaneous trials as there are workers in your parallel pool. All other trials in your experiment are queued for later evaluation. A table of results displays the status and progress of each trial.
While the experiment is running, you can track its progress by displaying the training plot for each trial. You can also stop trials that appear to be underperforming. For more information, see Stop and Restart Training.
Experiment Manager does not support running multiple trials in parallel when you set the ExecutionEnvironment training option to "multi-gpu" or "parallel", or when you enable the DispatchInBackground training option. Use these options to speed up your training only if you intend to run one trial of your experiment at a time. For more information, see Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud and Use Datastore for Parallel Training and Background Dispatching.
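As a sketch, training options for a built-in experiment whose trials run in parallel should leave both of these options at their default values (the solver and any other values here are placeholders):

options = trainingOptions("adam", ...
    ExecutionEnvironment="auto", ...   % default; do not use "multi-gpu" or "parallel"
    DispatchInBackground=false);       % default; leave background dispatch disabled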
gpuDeviceCount (Parallel Computing Toolbox) | parpool (Parallel Computing Toolbox) | spmd (Parallel Computing Toolbox) | gpuArray (Parallel Computing Toolbox)