Train Shallow Networks on CPUs and GPUs

Parallel Computing Toolbox


This topic describes shallow networks. For deep learning, see instead Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud.

Neural network training and simulation involve many parallel calculations. Multicore CPUs, graphics processing units (GPUs), and clusters of computers with multiple CPUs and GPUs can all take advantage of parallel calculations.

Together, Deep Learning Toolbox™ and Parallel Computing Toolbox™ enable the multiple CPU cores and GPUs of a single computer to speed up training and simulation of large problems.

The following is a standard single-threaded training and simulation session. (While the benefits of parallelism are most visible for large problems, this example uses a small dataset that ships with Deep Learning Toolbox.)

[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);
net2 = train(net1, x, t);
y = net2(x);

Parallel CPU Workers

Modern processors ship with multiple cores, and workstations with more than one processor have more cores still. Using multiple CPU cores in parallel can dramatically speed up calculations.

Start or get the current parallel pool and view the number of workers in the pool.

pool = gcp;
pool.NumWorkers   % no semicolon, so the worker count is displayed

An error occurs if you do not have a license for Parallel Computing Toolbox.
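To avoid that error in a script, you can test for the license before touching the pool. The following is a minimal sketch, not the toolbox's prescribed workflow. It assumes that 'Distrib_Computing_Toolbox' is the license feature name for Parallel Computing Toolbox, and the variable names hasPCT and numWorkers are illustrative. Note that license('test',...) reports only whether a license exists, not whether the toolbox is installed.

```matlab
% Sketch: check for a Parallel Computing Toolbox license, then reuse
% or start a pool. Assumes the feature name below is correct for your
% installation.
hasPCT = license('test', 'Distrib_Computing_Toolbox');

if hasPCT
    pool = gcp('nocreate');      % returns [] if no pool is running
    if isempty(pool)
        pool = parpool;          % start a pool with the default profile
    end
    numWorkers = pool.NumWorkers;
else
    numWorkers = 0;              % fall back to single-threaded training
end

fprintf('Parallel workers available: %d\n', numWorkers);
```

Using gcp('nocreate') rather than gcp lets the script decide explicitly when a pool is started.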

When a parallel pool is open, set the 'useParallel' option to 'yes' in the call to train, and in the network call used for simulation, to specify that training and simulation be performed across the pool.

net2 = train(net1,x,t,'useParallel','yes');
y = net2(x,'useParallel','yes');
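To judge whether the pool helps for a given problem, you can time both variants. The sketch below is illustrative only: the variable names are made up for this example, training starts from random initial weights so the two runs can stop after different numbers of epochs, and on a dataset this small the communication overhead may make the parallel run slower.

```matlab
% Sketch: compare wall-clock time of serial and parallel training.
[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);
net1.trainParam.showWindow = false;   % suppress the training window

tic;
netSerial = train(net1, x, t);
tSerial = toc;

tic;
netParallel = train(net1, x, t, 'useParallel', 'yes');
tParallel = toc;

fprintf('Serial: %.2f s, parallel: %.2f s\n', tSerial, tParallel);
```

For a fairer comparison, repeat each measurement several times and compare the averages.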

GPU Computing

GPUs can have thousands of cores on a single card and are highly efficient at running parallel algorithms such as neural network training.

Use gpuDeviceCount to check whether a supported GPU card is available in your system. Use the function gpuDevice to review the currently selected GPU information or to select a different GPU.

gpuDevice(2) % Select device 2, if available

An “Undefined function or variable” error appears if you do not have a license for Parallel Computing Toolbox.
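Selecting a device index that does not exist raises an error, so it can help to check the device count first. The following sketch lists the available devices before selecting one; the loop variable and message text are illustrative, and it assumes Parallel Computing Toolbox is installed and licensed.

```matlab
% Sketch: list supported GPUs before selecting one.
numGPUs = gpuDeviceCount;

if numGPUs == 0
    disp('No supported GPU found; training will stay on the CPU.');
else
    for idx = 1:numGPUs
        d = gpuDevice(idx);   % selecting a device also resets it
        fprintf('%d: %s (%.1f GB)\n', idx, d.Name, d.TotalMemory / 2^30);
    end
    gpuDevice(1);             % re-select the first device
end
```

Because gpuDevice(idx) resets the selected device and clears its memory, run a survey like this before placing any data on the GPU.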

When you have selected the GPU device, set the train or sim function’s 'useGPU' option to 'yes' to perform training and simulation on it.

net2 = train(net1,x,t,'useGPU','yes');
y = net2(x,'useGPU','yes');

Multiple GPU/CPU Computing

You can use multiple GPUs for higher levels of parallelism.

After opening a parallel pool, set both 'useParallel' and 'useGPU' to 'yes' to harness all the GPUs and CPU cores on a single computer. Each worker associated with a unique GPU uses that GPU. The rest of the workers perform calculations on their CPU core.

net2 = train(net1,x,t,'useParallel','yes','useGPU','yes');
y = net2(x,'useParallel','yes','useGPU','yes');

For some problems, using GPUs and CPUs together results in the highest computing speed. For other problems, the CPUs might not keep up with the GPUs, so using only GPUs is faster. Set 'useGPU' to 'only' to restrict the parallel computing to workers with unique GPUs.

net2 = train(net1,x,t,'useParallel','yes','useGPU','only');
y = net2(x,'useParallel','yes','useGPU','only');
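To confirm which workers actually ran on a GPU, you can ask train and the network call to report the resources they used. This sketch assumes the 'showResources' option described in the related topic Shallow Neural Networks with Parallel and GPU Computing is available in your release, and that a parallel pool and at least one supported GPU are present.

```matlab
% Sketch: print a summary of the computing resources actually used.
[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);

net2 = train(net1, x, t, 'useParallel', 'yes', 'useGPU', 'only', ...
    'showResources', 'yes');
y = net2(x, 'useParallel', 'yes', 'useGPU', 'only', ...
    'showResources', 'yes');
```

The summary is written to the Command Window, which makes it easy to see whether a requested GPU was used or the computation fell back to CPU workers.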

Cluster Computing with MATLAB Parallel Server

MATLAB® Parallel Server™ allows you to harness all the CPUs and GPUs on a network cluster of computers. To take advantage of a cluster, open a parallel pool with a cluster profile. On the MATLAB Home tab, in the Environment area, use the Parallel menu to manage and select profiles.

After opening a parallel pool, train the network by calling train with the 'useParallel' option, optionally combined with the 'useGPU' option.

% Use all workers in the cluster pool
net2 = train(net1,x,t,'useParallel','yes');
y = net2(x,'useParallel','yes');

% Use only the workers that have a unique GPU
net2 = train(net1,x,t,'useParallel','yes','useGPU','only');
y = net2(x,'useParallel','yes','useGPU','only');
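A pool can also be opened on a cluster from code instead of from the Parallel menu. The sketch below assumes a cluster profile has already been created; 'MyClusterProfile' is a hypothetical name that you would replace with one of your own profiles.

```matlab
% Sketch: open a pool on a cluster profile, then train across it.
profileName = 'MyClusterProfile';   % hypothetical profile name

delete(gcp('nocreate'));            % close any pool that is already open
pool = parpool(profileName);        % start a pool on the cluster
fprintf('Cluster pool has %d workers\n', pool.NumWorkers);

[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);
net2 = train(net1, x, t, 'useParallel', 'yes');
y = net2(x, 'useParallel', 'yes');
```

Closing the existing pool first matters because only one parallel pool can be open in a MATLAB session at a time.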

Load Balancing, Large Problems, and Beyond

For more information on parallel computing with Deep Learning Toolbox, see Shallow Neural Networks with Parallel and GPU Computing, which introduces other topics, such as how to manually distribute data sets across CPU and GPU workers to best take advantage of differences in machine speed and memory.

Distributing data manually also allows worker data to load sequentially, so that data sets are limited in size only by the total RAM of a cluster instead of the RAM of a single computer. This lets you apply neural networks to very large problems.
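As a rough illustration of manual distribution, the sketch below splits the sample dataset into one Composite part per worker and trains on the Composite data. It is a simplified example under stated assumptions, not the procedure from the linked topic: the even split, the variable names (edges, cols, yParts), and the explicit call to configure before training are choices made here. For unequal machines, you would assign larger parts to faster workers.

```matlab
% Sketch: distribute the data across pool workers with Composite values.
[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);
net1 = configure(net1, x, t);     % set input/output sizes up front

pool = gcp;
numWorkers = pool.NumWorkers;
Q = size(x, 2);                   % number of samples (columns)
edges = round(linspace(0, Q, numWorkers + 1));

xc = Composite;
tc = Composite;
for i = 1:numWorkers
    cols = (edges(i) + 1):edges(i + 1);   % columns assigned to worker i
    xc{i} = x(:, cols);
    tc{i} = t(:, cols);
end

net2 = train(net1, xc, tc);       % Composite data is trained in parallel
yc = net2(xc);                    % Composite output, one part per worker

% Gather the per-worker outputs back into a single matrix.
yParts = cell(1, numWorkers);
for i = 1:numWorkers
    yParts{i} = yc{i};
end
y = [yParts{:}];
```

In this sketch the full dataset is still loaded on the client first. To exceed the RAM of a single computer, each worker's part would instead be loaded one at a time, as the paragraph above describes.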