-maxNumCompThreads, hyperthreading, and parpool

I'm running Matlab R2014a on a node in a Linux cluster that has 20 cores and hyperthreading enabled. I know this has been discussed before, but I'm looking for some updated information. Here's what my understanding is of the threads vs. cores issue in Matlab:
  • Matlab has inherent multithreading capabilities, and will utilize extra cores on a multicore machine.
  • Matlab runs its threads in such a way that putting multiple Matlab threads on the same core (i.e. hyperthreading) isn't useful. So by default, the maximum number of threads that Matlab will create is the number of cores on your system.
  • When using parpool(), regardless of the number of workers you create, each worker will use only one physical core, as mentioned in this thread.
However, I've also read that using the (deprecated) function maxNumCompThreads(), you can either decrease or increase the number of threads that Matlab or one of the workers will generate. This can be useful in several scenarios:
  1. You want to utilize Matlab's implicit multithreading capabilities to run some code on a cluster node without allocating the entire node. It would be nice if there were some other way to do this if maxNumCompThreads ever gets removed (a sketch of what I mean is below this list).
  2. You want to do a parameter sweep but have fewer parameters than the number of cores on your machine. In this case you might want to increase the number of threads per worker so that all of your cores are utilized; this was suggested recently in this thread. However, in my experience, while the individual workers seem quite happy to run maxNumCompThreads() to raise their thread count, inspecting actual CPU usage with the "top" command suggests it has no effect, i.e. each worker still only gets to use one core. One possibility is that the individual Matlab processes spawned by the parpool are run with the -singleCompThread argument. I've confirmed that if the parent Matlab process is started with -singleCompThread, calling maxNumCompThreads(n) with n > 1 throws an error because Matlab is running in single-threaded mode. So the result seems to be that (at least in R2014a) you can't increase the number of computational threads on the parallel pool workers. Related to this, I can't get the parent Matlab process to start more threads than there are physical cores, even though the machine has hyperthreading enabled. Again, it will happily run maxNumCompThreads(n) with n > the number of physical cores, but top showing CPU utilization at 50% suggests the extra threads are never created.
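For concreteness, scenario 1 is something like the following sketch (the limit of 4 is just an example; maxNumCompThreads(n) returns the previous limit, so it can be restored afterwards):
% Cap Matlab's implicit multithreading so this job only uses part of the node
oldLimit = maxNumCompThreads(4);   % allow at most 4 computational threads
a = randn(5000)*randn(5000);       % implicitly multithreaded operation, now capped at 4 cores
maxNumCompThreads(oldLimit);       % restore the previous limit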

Accepted Answer

Evan on 16 Oct 2014
I was wrong about maxNumCompThreads not working on parpool workers. I think the problem was that the code I was using:
parfor j = 1:2
    tic
    maxNumCompThreads(2);
    workersCompThreads(j) = maxNumCompThreads;
    i = 1;
    while toc < 200
        a = randn(10^i)*randn(10^i);
        i = i + 1;
    end
end
used so much memory by the time I checked CPU utilization that the bottleneck was I/O and the extra threads were already shut down. When I did the following:
parfor j = 1:2
    tic
    maxNumCompThreads(2);
    workersCompThreads(j) = maxNumCompThreads;
    i = 4;
    while toc < 200
        a = randn(10^i)*randn(10^i);
    end
end
the extra threads started and stayed running.
As for the second issue, I got confirmation from MathWorks that the parent Matlab process won't start more threads than the number of physical cores, even if you explicitly raise the limit beyond that. So in the documentation, the sentence:
"Currently, the maximum number of computational threads is equal to the number of computational cores on your machine."
should say:
"Currently, the maximum number of computational threads is equal to the number of physical cores on your machine."
  2 Comments
Sijie Xiong on 28 Feb 2020
This is extremely helpful, because I have been trying to get nested parallel computing to work and couldn't find a way or a clear explanation.
So if I want to do the following:
pc = parcluster('local');
parpool(pc, str2num(getenv('SLURM_CPUS_ON_NODE')));
parfor j = 1:10
    maxNumCompThreads(4);
    workersCompThreads(j) = maxNumCompThreads;
    % parameter_pool(j) is a problem structure whose options set
    % 'UseParallel' to true
    optres = fmincon(parameter_pool(j));
end
on the cluster, will the inner optimization with 'UseParallel'=true benefit from the multithreading, i.e., will fmincon actually use 4 threads for the optimization? If so, how should I set the parameters in the batch script to support this? For example, would I set the following
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --cpus-per-task=4
so that the parfor distributes 10 concurrent jobs as tasks, and each fmincon creates 4 threads based on the 4 cpus-per-task and runs the optimization in parallel?
Any feedback is much appreciated!
John Meluso on 18 Mar 2020
Hi Sijie,
You've probably already consulted them, but since it looks like you're using Slurm, I'd recommend speaking to the computing center at your university or organization that manages the cluster to figure out the best way to optimize your problem. Our cluster at UMich advised me to use the following to set up a single node running Matlab across 36 cores (for example), with a single task, and to allocate all of the available CPUs to that one task, like so:
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --mem-per-cpu=5gb
That gave me 1 node with 36 cores and 5 GB per core, for a total of 180 GB on the node divided among the 36 workers.
The next thing to check is that you're correctly using your organization's setup commands for running Matlab on the cluster. We just use commands like the following to set up Matlab to run on a node:
% Use this to limit CPUs to 4 outside of job, available CPUs on the
% node inside one
if isempty(getenv('SLURM_CPUS_ON_NODE'))
    nWorkers = 4;
else
    nWorkers = str2double(getenv('SLURM_CPUS_ON_NODE'));
end

% Set up the Matlab cluster object
myCluster = parcluster('local') %#ok<NOPTS>

% Create the pool of workers
thePool = parpool('local', nWorkers);

% Verify the pool opened successfully
if isempty(thePool)
    error('pctexample:backslashbench:poolClosed', ...
        ['This test requires a parallel pool. ' ...
         'Manually start a pool using the parpool command or set ' ...
         'your parallel preferences to automatically start a pool.']);
    exit %#ok<UNRCH>
end
The Slurm variables may be specific to your cluster. Apologies if you know all of this already, just trying to be thorough since I've been through all of this recently myself!
Finally, because the dimensionality of your parfor loop is low (only 10 runs), you might want to look into using parfeval instead (https://www.mathworks.com/matlabcentral/answers/410364-parfor-or-parfeval-what-is-better). It definitely depends on your problem, but you're likely not using your parallel resources efficiently if you only have 10 function calls to execute in parallel on a cluster.
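For what it's worth, here is a minimal sketch of the parfeval pattern (runOneOptimization is a hypothetical wrapper around your fmincon call, not a real function):
% Submit each optimization as a future and collect results as they finish
pool = gcp();
for j = 1:10
    futures(j) = parfeval(pool, @runOneOptimization, 1, parameter_pool(j)); %#ok<SAGROW>
end
for k = 1:10
    [idx, res] = fetchNext(futures);   % blocks until any remaining future finishes
    optres(idx) = res;                 %#ok<SAGROW>
end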
Hope that helps!


More Answers (1)

Jon Boerner on 13 Oct 2014
Based on my understanding of how MATLAB uses cores/threads, your descriptions of what you can/cannot do are all spot on. This thread provides some discussion on this (if you haven't seen it already).
You do make some good points with your use cases for multiple cores per worker and limiting the number of cores MATLAB can access.
If you are using PARFOR, you can limit the number of workers that the loop will use in order to keep some cores free for other processes. The documentation describes the syntax, but it looks something like:
parfor (i = 1:100, 6)
    % do stuff...
end
The 6 in this case limits the loop to using at most 6 workers.
More generally though (if you are not using a PARFOR loop), the only options are to use the maxNumCompThreads function, or start MATLAB in a single-threaded mode, like you mentioned.
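For example, something like the following (a minimal sketch; note that the -singleCompThread flag goes on the operating-system command line, not inside MATLAB):
% Option 1: cap the thread count from within MATLAB (deprecated, but still works)
maxNumCompThreads(1);
% Option 2: start MATLAB in single-threaded mode from the shell instead:
%   matlab -singleCompThread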
As for the parameter sweep scenario, I believe you are right that there is no way to work around it (besides re-parameterizing the problem so that the number of parameter sets is equal to or larger than the number of cores you have).
  2 Comments
Evan on 14 Oct 2014
Regarding the 1 thread limit on the workers, do you know if there's some documentation to confirm this?
Jon Boerner on 14 Oct 2014
I am not sure how I came across that information actually, and have not been able to find it in the documentation. I'll take one more look with fresh eyes in the morning and let you know if I see anything.
