Submitting batch jobs across multiple nodes using Slurm

I have a workstation that I am currently using to run the following code structure:
A MATLAB script manages everything and iteratively calls a second wrapper function. Within this wrapper, I submit multiple jobs (each one a model simulation requiring one core) using the batch command, wait for them all to complete, then return some output to the main script. This works fine on my workstation running 12 jobs in parallel, but each model simulation takes 2-3 hours and I am limited by the number of cores on my machine; ideally I would need to run ~50+ jobs in parallel to get reasonable run times.
I would like to get this working on the university cluster, which uses the Slurm workload manager. My problem is that each node on this cluster does not have enough cores to give much of a speedup, so I need the job to run on multiple nodes to take full advantage of the available resources. Of course I then run into a problem: the main script only needs one core, so trying to split it over several nodes makes no sense to Slurm and throws an error.
I am very much a beginner with Slurm, so presumably this is a mistake in how I configure the job submission. The script I am using is as follows:
#!/bin/bash
#SBATCH -J my_script
#SBATCH --output=/scratch/%u/%x-%N-%j.out
#SBATCH --error=/scratch/%u/%x-%N-%j.err
#SBATCH -p 24hour
#SBATCH --cpus-per-task=40
#SBATCH --nodes=2
#SBATCH --tasks=1
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user sebastian.rosier@northumbria.ac.uk
#SBATCH --exclusive
module load MATLAB/R2018a
srun -N 2 -n 1 -c 40 matlab -nosplash -nodesktop -r "my_script; quit;"
The model wrapper that submits multiple batch jobs is something like this:
c = parcluster;
for ii = 1:N
    workerTable{ii} = batch(c,'my_model',1,{my_model_opts});
end
with additional lines to check job status and get results etc.
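For reference, those status-checking lines might look something like the following sketch. The function calls (wait, fetchOutputs, delete) are standard Parallel Computing Toolbox job methods; the variable names follow the loop above, and the assumption that my_model returns a single output is mine.

```matlab
% Hypothetical sketch of the "check job status and get results" step.
% Assumes each my_model call returns exactly one output.
results = cell(1, N);
for ii = 1:N
    wait(workerTable{ii});                % block until job ii finishes
    out = fetchOutputs(workerTable{ii});  % cell array of the job's outputs
    results{ii} = out{1};
    delete(workerTable{ii});              % remove the job from the cluster
end
```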
Perhaps what I am trying to do makes no sense and I need to come up with a completely different structure to my MATLAB script. Either way, any help would be much appreciated!
Sebastian

 Accepted Answer

Hi Sebastian,
I'm going to assume that my_script is the code "workerTable{ii} = ..."
There are several ways to approach this, but none require that your Slurm job request >1 node.
OPTION #1
As you've written it, you could request 1 node with 40 cores. Use the local profile to submit single core batch jobs on that one node.
#!/bin/bash
#SBATCH -J my_script
#SBATCH --output=/scratch/%u/%x-%N-%j.out
#SBATCH --error=/scratch/%u/%x-%N-%j.err
#SBATCH -p 24hour
#SBATCH --cpus-per-task=40
#SBATCH --nodes=1
#SBATCH --tasks=1
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=sebastian.rosier@northumbria.ac.uk
#SBATCH --exclusive
module load MATLAB/R2018a
matlab -nodesktop -r "my_script; quit"
OPTION #2
Same Slurm script, but modify my_script to make it a bit more streamlined (though parfeval isn't much different from your call to batch).
% Start pool
c = parcluster;
sz = str2num(getenv('SLURM_CPUS_PER_TASK')) - 1;
if isempty(sz)
    sz = maxNumCompThreads - 1;
end
p = c.parpool(sz);
parfor ii = 1:N
    results{ii} = my_model(my_model_opts);
end
or
% Start pool
c = parcluster;
sz = str2num(getenv('SLURM_CPUS_PER_TASK')) - 1;
if isempty(sz)
    sz = maxNumCompThreads - 1;
end
p = c.parpool(sz);
for ii = 1:N
    f(ii) = p.parfeval(@my_model,1,my_model_opts);
end
% Run other code
...
% Now fetch the results
for ii = 1:N
    [idx, result] = fetchNext(f);
    results{idx} = result;
end
OPTION #3
Rather than sticking with a local profile, use a Slurm profile and then expand Option #2 to use a much larger parallel pool (notice that in this Slurm script we're only requesting a single core, since parpool will request the larger pool of cores). This will make use of MATLAB Parallel Server.
#!/bin/bash
#SBATCH -J my_script
#SBATCH --output=/scratch/%u/%x-%N-%j.out
#SBATCH --error=/scratch/%u/%x-%N-%j.err
#SBATCH -p 24hour
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --tasks=1
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=sebastian.rosier@northumbria.ac.uk
module load MATLAB/R2018a
matlab -nodesktop -r "my_script; quit"
We'll use parfor here, but we could have used parfeval as well. This assumes a 'slurm' profile has been created. Contact Technical Support (support@mathworks.com) if you need help.
c = parcluster('slurm');
p = c.parpool(100);
parfor ii = 1:N
    results{ii} = my_model(my_model_opts);
end

3 Comments

Hi Raymond,
Thanks for your reply. I'm slightly confused about options 1 and 2: since each individual node in the cluster has far fewer than 40 cores, wouldn't this just run as many jobs as there are cores minus one, with the rest pending until those jobs finish? If so, that is no speedup over what I currently have, since my workstation has as many cores as one individual node on the cluster. Or maybe I have not understood: can MATLAB request more than one node if the cores requested in parpool exceed the cores on a node, even if the sbatch script only asks for one node? As for option 3, the slurm profile is something I haven't come across. If it needs to be configured by the cluster manager, I doubt it has been done; would it be possible to set it up myself? If using the slurm profile with only one node/core requested can then submit as many jobs as needed onto multiple nodes, that would solve the problem; can you confirm whether that is the case?
Thanks again!
Hi Sebastian,
A couple of things.
I cleaned up my example a bit so that rather than hardcoding the size of the parpool (e.g. 40), we query Slurm for the appropriate size.
sz = str2num(getenv('SLURM_CPUS_PER_TASK')) - 1;
if isempty(sz)
    sz = maxNumCompThreads - 1;
end
p = c.parpool(sz);
If for some reason we're not running under Slurm, sz will be empty, so we assign it the number of cores on the machine. I decrement it by one to account for the MATLAB client process that is also running on the machine.
Secondly, I chose 40 because I read
#SBATCH --cpus-per-task=40
#SBATCH --nodes=2
#SBATCH --tasks=1
But as I reread this, I'm guessing each node has 20 cores, and you were requesting 40 across 2 nodes? In any event, as I've now written it (querying SLURM_CPUS_PER_TASK), the parallel pool should size itself correctly.
N and the size of the pool don't need to be the same. If N is greater than the size of the pool, then yes, jobs will be queued. That's the advantage of using MATLAB Parallel Server: where the local pool is bound to the size of your machine, a parallel pool that uses MATLAB Parallel Server allows you to scale to multiple nodes.
There are advantages and disadvantages to batch vs parpool. With batch, I can submit single-core jobs that will probably have less wait time before running (in this case, we're only requesting a single core), but the code must be written slightly differently. parpool requires all the cores to be available before running, but the code is a bit more elegant.
A hybrid approach to this (submit single core jobs using a parfor syntax) is to use parfor with parforOptions, which might be the best of both worlds.
c = parcluster('slurm');
opts = parforOptions(c);
parfor (ii = 1:N, opts)
    results{ii} = my_model(my_model_opts);
end
Here, we're not starting a parallel pool, but using the parfor syntax to submit single-core batch jobs.
To answer your last question: yes, you can create a Slurm profile on your own. You can either use these instructions or contact Technical Support (support@mathworks.com) for help.
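As a quick sanity check (a hedged sketch; 'slurm' here stands in for whatever name you give the profile when you create it), you can list the profiles MATLAB already knows about before calling parcluster:

```matlab
% List the cluster profiles currently available to MATLAB,
% and verify the hypothetical 'slurm' profile exists before using it.
profs = parallel.clusterProfiles();       % cell array of profile names
if any(strcmp(profs, 'slurm'))
    c = parcluster('slurm');              % profile exists; use it
else
    disp('No slurm profile found - create one first.');
end
```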
Hi Raymond,
Thanks for the detailed answer! I'll have a go implementing this on the cluster and contact support if I run into further problems.
Sebastian
