distributed arrays slow with batch jobs

Maria on 28 Oct 2021
Commented: Maria on 1 Nov 2021
Hi,
I am working with distributed arrays.
As far as I understand, I can create distributed arrays directly on a cluster. When I want to manipulate the contents of a distributed array, I need to use spmd.
I wanted to avoid an interactive pool. For this reason, I wrote a function that uses a distributed array and sent it to the cluster as a batch job. The function looks like this:
function R = my_distributed_function(input)
N = input; % assumption: the original post does not show where N is defined
R = eye(N, 'distributed');
for k = 1:N
    for m = 1:N
        R(k,m) = 1*m;
    end
end
end
I then send this to the cluster as a batch job:
job_distributed = batch(c,@my_distributed_function,1,{myinput},'Pool',N-1,'CurrentFolder','.','AutoAddClientPath',false);
However, the batch job takes very long, around 64 seconds. Without the 'distributed' option, the same function takes around 2 ms.
If I do not use a batch job but keep the 'distributed' option, an interactive pool starts. The function itself then takes around 2 seconds, in addition to the time needed to start the parallel pool.
My question is: why does the batch job take so long when the function uses distributed arrays?

Accepted Answer

Thomas Falch on 29 Oct 2021
A batch job with the 'Pool' option (a "batch-pool job") ends up starting the equivalent of an interactive pool, but uses one of the workers as a substitute for the MATLAB desktop client. The overall time for such a job is therefore the pool startup time plus the actual work you're doing. In other words, it takes about the same amount of time as an interactive pool.
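To see the two contributions separately, one option is to time the pool startup and the work on their own in an interactive session. This is only a rough sketch: the size N = 4 and the use of the default cluster profile are assumptions made for illustration, and the loop is the one from the question.

```matlab
N = 4;                        % small size, chosen only for illustration
c = parcluster();             % default cluster profile (assumption)

tPool = tic;
pool = parpool(c, N-1);       % pool startup, timed on its own
poolStartup = toc(tPool);

tWork = tic;
R = eye(N, 'distributed');    % distributed identity matrix
for k = 1:N
    for m = 1:N
        R(k,m) = 1*m;         % element-wise update, as in the question
    end
end
workTime = toc(tWork);

fprintf('Pool startup: %.2f s, work: %.2f s\n', poolStartup, workTime);
delete(pool);                 % shut the pool down again
```

If the startup time dominates, a batch-pool job running the same function would be expected to show a similar overhead.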
The main benefit of a batch-pool job is that you can submit the job to the cluster and then shut down the MATLAB desktop client (and indeed the computer it's running on). Meanwhile, the job keeps running on the cluster, and you can come back much later to get the results. This is useful for long-running jobs that don't require any user input (which is what interactive pools are for).
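A submit-now, collect-later workflow might look roughly like the sketch below. It assumes that my_distributed_function from the question is available to the workers and takes the matrix size as its input; N = 4 and the default cluster profile are also assumptions.

```matlab
N = 4;                                   % illustrative size (assumption)
c = parcluster();                        % default cluster profile (assumption)

% Submit the batch-pool job; one worker acts as the client, N-1 form the pool.
job = batch(c, @my_distributed_function, 1, {N}, 'Pool', N-1);
jobID = job.ID;                          % note the ID to find the job again later

% ... the MATLAB client can be closed here while the job runs ...

% Later, possibly in a new MATLAB session:
job = findJob(c, 'ID', jobID);           % look the job up on the cluster
wait(job);                               % block until the job has finished
out = fetchOutputs(job);                 % cell array with one entry per output
R = out{1};
delete(job);                             % remove the job data when done
```

Returning gather(R) from the function instead of the distributed array itself is probably the safer choice, so that the fetched result is an ordinary array.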
  3 Comments
Maria on 1 Nov 2021
Thank you for the clarification. I had completely misunderstood this point. I have some jobs built with createTask that contain parfor loops, and I thought the parfor loops would run in parallel. Now that I think about it, of course they do not, because createTask creates each task at the worker level, and there is one worker per core.
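The difference can be sketched as follows. Here my_parfor_function is a hypothetical function containing a parfor loop, and the argument 10 and the pool size 3 are arbitrary; this is an illustration under those assumptions, not the code from my actual jobs.

```matlab
c = parcluster();                          % default cluster profile (assumption)

% Independent job: the task runs on a single worker with no pool behind it,
% so a parfor loop inside my_parfor_function would be expected to run serially.
job1 = createJob(c);
createTask(job1, @my_parfor_function, 1, {10});
submit(job1);

% Batch-pool job: one worker runs the function and 3 more workers form the
% pool that the parfor loop can use.
job2 = batch(c, @my_parfor_function, 1, {10}, 'Pool', 3);

wait(job1);
wait(job2);
out1 = fetchOutputs(job1);                 % result of the single-worker task
out2 = fetchOutputs(job2);                 % result of the batch-pool job
delete(job1);
delete(job2);
```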
