Best way to speed up (parallelize) a large function that takes three large 3D arrays as input

I have a function, which I generated, that is the double integral of another function in x and y, evaluated via the trapezoidal rule with 1000 segments.
I have this very large function saved to a file, and the inputs I'm now feeding it, xi1, xi2 and xi3, are 3D arrays. As of right now, with my current arrays, it takes roughly a minute for the computation to finish, and in future I'd like to pass it far larger arrays. I've been trying to find a way to speed this up using parallel processing. The problem is that every parallelization I've tried has slowed it down. What I start with is
s = 10;
xi = -1:2/(s-1):1;
[xi1,xi2,xi3] = ndgrid(xi,xi,xi);
I've tried the following things:
spmd with distributed()
XI1 = distributed(xi1);
XI2 = distributed(xi2);
XI3 = distributed(xi3);
spmd
Z = myfunc(XI1,XI2,XI3);
end
However, this made the computation take roughly 30 minutes.
spmd with codistributed()
spmd
XI1 = codistributed(xi1);
XI2 = codistributed(xi2);
XI3 = codistributed(xi3);
Z = myfunc(XI1,XI2,XI3);
end
Z = gather(Z);
This made the computation take roughly 40 minutes.
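A plausible reason both spmd attempts are slow is that myfunc is called on whole (co)distributed arrays, so every one of its many operations is dispatched through the distributed-array overloads, which adds overhead per operation. A minimal sketch of an alternative, assuming myfunc is elementwise (each output element depends only on the matching input elements): let each worker build only its own slab of the grid with ndgrid and evaluate the function on ordinary local arrays. The anonymous function f is a hypothetical stand-in for myfunc; codistributor1d, globalIndices, codistributed.build and gather are Parallel Computing Toolbox functions.

```matlab
% Sketch: each worker builds and evaluates only its own slab of the grid.
% Assumption: myfunc is elementwise; f below is a HYPOTHETICAL stand-in for it.
s  = 10;
xi = -1:2/(s-1):1;
f  = @(a, b, c) exp(-(a.^2 + b.^2 + c.^2));   % hypothetical stand-in for myfunc

spmd
    % Split the 3rd dimension across workers and let MATLAB derive the partition
    codist = codistributor1d(3, codistributor1d.unsetPartition, [s s s]);
    k = globalIndices(codist, 3);           % 3rd-dim indices owned by this worker
    [a, b, c] = ndgrid(xi, xi, xi(k));      % build only the local slab
    Zd = codistributed.build(f(a, b, c), codist);
end
Z = gather(Zd);                             % full s-by-s-by-s array on the client
```

Because no full-size array is ever sent to the workers, this pattern should also scale better to the larger grids mentioned above.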
parfor loop with mat2tiles()
XI1 = mat2tiles(xi1,[2,2,2]);
XI2 = mat2tiles(xi2,[2,2,2]);
XI3 = mat2tiles(xi3,[2,2,2]);
Z = mat2tiles(zeros(s,s,s),[2,2,2]);
N = numel(XI1);
parfor i=1:N
Z{i} = myfunc(XI1{i},XI2{i},XI3{i});
end
This took about 6 minutes to run. mat2tiles() is available on the File Exchange: MAT2TILES: divide array into equal-sized sub-arrays - File Exchange - MATLAB Central (mathworks.com)
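With [2,2,2] tiles on a 10-by-10-by-10 grid there are 125 very small tasks, so per-task scheduling and data transfer can easily outweigh the actual work. A sketch that uses coarser tasks instead, one slab along the third dimension per worker, split with the built-in mat2cell rather than mat2tiles. Assumptions: myfunc is elementwise, f is a hypothetical stand-in for it, and nChunks = 4 is a placeholder you would set to your pool's number of workers.

```matlab
s  = 10;
xi = -1:2/(s-1):1;
[xi1, xi2, xi3] = ndgrid(xi, xi, xi);
f  = @(a, b, c) exp(-(a.^2 + b.^2 + c.^2));   % hypothetical stand-in for myfunc

nChunks = 4;                                   % assumption: set to your pool's NumWorkers
edges   = round(linspace(0, s, nChunks + 1));  % slab boundaries along dim 3
counts  = diff(edges);
counts  = counts(counts > 0);                  % drop empty slabs when s < nChunks

% One slab per task; each cell array is 1-by-1-by-numel(counts)
C1 = mat2cell(xi1, s, s, counts);
C2 = mat2cell(xi2, s, s, counts);
C3 = mat2cell(xi3, s, s, counts);

Zc = cell(size(C1));
parfor k = 1:numel(C1)
    Zc{k} = f(C1{k}, C2{k}, C3{k});            % one large elementwise call per slab
end
Z = cat(3, Zc{:});                             % reassemble along dim 3
```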
parfeval
work = parfeval(@myfunc,1,xi1,xi2,xi3); % 2nd argument is the number of outputs
Z = fetchOutputs(work);
This was eventually stopped: I waited over an hour for fetchOutputs to finish before cancelling.
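A single parfeval call runs the entire computation on one worker, so at best it matches the serial time. A sketch that instead submits one parfeval task per slab and collects results with fetchNext as they complete. Assumptions as before: myfunc is elementwise, f is a hypothetical stand-in for it, and nChunks is a placeholder for roughly one task per worker.

```matlab
s  = 10;
xi = -1:2/(s-1):1;
[xi1, xi2, xi3] = ndgrid(xi, xi, xi);
f  = @(a, b, c) exp(-(a.^2 + b.^2 + c.^2));   % hypothetical stand-in for myfunc

nChunks = 4;                                   % assumption: about one task per worker
edges   = round(linspace(0, s, nChunks + 1));
counts  = diff(edges);
counts  = counts(counts > 0);
C1 = mat2cell(xi1, s, s, counts);
C2 = mat2cell(xi2, s, s, counts);
C3 = mat2cell(xi3, s, s, counts);

n = numel(C1);
futures(1:n) = parallel.FevalFuture;           % preallocate the array of futures
for k = 1:n
    % numout = 1 because the function returns a single output
    futures(k) = parfeval(f, 1, C1{k}, C2{k}, C3{k});
end

Zc = cell(1, 1, n);
for k = 1:n
    [idx, val] = fetchNext(futures);           % blocks until the next task finishes
    Zc{idx} = val;
end
Z = cat(3, Zc{:});                             % reassemble along dim 3
```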
I'm not very skilled with parallelization, but I imagine there must be a faster way to have different workers work on different parts of my input arrays. I thought this is what distributed() and codistributed() did, but they took far too long for it to be that simple.
Matt J on 4 Nov 2022
Edited: Matt J on 4 Nov 2022
How long does it take to process a single triplet xi1,xi2,xi3 without parallelization? How many workers, N, are you using?
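One way to answer these questions is to time the serial call with timeit and to query the open pool with gcp('nocreate'), which returns an empty value instead of starting a pool. A sketch, using a hypothetical stand-in f for myfunc:

```matlab
s  = 10;
xi = -1:2/(s-1):1;
[xi1, xi2, xi3] = ndgrid(xi, xi, xi);
f  = @(a, b, c) exp(-(a.^2 + b.^2 + c.^2));   % hypothetical stand-in for myfunc

tOne = timeit(@() f(xi1(1), xi2(1), xi3(1)));  % a single triplet
tAll = timeit(@() f(xi1, xi2, xi3));           % whole s^3 grid, no parallelization
fprintf('single triplet: %.3g s, full grid: %.3g s\n', tOne, tAll);

p = gcp('nocreate');                           % current pool, if any
if isempty(p)
    fprintf('no parallel pool is open\n');
else
    fprintf('workers: %d\n', p.NumWorkers);
end
```

If the full-grid time is close to s^3 times the single-triplet time, the work is dominated by per-element cost and should split well across workers.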


Answers (1)

Matt J on 4 Nov 2022
Edited: Matt J on 4 Nov 2022
I think you need to extract the local part, like in the following:
XI1 = distributed(xi1);
XI2 = distributed(xi2);
XI3 = distributed(xi3);
glp=@getLocalPart;
spmd
Z = myfunc(glp(XI1),glp(XI2),glp(XI3));
end
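After the spmd block above, Z is a Composite holding one local piece per worker. A sketch of reassembling those pieces into one ordinary array on the client, assuming myfunc is elementwise so that each output piece has the same layout as the corresponding input pieces. f is a hypothetical stand-in for myfunc; getCodistributor and codistributed.build are Parallel Computing Toolbox functions.

```matlab
s  = 10;
xi = -1:2/(s-1):1;
[xi1, xi2, xi3] = ndgrid(xi, xi, xi);
f  = @(a, b, c) exp(-(a.^2 + b.^2 + c.^2));   % hypothetical stand-in for myfunc

XI1 = distributed(xi1);
XI2 = distributed(xi2);
XI3 = distributed(xi3);
spmd
    % Evaluate f on plain local arrays, not on codistributed ones
    Zloc = f(getLocalPart(XI1), getLocalPart(XI2), getLocalPart(XI3));
    % Rebuild a codistributed result with the same layout as the inputs
    Zd = codistributed.build(Zloc, getCodistributor(XI1));
end
Z = gather(Zd);                                % full array on the client
```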

Release

R2022a
