Best way to speed up (parallelize) a large function that takes three large 3D arrays as input

I have a generated function that is the double integral, evaluated via the trapezoidal rule using 1000 segments, of another function in x and y. I have this very large function saved to a file, and the inputs I'm now feeding it are 3D arrays. As of right now, it takes roughly a minute for the computation to finish, and in future I'd like to pass it far larger arrays. I've been trying to find a way to speed this up using parallel processing. The issue is that every way I've tried to implement parallelization has slowed it down. What I start with is
s = 10;
xi = -1:2/(s-1):1;
[xi1,xi2,xi3] = ndgrid(xi,xi,xi);
I've tried the following things:
spmd with distributed()
XI1 = distributed(xi1);
XI2 = distributed(xi2);
XI3 = distributed(xi3);
spmd
Z = myfunc(XI1,XI2,XI3);
end
However, this made the processing take roughly 30 minutes.
spmd with codistributed()
spmd
XI1 = codistributed(xi1);
XI2 = codistributed(xi2);
XI3 = codistributed(xi3);
Z = myfunc(XI1,XI2,XI3);
end
Z = gather(Z);
This made the computation take roughly 40 minutes.
parfor loop with mat2tiles()
XI1 = mat2tiles(xi1,[2,2,2]);
XI2 = mat2tiles(xi2,[2,2,2]);
XI3 = mat2tiles(xi3,[2,2,2]);
Z = mat2tiles(zeros(s,s,s),[2,2,2]);
N = numel(XI1);
parfor i=1:N
Z{i} = myfunc(XI1{i},XI2{i},XI3{i});
end
This took about 6 minutes to run. mat2tiles() is available on the File Exchange: MAT2TILES: divide array into equal-sized sub-arrays - File Exchange - MATLAB Central (mathworks.com)
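One plausible reason the tiled parfor is slow is that 2-by-2-by-2 tiles of a 10-by-10-by-10 grid give 125 very small tasks, so per-task overhead can dominate. The sketch below slices along the third dimension instead, which gives s larger tasks and needs no File Exchange helper. It assumes myfunc acts elementwise on its inputs, so that slabs can be evaluated independently; the anonymous myfunc here is a hypothetical stand-in, not the real integral.

```matlab
% Hypothetical stand-in for the real function; assumed to be elementwise.
myfunc = @(a,b,c) a.^2 + b.*c;

s  = 10;
xi = linspace(-1,1,s);
[xi1,xi2,xi3] = ndgrid(xi,xi,xi);

Z = zeros(s,s,s);
parfor k = 1:s
    % xi1, xi2, xi3 and Z are sliced along dimension 3, so each worker
    % only receives the slabs it actually processes.
    Z(:,:,k) = myfunc(xi1(:,:,k), xi2(:,:,k), xi3(:,:,k));
end
```

If myfunc is not elementwise (for example, if it mixes values across the grid), this slicing would not reproduce the serial result.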
parfeval
work = parfeval(@myfunc,1,xi1,xi2,xi3);  % second argument is the number of outputs requested
Z = fetchOutputs(work);
This was eventually stopped because I waited over an hour for fetchOutputs to finish before cancelling.
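Two points about parfeval may help here. It expects the number of requested outputs as the argument right after the function handle, and a single parfeval call runs on one worker, so on its own it does not split the arrays. A minimal sketch that submits one future per slab is shown below. It assumes myfunc is elementwise, and the anonymous myfunc is a hypothetical stand-in for the real function.

```matlab
% Hypothetical stand-in for the real function; assumed to be elementwise.
myfunc = @(a,b,c) a.^2 + b.*c;

s  = 10;
xi = linspace(-1,1,s);
[xi1,xi2,xi3] = ndgrid(xi,xi,xi);

% Preallocate the array of futures, then submit one task per slab.
work(1:s) = parallel.FevalFuture;
for k = 1:s
    % The 1 is the number of outputs requested from myfunc.
    work(k) = parfeval(@myfunc, 1, xi1(:,:,k), xi2(:,:,k), xi3(:,:,k));
end

% Collect the results slab by slab; fetchOutputs blocks until ready.
Z = zeros(s,s,s);
for k = 1:s
    Z(:,:,k) = fetchOutputs(work(k));
end
```

For a regular grid like this, parfor is usually the simpler choice. parfeval is more useful when results should be consumed as they arrive.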
I'm not too skilled with parallelization, but I imagine there must be a faster way to have my different workers work on different parts of my array inputs. This is what I thought distributed() and codistributed() did, but they took so much longer to finish that it can't be that simple.
  3 Comments
Matt J on 4 Nov 2022
Edited: Matt J on 4 Nov 2022
How long does it take to process a single triplet xi1,xi2,xi3 without parallelization? How many workers, N, are you using?


Answers (1)

Matt J on 4 Nov 2022
Edited: Matt J on 4 Nov 2022
I think you need to extract the local part, like in the following:
XI1 = distributed(xi1);
XI2 = distributed(xi2);
XI3 = distributed(xi3);
glp=@getLocalPart;
spmd
Z = myfunc(glp(XI1),glp(XI2),glp(XI3));
end
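With the code above, Z comes back to the client as a Composite holding one piece per worker. A possible way to get a single full array, sketched here under the assumption that myfunc is elementwise (so each local output has the same size as the local input), is to rebuild a codistributed array inside spmd and gather it afterwards. The anonymous myfunc is a hypothetical stand-in for the real function.

```matlab
% Hypothetical stand-in for the real function; assumed to be elementwise.
myfunc = @(a,b,c) a.^2 + b.*c;

s  = 10;
xi = linspace(-1,1,s);
[xi1,xi2,xi3] = ndgrid(xi,xi,xi);

XI1 = distributed(xi1);
XI2 = distributed(xi2);
XI3 = distributed(xi3);

spmd
    % Inside spmd each worker sees its own piece; getLocalPart returns
    % it as an ordinary array, so myfunc runs as plain numeric code.
    Zloc = myfunc(getLocalPart(XI1), getLocalPart(XI2), getLocalPart(XI3));
    % Reassemble using the same partitioning as the input.
    Zc = codistributed.build(Zloc, getCodistributor(XI1));
end
Z = gather(Zc);   % full s-by-s-by-s array on the client
```

Passing local parts means myfunc operates on regular arrays rather than on distributed ones, which avoids communication between workers during the evaluation.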

Release

R2022a
