MATLAB Answers

Parallel computing with shared variables, problem with struct

Asked by Patrizio Graziosi on 17 Jul 2019
Latest activity Commented on by Patrizio Graziosi on 1 Aug 2019
Hi all,
I need to parallelize a code that has four nested for-loops; inside them a script (tau_calc) runs, which in turn calls other scripts (such as tau_ADP_v2) depending on the input. These scripts need access to the whole workspace, which holds around 30 variables plus a large struct 'state_ID' (2 to 3 GB).
I would like to parallelize over the id_E index, or over [id_E, id_n], but I cannot figure out how to pass everything to the parfor, especially the large struct, nor how to save temporary variables in order to write into the state_ID struct; I understand that inside a parfor it cannot be written to from the separate workers. The two scripts I attach work correctly in the serial version.
I'm at an impasse and cannot get out of it. I really need support.
Thanks
Patrizio


1 Answer

Answer by Edric Ellis on 18 Jul 2019
 Accepted Answer

I must admit I didn't look at your code in great detail - but I did get the distinct impression that there's a lot going on there. The script tau_calc_short has a very high degree of "cyclomatic complexity" - in other words, it has lots of deeply nested control structures. The script tau_ADP_v2 has quite a few copies of near-identical computations which again are highly complex.
Now, none of that means that you can't run that stuff as one giant parfor loop, but it isn't going to make life easy. In particular, parfor needs to be able to prove that your loop iterations are independent. The parfor machinery doesn't care about the complexity of your code - but if it refuses to run your loop, it will probably be difficult for you to follow its reasoning.
Therefore, my main advice to you is: try to restructure your code into more self-contained functions. Done correctly, this will let you compartmentalise the complexity, so that the high-level computation is more digestible to the human reader. Once this is done, it will be much more feasible to work out how to apply parfor, since it will be more obvious where the independent (and thus parallelisable) portions are. Sorry that there aren't any simple answers for this sort of case.
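To illustrate the kind of restructuring meant here, a minimal sketch (the names `compute_tau`, `params`, and the zero initialisation are hypothetical placeholders, not taken from the attached scripts): each iteration's work goes into one self-contained function with explicit inputs and outputs, and the results are written back through a per-iteration temporary so parfor can verify the iterations are independent.

```matlab
% Hypothetical sketch of the refactored structure.
taus = zeros(nE, n_bands_transp);
parfor id_E = 1:nE
    row = zeros(1, n_bands_transp);          % per-iteration temporary
    for id_n = 1:n_bands_transp
        % compute_tau stands in for the refactored tau_calc logic;
        % it receives only what it needs, not the whole workspace.
        row(id_n) = compute_tau(id_E, id_n, state_ID, params);
    end
    taus(id_E, :) = row;                     % sliced assignment parfor accepts
end
```

The point of the `row` temporary is that `taus(id_E, :) = row` is a straightforward sliced assignment, so the parfor machinery has nothing complicated to reason about.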

  3 Comments

Hi Edric,
thank you for your guidance.
I ended up saving the workspace (around 1 GB) and attaching it to the parpool.
"tau_calc" then becomes a big function that loads the workspace, and the scripts like tau_ADP become subfunctions. I see this is quite a rough solution that needs polishing, but it works on my PC (4 workers).
save('WorkSpace','-v7.3','-nocompression');
poolobj = gcp;
addAttachedFiles(poolobj, {'WorkSpace.mat'})
WorkersConstant = parallel.pool.Constant('WorkSpace.mat');
parfor id_E = 1:nE
    for id_n = 1:n_bands_transp
        % the big tau_calc routine (the tau_calc of the serial version)
        [tau_temp, tau_matth_temp, tau_IIS_temp] = tau_calc_funct_v3(id_E, id_n, 'WorkSpace.mat');
        taus(id_E, id_n)       = tau_temp;
        taus_matth(id_E, id_n) = tau_matth_temp;
        if strcmp(IIS, 'yes')
            taus_IIS(id_E, id_n) = tau_IIS_temp;
        end
    end
end
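A side note on the snippet above: `WorkersConstant` is created but never read, so each call to `tau_calc_funct_v3` presumably re-loads the MAT-file from disk. A common pattern (sketched here and untested against this code; `tau_calc_funct_v4` is a hypothetical variant of the routine that accepts the already-loaded struct instead of the file name) is to build the Constant from a function handle, so each worker loads the file exactly once:

```matlab
% Load the attached MAT-file once per worker, not once per call.
WorkersConstant = parallel.pool.Constant(@() load('WorkSpace.mat'));
parfor id_E = 1:nE
    ws = WorkersConstant.Value;   % struct holding the saved workspace variables
    for id_n = 1:n_bands_transp
        % Hypothetical variant taking the struct rather than the file name:
        [taus(id_E, id_n), taus_matth(id_E, id_n)] = tau_calc_funct_v4(id_E, id_n, ws);
    end
end
```

`parallel.pool.Constant` with a function handle evaluates the handle on each worker and caches the result in `.Value`, which avoids repeatedly deserialising a ~1 GB file inside the loop body.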
The issue now is that when I run it on a cluster I get a number of aborted workers:
Warning: A worker aborted during execution of the parfor loop. The parfor loop will now run again on the remaining workers.
> In distcomp.remoteparfor/handleIntervalErrorResult (line 234)
  In distcomp.remoteparfor/getCompleteIntervals (line 364)
  In parallel_function>distributed_execution (line 745)
  In parallel_function (line 577)
  In tau_calc_parallel_VOMBATO_v3 (line 326)
1) Does the parfor start again from the beginning, or does it continue on the remaining workers?
2) Can you help me with this? Shall I open a new question?
Thanks
Patrizio
Whether parfor starts from the complete beginning again depends on the release of MATLAB. (I can't remember when we changed that to only re-run the failing portions - but it might well be pretty recent, i.e. R2019a or R2018b). If your workers are crashing like that, hopefully there are some crash dumps around which will help you diagnose things further.
