MATLAB Answers

Only a few workers out of the pool are utilized at the end of the simulation

Damir Rakhimov on 17 Nov 2020
Edited: Damir Rakhimov on 17 Nov 2020
Hello,
I am running simulations in parallel using parfor in MATLAB, and I recently noticed that they slow down close to the end of the simulation.
To understand the reason, I collected information about the simulation progress over time and noticed that near the end of the simulation only a few workers are running and processing tasks, while it is not clear what the others are doing. Attached to this message you will find the log file with the simulation progress from my machine (Windows 10, MATLAB R2018b, 4 cores / 8 threads). You can see there that the last 13 tasks were processed by only 2 (out of 8) workers. The following code was used to get the worker ID:
workerID = get(getCurrentTask(),'ID');
I would appreciate it if you could help me understand the reason for this behaviour and how to improve it.
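For readers who want to collect a similar log, here is a minimal sketch of recording the worker ID and the per-iteration time inside a parfor loop. The variable names, the loop size, and the pause call standing in for the simulation are assumptions for illustration, not the original script.

```matlab
numSims   = 8;                        % hypothetical number of tasks
workerIDs = zeros(1, numSims);        % ID of the worker that ran each task
tIter     = zeros(1, numSims);        % duration of each task in seconds

parfor k = 1:numSims
    tStart = tic;
    pause(0.05);                      % placeholder for the real simulation

    % getCurrentTask returns empty when the code runs on the client
    task = getCurrentTask();
    id = 0;
    if ~isempty(task)
        id = task.ID;
    end

    workerIDs(k) = id;
    tIter(k)     = toc(tStart);
end

% Print one log line per task after the loop has finished
for k = 1:numSims
    fprintf('WrkID:%d tIter:%.2fs DoneID:%d/%d\n', ...
        workerIDs(k), tIter(k), k, numSims);
end
```

Grouping the log by worker ID then shows which workers were still busy while the others had run out of tasks.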
Edit #1:
Originally, I was trying to run 8 workers; the PC has 4 cores (Intel i7 processor) and should support 8 threads.
A second log, for a run with 4 workers, was attached to the message later.


Answers (1)

Mario Malic on 17 Nov 2020
Edited: Mario Malic on 17 Nov 2020
Hello Damir,
Initialise your parpool with only as many workers as you have logical threads, or as the number returned by maxNumCompThreads.
In your log, some of your simulations take very long to process, which might be because your pool has more workers than logical threads.
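A minimal sketch of this suggestion, assuming the intent is to cap the pool size at the value reported by maxNumCompThreads. Note that parpool may refuse a size larger than the worker limit set in the cluster profile.

```matlab
% Assumption: cap the pool size at the value reported by maxNumCompThreads
nWorkers = maxNumCompThreads;

pool = gcp('nocreate');               % existing pool, or empty if none is open
if ~isempty(pool) && pool.NumWorkers > nWorkers
    delete(pool);                     % close an oversized pool
    pool = [];
end
if isempty(pool)
    pool = parpool(nWorkers);         % open a pool with the capped worker count
end

fprintf('Pool has %d workers\n', pool.NumWorkers);
```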

  6 Comments

Mario Malic on 17 Nov 2020
17-Nov-2020 11:03:05 - WrkID:1 tStart: 710s tStop:1532s tIter:823s DoneID:23/66 Done#:64/66 (97%) Scen: alg#3
17-Nov-2020 11:03:05 - WrkID:1 tStart:1533s tStop:1533s tIter: 0s DoneID:59/66
It is a little odd that the same algorithm on the same worker would produce such different results.
Could you post a log with 4 workers?
Your simulation DoneID:23 seems to have a problem, probably an error within the calculations, as it doesn't report SNR, so you should check it.
Raymond Norris on 17 Nov 2020
If I understand this correctly, you have 66 sims and are running a parallel pool of 4 workers. In the best case, 2 workers would run 16 sims each and 2 workers would run 17 sims each, correct?
Internally, parfor allocates a subset of the sims to each of the workers, but not all the sims at once. For instance, each worker might initially be given 8 sims. When a worker can take more work, it might then be given 3 sims (I'm intentionally giving an arbitrary scheme). In the end, some workers may be left doing work while others are idle, but ideally all are busy at the same time. I wouldn't expect the last two workers to process the remaining 13, but perhaps each sim takes a different amount of time? Maybe workers 3 and 4 are still working on their previous batches (I haven't digested your log file to look for a pattern).
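To illustrate why batches combined with uneven task times can leave one worker busy at the end, here is a small serial simulation. It does not reproduce parfor's actual scheduling, which is internal; the durations and both schemes are made up for illustration.

```matlab
% Hypothetical task durations in seconds: three long tasks, then short ones
durations  = [800 800 800 10 10 10 10 10 10 10 10 10];
numWorkers = 4;

% Scheme A: each worker receives one contiguous batch of tasks up front
batches      = reshape(durations, [], numWorkers);  % one column per worker
finishStatic = sum(batches, 1);                     % busy time per worker

% Scheme B: whichever worker becomes free first takes the next single task
finishDynamic = zeros(1, numWorkers);
for d = durations
    [~, w] = min(finishDynamic);                    % earliest free worker
    finishDynamic(w) = finishDynamic(w) + d;
end

fprintf('Batches:       workers finish at %s s\n', mat2str(finishStatic));
fprintf('One at a time: workers finish at %s s\n', mat2str(finishDynamic));
```

With these numbers, the batch scheme leaves one worker running for 2400 s while the other three finish after 30 s. Handing out one task at a time brings the longest worker down to 800 s.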
Rather than using the parfor load-balancing model, you might try parfeval, which gives each worker one task at a time.
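A minimal sketch of the parfeval approach, assuming each simulation can be wrapped in a function of its index. The function simFcn is a hypothetical placeholder, not the poster's simulation code.

```matlab
numSims = 66;                          % number of simulations, as in this thread
simFcn  = @(k) k^2;                    % hypothetical stand-in for one simulation

pool = gcp();                          % current pool (may start one, depending on preferences)

% Submit every simulation as its own task; an idle worker picks up the next one
futures(1:numSims) = parallel.FevalFuture;
for k = 1:numSims
    futures(k) = parfeval(pool, simFcn, 1, k);
end

% Collect results in completion order, which may differ from submission order
results = cell(1, numSims);
for n = 1:numSims
    [k, out] = fetchNext(futures);
    results{k} = out;
    fprintf('Done#:%d/%d DoneID:%d/%d\n', n, numSims, k, numSims);
end
```

Because each future is a separate task, a worker that finishes a short simulation takes the next one from the queue instead of waiting on a pre-assigned batch.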
Damir Rakhimov on 17 Nov 2020
I have made another run, this time with 4 workers. The log file is attached to the original message. It shows similar behaviour: the last tasks were completed by only one worker.
Raymond, thank you for the information!
It looks like some scheduling takes place and, because the computation time differs between algorithms, the worker with the heaviest sequence of tasks keeps working on them until the end, after the other workers have finished their jobs.
I will try parfeval and report how it works.
