No parallel computing when using parfor

Joe on 10 Nov 2013
Hello,
I have some code that uses parfor to run in parallel. The code runs without errors and produces the right output. However, parallel execution matters here because the calculation is slow (currently 2 minutes per iteration) and needs to be run thousands of times.
In terms of context:
  • temperature_v9 is the function that does the actual analysis: it takes temperature measurements and targets as inputs and produces predictions. The inputs have tens of millions of rows.
  • temperature_wrapper_v9 receives the same inputs split into tranches (the split is done by another function) and then runs temperature_v9 on each tranche. The idea is to run these in parallel to reduce run time: for example, split the data into 10 tranches, run 10 instances of temperature_v9 in parallel, and concatenate the 10 results at the end.
  • Both approaches give the same results.
  • The second approach does not actually parallelize, and its processing time is slightly higher than that of the first approach.
  • In both approaches only one core is at 100% and the other 11 cores are at very low load.
  • In both approaches there is plenty of RAM available.
  • I have used the profiler and the time is spread across many different tasks; nothing accounts for more than 5%-10%. The two biggest items are two calls to the std function, which I cannot avoid. So I want to focus on solving the problem with parallel computing if possible.
Can anyone shed some light on how to parallelize this calculation (code pasted below)? I must be missing something here.
Thanks in advance,
Joe
function [statisticTemp, totalTemp, tempFunction, AccTempFunction, PF, increasePercent, avgInstantTemp, instantsNumber, decreaseFunction, increaseFunction, PercentTempFunction] = temperature_wrapper_v9(lowTempSizeMatrixTranches, lowTempMatrixTranches, highTempSizeMatrixTranches, highTempMatrixTranches, trueMidPointsTranches, trueLowsTranches, trueHighsTranches, trueSpreadsTranches, window, refreshRate, expectedIncrease, depthOfMeasure, numDevMaxEntry, numDevMinEntry, numDevExit, changeMin, changeMax, numDevMinSpread, maxSpread, alpha, SLT, print, graph)
sizeData=size(lowTempMatrixTranches);
tranches=sizeData(1,1);
statisticTempTranches= zeros(tranches, 1);
totalTempTranches= zeros(tranches, 1);
tempFunctionTranches= zeros(tranches, sizeData(1,2));
AccTempFunctionTranches= zeros(tranches, sizeData(1,2));
PFTranches= zeros(tranches, 1);
increasePercentTranches= zeros(tranches, 1);
avgInstantTempTranches= zeros(tranches, 1);
instantsNumberTranches= zeros(tranches, 1);
decreaseFunctionTranches= zeros(tranches, sizeData(1,2));
increaseFunctionTranches= zeros(tranches, sizeData(1,2));
PercentTempFunctionTranches= zeros(tranches, sizeData(1,2));
statisticTempAux= 0;
totalTempAux= 0;
tempFunctionAux= zeros(1, sizeData(1,2));
AccTempFunctionAux= zeros(1, sizeData(1,2));
PFAux= 0;
increasePercentAux= 0;
avgInstantTempAux= 0;
instantsNumberAux= 0;
decreaseFunctionAux= zeros(1, sizeData(1,2));
increaseFunctionAux= zeros(1, sizeData(1,2));
PercentTempFunctionAux= zeros(1, sizeData(1,2));
parfor i=1:tranches
    % Run the analysis on tranche i; the last two arguments (print, graph) are switched off
    [statisticTempAux, totalTempAux, tempFunctionAux, AccTempFunctionAux, PFAux, ...
        increasePercentAux, avgInstantTempAux, instantsNumberAux, decreaseFunctionAux, ...
        increaseFunctionAux, PercentTempFunctionAux] = temperature_v9( ...
        squeeze(lowTempSizeMatrixTranches(i,:,:)), squeeze(lowTempMatrixTranches(i,:,:)), ...
        squeeze(highTempSizeMatrixTranches(i,:,:)), squeeze(highTempMatrixTranches(i,:,:)), ...
        squeeze(trueMidPointsTranches(:,i)), squeeze(trueLowsTranches(:,i)), ...
        squeeze(trueHighsTranches(:,i)), squeeze(trueSpreadsTranches(:,i)), ...
        window, refreshRate, expectedIncrease, depthOfMeasure, numDevMaxEntry, ...
        numDevMinEntry, numDevExit, changeMin, changeMax, numDevMinSpread, ...
        maxSpread, alpha, SLT, 0, 0);
    % Store the outputs of tranche i in row i of each result array
    statisticTempTranches(i,:)=statisticTempAux;
    totalTempTranches(i,:)=totalTempAux;
    tempFunctionTranches(i,:)=tempFunctionAux;
    AccTempFunctionTranches(i,:)=AccTempFunctionAux;
    PFTranches(i,:)=PFAux;
    increasePercentTranches(i,:)=increasePercentAux;
    avgInstantTempTranches(i,:)=avgInstantTempAux;
    instantsNumberTranches(i,:)=instantsNumberAux;
    decreaseFunctionTranches(i,:)=decreaseFunctionAux;
    increaseFunctionTranches(i,:)=increaseFunctionAux;
    PercentTempFunctionTranches(i,:)=PercentTempFunctionAux;
end
%Postprocessing: I reassemble all the outputs of the different tranches in a single one
  2 Comments
Walter Roberson on 10 Nov 2013
Could you rearrange the order of the dimensions of lowTempSizeMatrixTranches, perhaps at an outer level (before calling the wrapper)? And likewise for your other variables?
permute(lowTempSizeMatrixTranches, [2 3 1])
That would set things up so that you index by the third dimension, which makes each slice contiguous in memory and removes the need for the squeeze().
When practical, index by the last dimension instead of the first.
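The rearrangement suggested above can be sketched on a small array. This is only an illustration: the array A and its sizes are made up and stand in for lowTempSizeMatrixTranches, whose real dimensions are not given in the thread.

```matlab
% Hypothetical stand-in for lowTempSizeMatrixTranches: 2 tranches, each a 3-by-4 block
A = reshape(1:24, [2 3 4]);        % dimensions: tranche x rows x cols

% Move the tranche index to the last dimension, once, outside the loop
B = permute(A, [2 3 1]);           % dimensions: rows x cols x tranche

i = 2;
sliceOld = squeeze(A(i,:,:));      % original indexing: strided slice plus squeeze
sliceNew = B(:,:,i);               % permuted indexing: contiguous slice, no squeeze

sameSlice = isequal(sliceOld, sliceNew);   % both are the same 3-by-4 block
```

Because MATLAB stores arrays in column-major order, B(:,:,i) is one contiguous block of memory, whereas A(i,:,:) picks elements that are spread across the whole array.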
Joe on 12 Nov 2013
Hi Walter, I was not aware of this. Yes, rearranging should not be a problem, let me try it and see what happens.
Thanks a lot!


Answers (1)

Joe on 12 Nov 2013
Hi - it was easy: I needed to set up the matlabpool. I just did that and the parfor now works. However, I am surprised that I still cannot use the full power of the processor: when running the calculation without parfor, the CPU load is approximately 14%. With parfor, it reaches 60%. I have 6 physical cores (12 with hyperthreading), and I have tried running the calculation with 6 workers and with 8 workers; in both cases the load reaches about 60%.
Any suggestions on how to fully load the processor?
Thanks
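For reference, a minimal sketch of opening a worker pool before a parfor loop. It assumes the Parallel Computing Toolbox and a release that provides parpool and gcp (R2013b or later); the older syntax mentioned in this thread was matlabpool. The loop body is a made-up stand-in, not temperature_v9.

```matlab
% Open a pool only if none is running; without a pool, parfor may run serially
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool();              % default size; parpool(6) would request 6 workers
end

nTranches = 10;                    % hypothetical number of tranches
results = zeros(nTranches, 1);     % sliced output variable, one row per iteration
parfor k = 1:nTranches
    % Stand-in workload: column-wise std of random data, summed to a scalar
    results(k) = sum(std(rand(1000, 50)));
end

% delete(gcp('nocreate'));         % uncomment to shut the pool down afterwards
```

On the 60% figure: workers typically run their computations single-threaded, so 6 busy workers on a machine that reports 12 logical cores would show roughly half the total load. That is one possible explanation, not something confirmed in this thread.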
