Internal function time increases with number of workers

When increasing parallelization there is typically the trade-off between distributing the computation and increasing communication overhead. Theoretically, the internal function time should be constant, as the I/O handling occurs before the function call and the combining of data from across cores occurs after the function call.
However, I seem to be experiencing an increase in the internal function time when parallelizing on my machine: the degree of parallelization actually appears to make the individual function calls slower.
I made some example code to test this:
function test_parallel_timing()
g = gcp;
pools = 1:g.NumWorkers;
mean_times = zeros(1,length(pools));
for pp = 1:length(pools)
    num_pools = pools(pp);
    disp(' ');
    disp(['RUNNING ON ' num2str(num_pools) ' POOLS']);
    times = zeros(1,max(pools)); % preallocate for all iterations, not just num_pools
    parfor (ii = 1:max(pools), num_pools)
        times(ii) = pool_function;
    end
    mean_times(pp) = mean(times);
    disp(['Mean function time: ' num2str(mean(times))]);
end
figure
plot(pools, mean_times);
xlabel('Number of Pools');
ylabel('Mean Computation Time (sec)');
end

function function_time = pool_function
start_time = tic;
tmp = toeplitz(1:2000)*toeplitz(1001:3000); % do some costly work
function_time = toc(start_time);
disp(['  Function took ' num2str(function_time) ' seconds']);
end
Which results in the following plot:
The timing is done entirely inside the function, so it should exclude any parallelization overhead. If my timing is correct, the function calls themselves are getting slower as the number of workers increases. What could cause this?
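One way to rule out data transfer as the culprit is ticBytes/tocBytes (available since R2016b), which report how many bytes move between the client and the workers during a parfor run. A minimal sketch (the loop body just repeats the costly work from above; this is an illustrative check, not part of the original test):

```matlab
% Measure bytes transferred to/from workers during one parfor run.
% If the transfer is small, the slowdown is not due to shipping inputs/outputs.
p = gcp;
ticBytes(p);
parfor ii = 1:8
    tmp = toeplitz(1:2000)*toeplitz(1001:3000); %#ok<NASGU> same costly work
end
tocBytes(p)   % displays BytesSentToWorkers / BytesReceivedFromWorkers per worker
```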

Answers (1)

OCDER
OCDER on 9 Jul 2018
Edited: OCDER on 9 Jul 2018
Interesting. It does seem the function time increases with the number of workers, BUT the total time to run the parfor loop does decrease. I'm not sure what's happening behind the scenes of the MATLAB job scheduler: https://www.mathworks.com/help/distcomp/how-parallel-computing-products-run-a-job.html.
Perhaps a more appropriate way to measure the "observed function time" is to divide the total parfor loop time by the number of iterations. See the following code:
function test_parallel_timing()
N = 400; % parfor iterations
g = gcp;
pools = 1:g.NumWorkers;
mean_times = zeros(1,length(pools));
total_times = zeros(1,length(pools));
for num_pools = pools % loop over all worker counts (was hardcoded 1:4)
    fprintf('RUNNING ON %d POOLS\n', num_pools);
    times = zeros(1,N);
    a = tic;
    parfor (ii = 1:N, num_pools)
        times(ii) = pool_function;
    end
    total_times(num_pools) = toc(a);
    mean_times(num_pools) = mean(times);
    fprintf('Mean function time: %f\n\n', mean(times));
end
figure
plot(1:length(pools), mean_times, 'r', 1:length(pools), total_times/N, 'g');
xlabel('Number of Pools');
ylabel('Mean (red) or Total/N (green) Computation Time (sec)');
end

function function_time = pool_function
start_time = tic;
tmp = toeplitz(1:500)*toeplitz(1:500); % do some costly work
function_time = toc(start_time);
end
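One possible contributor (not confirmed here): pool workers run with a single computational thread by default, while the client uses all cores for multithreaded operations such as the toeplitz matrix products, so the same call can genuinely take longer per worker; with more workers active they also compete for memory bandwidth and cache. A quick check, assuming a pool is already open:

```matlab
% Compare computational thread counts on the client vs. on a worker.
% Workers default to 1 thread, so BLAS-heavy calls run slower per call
% than on the multithreaded client.
clientThreads = maxNumCompThreads;
f = parfeval(gcp, @maxNumCompThreads, 1);   % query one worker
workerThreads = fetchOutputs(f);
fprintf('Client: %d threads, worker: %d thread(s)\n', clientThreads, workerThreads);
```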
  4 Comments
Michael
Michael on 9 Jul 2018
The total execution time is certainly the most important metric, but this shows that even if you make the I/O to the workers extremely efficient (slicing data, using parallel.pool.Constant where necessary, etc.), you still get degraded per-call performance as you increase cores. This adds an unknown factor to optimizing parallelization, which makes things difficult.
Hopefully Mathworks chimes in on this thread!
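For reference, the parallel.pool.Constant pattern mentioned above sends a large variable to each worker once, instead of once per parfor execution. A minimal sketch (the data here is arbitrary, just to illustrate the mechanism):

```matlab
% Cache a large matrix on each worker once; parfor iterations then reuse it
% via C.Value instead of re-transmitting the matrix every time.
C = parallel.pool.Constant(rand(2000));
results = zeros(1,10);
parfor ii = 1:10
    results(ii) = sum(C.Value(:,ii));   % read the worker-local cached copy
end
```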
Mahboob Karimian
Mahboob Karimian on 28 Nov 2019
Edited: Mahboob Karimian on 28 Nov 2019
I had the same problem with my optimization task. On an HP server with a powerful Xeon Gold 6240 CPU, when I run my code without parallelization, every iteration takes 9 seconds at 57% CPU load. When I use parallelization with 12 workers, it loads the CPU only 9% in total and every iteration takes much longer!
After some effort, I changed the number of threads from 1 to 8 in the local profile configuration, and the time dropped to 2.7 seconds. But 2.7 seconds is still a lot for a CPU this powerful: on my PC with an Intel Core i7 4770, without parallelization, each iteration takes only about 8 seconds.
I still haven't found the cause; maybe it is related to the overhead or the scheduler. In any case, this shows that configuration is very important, and the MathWorks documentation is not enough for a user to set up his/her machine to run at full speed.
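For anyone wanting to try the thread-count change programmatically rather than through the profile configuration: one commonly used (if heavy-handed) approach is to raise maxNumCompThreads on every worker after the pool starts. A sketch, assuming oversubscribing physical cores is acceptable on your machine:

```matlab
% Raise the computational thread count on all workers of the current pool.
% Caution: workers x threads may then exceed physical cores (oversubscription).
p = gcp;
parfevalOnAll(p, @maxNumCompThreads, 0, 8);   % set 8 threads per worker
% Verify on one worker:
f = parfeval(p, @maxNumCompThreads, 1);
disp(fetchOutputs(f))   % expect 8 if the setting took effect
```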


Release

R2017a
