75 views (last 30 days)

Show older comments

I have a complex matrix A, and would like to modify it Nt times according to A = exp( -1i*(A + abs(A).^2) ). The size of A is typically 1000x1000, and the number of times to run would be around 10000.

I am looking to reduce the time taken to carry out these operations. For 1000 iterations on the CPU, I measure around 6.4 seconds. Following the Matlab documentation, I was able to move this to the GPU, which reduced the time taken to 0.07 seconds (an incredible x91 improvement!). So far so good.

However, I also now read this link in the docs, which describes how we can sometimes find even further improvement for element-wise calculations if we use arrayfun() as well. If I try to follow the tutorial, the time taken is actually worse, clocking in at 0.47 seconds. My tests are shown below:

Nt = 1000; % Number of times to run each method

test_functionFcn = @test_function;

A = rand( 500, 600, 'double' ) + rand( 500, 600, 'double' )*1i; % Define an initial complex matrix

gpu_A = gpuArray(A); % Transfer matrix to a GPU array

%%%%%%%%%%%%%%%%%%%% Run the calculation Nt times on CPU only %%%%%%%%%%%%%%%%%%%%

cpu_data_out = A;

tic

for k = 1:Nt

cpu_data_out = test_function( cpu_data_out );

end

tcpu = toc;

%%%%%%%%%%%%%%%%% Run the calculation Nt times on GPU directly %%%%%%%%%%%%%%%%%%%%

gpu_data_out = gpu_A;

tic

for k = 1:Nt

gpu_data_out = test_function(gpu_data_out);

end

tgpu = toc;

%%%%%%%%%%%%%% Run the calculation Nt times on GPU using arrayfun() %%%%%%%%%%%%%%

gpuarrayfun_data_out = gpu_A;

tic

for k = 1:Nt

gpuarrayfun_data_out = arrayfun( test_functionFcn, gpuarrayfun_data_out );

end

tgpu_arrayfun = toc;

%%% Print results %%%

fprintf( 'Time taken using only CPU: %g\n', tcpu );

fprintf( 'Time taken using gpuArray directly: %g\n', tgpu );

fprintf( 'Time taken using GPU + arrayfun(): %g\n', tgpu_arrayfun );

%%% Function to operate on matrices %%%

function y = test_function(x)

y = exp(-1i*(x + abs(x).^2));

end

and the results are:

Time taken using only CPU: 6.38785

Time taken using gpuArray directly: 0.0680587

Time taken using GPU + arrayfun(): 0.474612

My questions are:

1) Am I using arrayfun() correctly in this situation, and it is expected that arrayfun() should be worse?

2) If so, and it is really just expected that it is slower than the direct gpuArray method, is there any easy (i.e non-MEX) way to speed up such a calculation? (I see they also mention using pagefun() for example).

Thanks in advance for any advice.

(The graphics card is Nvidia Quadro M4000, and I am running Matlab R2018a)

Edric Ellis
on 19 Jan 2021

A few points here. Firstly, (and most importantly), to time code on the GPU, you need to use either gputimeit, or you need to inject a call to wait(gpuDevice) before calling `toc`. That's because work is launched asynchronously on the GPU, and you only get accurate timings by waiting for it to finish. With those minor modifications, on my GPU, I see 0.09 seconds for the `gpuArray` method, and 0.18 seconds for the `arrayfun` version.

Running a loop of GPU operations is generally inefficient, so the main gain you can get here is by pushing the loop inside the `arrayfun` function body so that that loop runs directly on the GPU. Like this:

%%% Function to operate on matrices %%%

function x = test_function(x,Nt)

for ii = 1:Nt

x = exp(-1i*(x + abs(x).^2));

end

end

You'll need to invoke it like `A = arrayfun(@test_function, A, Nt)`. On my GPU, this brings the `arrayfun` time down to 0.05 seconds, so about twice as fast as the plain `gpuArray` version.

Edric Ellis
on 12 Feb 2021

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!