Summing array elements seems to be slow on GPU
Damian Suski
on 26 Apr 2023
Commented: Damian Suski
on 18 May 2023
I am testing the execution time of the following function on CPU and GPU:
function funTestGPU(P,U,K,UN)
for k = 1:P
H = exp(1i*K);
HU = U.*H;
UN(k,:) = sum(HU,[1,3]);
end
end
where U, UN are complex arrays of size P × P and K is a complex array of size P × P × P. So in each iteration I perform an element-wise exp(), an element-wise multiplication of two arrays, and a sum of the elements of a 3D array along two dimensions.
I test the execution time on CPU and on GPU with the help of the following script
P = 200;
URe = 1/(sqrt(2))*rand(P);
UIm = 1/(sqrt(2))*rand(P);
KRe = 1/(sqrt(2))*rand(P,P,P);
KIm = 1/(sqrt(2))*rand(P,P,P);
% CPU
U = complex(URe, UIm);
K = complex(KRe, KIm);
UN = complex(zeros(P), zeros(P));
fcpu = @() funTestGPU(P,U,K,UN);
tcpu = timeit(fcpu);
disp(['CPU time: ',num2str(tcpu)])
% GPU
U = gpuArray(complex(URe, UIm));
K = gpuArray(complex(KRe, KIm));
UN = gpuArray(complex(zeros(P), zeros(P)));
fgpu = @() funTestGPU(P,U,K,UN);
tgpu = gputimeit(fgpu);
disp(['GPU time: ',num2str(tgpu)])
and I obtain the results
CPU time: 9.0315
GPU time: 3.3894
My concern is that if I remove the last operation from funTestGPU (summing the array elements), I obtain the results
CPU time: 8.0185
GPU time: 0.0045631
So it looks like the summation is the most time-consuming operation on GPU. Is that an expected result?
I wrote analogous code in CuPy and in PyTorch, and there the summation does not seem to be the most time-consuming operation.
I use MATLAB R2019b. My graphics card is an NVIDIA GeForce GTX 1050 Ti (768 CUDA cores), and my processor is an AMD Ryzen 7 3700X (8 physical cores).
Accepted Answer
Joss Knight
on 27 Apr 2023
These are my results that I got on my (somewhat old) GeForce GTX 1080 Ti:
CPU time: 16.1288
GPU time: 0.96266
If I change the datatype to single I get:
CPU time: 14.9785
GPU time: 0.35102
That's roughly 2.7x faster on the GPU.
So on the one hand your GPU is pretty slow and your CPU is pretty fast, and on the other maybe you could try using single precision instead, if you don't mind the loss of accuracy.
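For reference, a minimal sketch of what the single-precision setup might look like, assuming the same array sizes and scaling as the script in the question. The variable names rowSingle, rowDouble and relErr are hypothetical and only serve to illustrate the accuracy trade-off; the sketch runs on the CPU, and for the GPU run each array would be wrapped in gpuArray(...) as in the original script.

```matlab
P = 200;

% Build the inputs directly in single precision (same scaling as the question).
URe = rand(P, 'single') / sqrt(2);
UIm = rand(P, 'single') / sqrt(2);
KRe = rand(P, P, P, 'single') / sqrt(2);
KIm = rand(P, P, P, 'single') / sqrt(2);

U  = complex(URe, UIm);
K  = complex(KRe, KIm);
UN = complex(zeros(P, 'single'), zeros(P, 'single'));

% For the GPU run, wrap U, K and UN in gpuArray(...) as in the original
% script; gpuArray keeps the underlying class, so the data stays single.

% Hypothetical accuracy check: one pass of the loop body in single
% precision, compared against the same computation in double precision.
rowSingle = sum(U .* exp(1i*K), [1, 3]);
rowDouble = sum(double(U) .* exp(1i*double(K)), [1, 3]);
relErr = max(abs(double(rowSingle) - rowDouble)) / max(abs(rowDouble));
fprintf('Relative error of single vs. double: %g\n', relErr);
```

Whether the reported error is acceptable depends on the application, so this check is only a starting point for deciding if single precision is a reasonable trade.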
More Answers (1)
Joss Knight
on 27 Apr 2023
Moved: Matt J
on 27 Apr 2023
Why are you recomputing H and HU inside the loop? They do not change. If you remove the sum, the results from the first (P-1) iterations are never used, so only the last computation of those values will actually take place.
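A minimal sketch of the hoisting suggested above, assuming U and K really are loop-invariant as in the posted test. The function name funTestHoisted is hypothetical and not part of the original post, and unlike funTestGPU it returns UN so the result can be inspected. It should work on regular arrays as well as gpuArray inputs.

```matlab
function UN = funTestHoisted(P, U, K, UN)
% Hypothetical variant of funTestGPU: H and HU do not depend on k,
% so they are computed once, outside the loop.
H  = exp(1i*K);          % element-wise complex exponential, P-by-P-by-P
HU = U .* H;             % implicit expansion of U (P-by-P) along dimension 3
s  = sum(HU, [1, 3]);    % 1-by-P row, summed over dimensions 1 and 3
for k = 1:P
    UN(k, :) = s;        % every row receives the same values in this test
end
end
```

If K actually depends on k in the real application, only the parts that do not depend on k can be moved out of the loop, and the timings will differ from this simplified case.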