GPU time slower than CPU time, what went wrong with my GPU implementation?
Hi all, I have been testing the GPU computing feature in MATLAB. The code below runs and times a large matrix multiplication (1024x1024) on both the CPU and the GPU:
After many trials, the CPU turns out to be faster than the GPU. I am surprised, because someone on the Stack Overflow forum ran the exact same test and showed that the GPU was faster:
>> A = rand(1024); gA = gpuArray(A);
% warm up by executing the operations a couple of times, and then:
>> tic, C = A * A; toc
Elapsed time is 0.075396 seconds.
>> tic, gC = gA * gA; toc
Elapsed time is 0.008621 seconds.
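One thing worth noting about the timings above: GPU operations in MATLAB launch asynchronously, so a bare `tic`/`toc` around `gA * gA` can return before the kernel has actually finished. A fairer comparison synchronizes the device first. Here is a minimal sketch, assuming a recent Parallel Computing Toolbox where `gputimeit` is available (otherwise `wait(gpuDevice)` before `toc` achieves the same thing):

```matlab
A  = rand(1024);
gA = gpuArray(A);

% timeit/gputimeit warm up and average repeated runs for you;
% gputimeit also synchronizes the GPU before reading the clock.
tCPU = timeit(@() A * A);
tGPU = gputimeit(@() gA * gA);

% Equivalent manual version with explicit synchronization:
gd = gpuDevice;
tic; gC = gA * gA; wait(gd); tManual = toc;

fprintf('CPU: %.4fs  GPU: %.4fs\n', tCPU, tGPU);
```

Without the `wait`, the GPU number can look unrealistically good or bad depending on what is still queued on the device.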
The only reason I can think of is that we are using different GPUs. The other person has a Tesla C2070, while the laptop I am using is a Dell Inspiron 17R (NVIDIA GeForce GT 525M).
Could it be that with a lesser GPU, the computation is actually slower than on the CPU?
Thank you! Ruby
Ben Tordoff on 20 Jan 2012
I've just uploaded a benchmarking tool to the File Exchange which runs a whole load of these types of timings, to put your GPU in context with others on the market:
One thing to bear in mind is that virtually all GPUs that aren't explicitly designed for scientific computing are optimized for single-precision maths (as used by OpenGL etc.). GeForce cards, mobile or otherwise, are quite good at single-precision performance but usually about 8x worse at double. MATLAB defaults to using double precision everywhere.

Of the NVIDIA cards, only the Tesla and top-end Quadro series do well at double precision. Add to that the fact that a mobile GPU typically has far fewer cores than a desktop one, and I'd be amazed if you saw any significant speed-ups compared to a modern mobile CPU when doing double-precision maths.
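To see how much the precision matters on your own card, you could compare the same multiplication in double and single precision. A minimal sketch (assuming `gputimeit` is available in your release; on older releases, use `tic`/`wait(gpuDevice)`/`toc` instead):

```matlab
gA  = gpuArray(rand(1024));   % double precision, the MATLAB default
gAs = single(gA);             % single-precision copy on the GPU

% On GeForce-class cards the single-precision multiply is typically
% several times faster, since double-precision units are scarce.
tDouble = gputimeit(@() gA  * gA);
tSingle = gputimeit(@() gAs * gAs);
fprintf('double: %.4fs  single: %.4fs\n', tDouble, tSingle);
```

If the single-precision time is dramatically lower than the double-precision time, that gap, rather than the core count alone, is likely most of what you are seeing.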
Anyway, give the benchmark a try and let us all know what you find.
More Answers (1)
Walter Roberson on 19 Jan 2012
Your GeForce GT 525M would also be handling the graphics rendering, whereas the Tesla probably would not be (and, as I seem to recall, can be specifically configured to be taken off graphics duties).
The GT 525M has 96 cores at up to 1.2 GHz; the Tesla C2070 has 448 cores at 1.15 GHz -- nearly 5 times the cores.