GPU computation on a mesh using MATLAB

Hello,
I am new to GPU computation in MATLAB and would like some advice.
I would like to calculate a physical property (stress) on a 2D mesh. The value at each grid point is the sum of a number of stress contributions, each of which must itself be calculated.
Therefore, one could use either a single loop over the mesh points with a vectorized calculation of the constituent stresses (summing them all at the end), or two nested for-loops.
The number of constituent stresses does not necessarily match the number of mesh points, so I do not believe I can use arrayfun in MATLAB. Furthermore, I would also have to pass in certain physical constants.
I have tried the first structure (i.e. one loop over the mesh points with a vectorized calculation of the inputs), transferring the grids etc. onto the GPU with gpuArray. However, the performance is terrible: orders of magnitude slower than on the CPU, even though I have a fairly powerful graphics card (NVIDIA GTX 670).
One question: if a for-loop operates on arrays stored on the GPU, does MATLAB automatically assign the iterations to different threads?
I have a C version of the calculation of a single input at a single mesh point, so should I simply write a kernel function in CUDA?
Kind Regards,
F

Accepted Answer

Matt J on 28 Jul 2013
Edited: Matt J on 28 Jul 2013
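One question: if a for-loop operates on arrays stored on the GPU, does MATLAB automatically assign the iterations to different threads?
No. In fact, I don't think it ever does. The iterations of a for-loop run one after another; only the individual gpuArray operations inside each iteration execute on the GPU. Each of those operations carries its own launch and transfer overhead, so a loop made of many small GPU operations can easily end up slower than the CPU.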
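For reference, here is a minimal CPU sketch of the two loop structures described in the question. Both execute their iterations sequentially, which is why wrapping them around gpuArray data does not spread the iterations across GPU threads. The contribution law A(k)/(1 + r^2) and the variables xs, ys and A are hypothetical placeholders, not the actual stress calculation.

```matlab
% Hypothetical CPU sketch of the two loop structures from the question.
% The contribution law A(k) / (1 + r^2) is a made-up placeholder for the
% real stress calculation, and xs, ys, A are random test data.
rng(0);
[X, Y] = meshgrid(linspace(0, 1, 20), linspace(0, 1, 10));
xm = X(:); ym = Y(:);              % N x 1 mesh coordinates (N = 200)
N  = numel(xm);
P  = 50;                           % number of constituent stresses
xs = rand(P, 1); ys = rand(P, 1); A = rand(P, 1);

% Structure 1: one loop over the mesh points, vectorized over the sources.
% Each iteration still starts only after the previous one has finished.
S1 = zeros(N, 1);
for i = 1:N
    r2 = (xm(i) - xs).^2 + (ym(i) - ys).^2;   % P x 1 squared distances
    S1(i) = sum(A ./ (1 + r2));               % total of the P contributions
end

% Structure 2: two nested loops, one scalar contribution at a time.
S2 = zeros(N, 1);
for i = 1:N
    for k = 1:P
        r2 = (xm(i) - xs(k))^2 + (ym(i) - ys(k))^2;
        S2(i) = S2(i) + A(k) / (1 + r2);
    end
end
```

Both versions produce the same N x 1 result; they differ only in how much work each serial iteration does.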
I have a C version of the calculation of a single input at a single mesh point, so should I simply write a kernel function in CUDA?
Maybe, but knowing more about what that calculation looks like might help us see how to do it with built-in gpuArray commands.
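As one illustration of the built-in approach, here is a minimal sketch assuming each contribution depends only on the distance between a mesh point and a hypothetical source point. bsxfun expands the N x 1 mesh coordinates against the 1 x P source coordinates into N x P matrices, so the whole calculation becomes a few elementwise operations followed by a single sum over dimension 2, with no requirement that N and P match. Physical constants can simply appear as scalars in these expressions. The contribution law and the variables xs, ys and A are made up for illustration. bsxfun accepts gpuArray inputs, so converting xm, ym, xs, ys and A with gpuArray(...) should run the same lines on the GPU, with gather(S) bringing the result back.

```matlab
% Hypothetical sketch: the same made-up calculation with no loops at all.
% Contribution of source k at mesh point i: A(k) / (1 + r(i,k)^2).
rng(0);
[X, Y] = meshgrid(linspace(0, 1, 20), linspace(0, 1, 10));
xm = X(:); ym = Y(:);                              % N x 1 mesh coordinates
P  = 50;                                           % number of sources
xs = rand(1, P); ys = rand(1, P); A = rand(1, P);  % 1 x P source data

% N x P squared distances via singleton expansion
r2 = bsxfun(@minus, xm, xs).^2 + bsxfun(@minus, ym, ys).^2;
C  = bsxfun(@rdivide, A, 1 + r2);   % N x P individual contributions
S  = sum(C, 2);                     % N x 1 total stress per mesh point

% Loop reference, one mesh point at a time, used only for checking
Sref = zeros(numel(xm), 1);
for i = 1:numel(xm)
    Sref(i) = sum(A ./ (1 + (xm(i) - xs).^2 + (ym(i) - ys).^2));
end
```

The N x P intermediates grow with both the mesh size and the number of contributions, so for large problems one would process the mesh points in chunks that fit in GPU memory.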
Matt J on 28 Jul 2013
Edited: Matt J on 29 Jul 2013
However, I am not sure why you think I do not need the for-loop. The variable "Stress" is already a vector. What I am doing is, for each grid point, adding various stress contributions (each in the form of a vector) into "Stress_total".
That doesn't automatically mean you need a for-loop. If I have an MxNxP array StressVectors, where M is the number of components of each stress vector, N is the number of grid points and P is the number of stress vectors at each grid point, I can just do
Stress_total = sum(StressVectors, 3);   % M x N: total stress vector at each grid point
If the number of stress vectors differs between grid points, I just set the unused slots to zero. MATLAB will parallelize the call to sum() using its built-in multithreading or, if you make StressVectors a gpuArray variable, it will use the GPU to do it.
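A minimal sketch of the zero-padding idea, assuming the per-point contributions are available as a hypothetical cell array contrib whose j-th entry holds counts(j) stress vectors of length M. Unused slots of the M x N x P array stay zero and therefore add nothing to the sum along dimension 3. The filling loop only copies data; the arithmetic happens in the one sum() call, which is the part that would run on the GPU after StressVectors = gpuArray(StressVectors).

```matlab
% Hypothetical sketch of zero-padding a varying number of stress vectors.
% contrib and counts are made-up test data, not from the original post.
rng(1);
M = 3;                       % components per stress vector
N = 4;                       % number of grid points
counts = [2 5 1 3];          % stress vectors per grid point (varies)
contrib = cell(1, N);
for j = 1:N
    contrib{j} = rand(M, counts(j));     % M x counts(j) stress vectors
end

P = max(counts);
StressVectors = zeros(M, N, P);          % unused slots remain zero
for j = 1:N
    % place grid point j's vectors along the third dimension
    StressVectors(:, j, 1:counts(j)) = reshape(contrib{j}, M, 1, counts(j));
end

% StressVectors = gpuArray(StressVectors);  % optional: sum on the GPU
Stress_total = sum(StressVectors, 3);    % M x N, one total per grid point
```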
Thirdly, I guess you are right that I can avoid doing FFTs, but I think it should be mean(), not sum().
Not based on what you had written. The DC component of MATLAB's fft(x) is the sum of the x(i), not their mean, e.g.,
>> y=fft(1:5); y(1)-sum(1:5)
ans =
0
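The same identity can be checked for any input, and it also shows how to recover the mean if that is what is actually wanted: MATLAB's fft is unnormalized, so y(1) equals sum(x) and y(1)/numel(x) equals mean(x). The data vector below is arbitrary.

```matlab
% Check of the DC-term identity for MATLAB's unnormalized fft.
x = [3 1 4 1 5 9 2 6];              % arbitrary example data, sum = 31
y = fft(x);
dcMinusSum = y(1) - sum(x);         % zero up to rounding: y(1) is the sum
dcOverN    = y(1) / numel(x);       % dividing by the length gives the mean
meanDiff   = dcOverN - mean(x);     % zero up to rounding
```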
Francesco on 29 Jul 2013
OK.
I'll try vectorizing it that way, and also see whether I can call the CUDA C kernel from MATLAB, to find out which is faster.
Thanks a lot!
F
