How are overloaded functions implemented on the GPU? I.e., how can I set the number of threads and thread blocks when I use gpuArray?

I learned about writing *.cu files and compiling them to get *.ptx files, but I'm concerned about the built-in GPU-supported functions. If I use gpuArray to transfer a variable to the GPU, will any further operations (such as multiplication) performed on that variable be done on the GPU? If so, how can I find out or set the number of thread blocks and threads for each kernel?

 Accepted Answer

All operations on gpuArray data take place on the GPU. For built-in things like matrix multiplication, the allocation of blocks and threads is done automatically, and you have no control over it. In contrast, when using CUDAKernel to operate on gpuArray data, you must explicitly choose the number of threads and blocks to use.
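As a rough sketch of the difference, assuming Parallel Computing Toolbox and a supported GPU: the first part uses a built-in operation, where MATLAB picks the launch configuration; the second part sets it by hand on a CUDAKernel object. The files addVectors.ptx and addVectors.cu, and the kernel signature in the comment, are hypothetical and only there for illustration, so that part is skipped if the files are not present.

```matlab
% Built-in operation: blocks and threads are chosen automatically.
A = gpuArray(rand(1000, 'single'));
B = gpuArray(rand(1000, 'single'));
C = A * B;                 % executes on the GPU; C is a gpuArray
result = gather(C);        % copy the result back to host memory

% CUDAKernel: the launch configuration is set explicitly.
% Hypothetical files, assumed to contain a kernel like
%   __global__ void addVectors(float *out, const float *a,
%                              const float *b, int n)
if isfile('addVectors.ptx') && isfile('addVectors.cu')
    k = parallel.gpu.CUDAKernel('addVectors.ptx', 'addVectors.cu');
    n = 1e6;
    threadsPerBlock = 256;                          % illustrative choice
    k.ThreadBlockSize = [threadsPerBlock, 1, 1];
    k.GridSize = [ceil(n / threadsPerBlock), 1, 1];

    a = rand(n, 1, 'single', 'gpuArray');
    b = rand(n, 1, 'single', 'gpuArray');
    out = zeros(n, 1, 'single', 'gpuArray');
    % Non-const pointer arguments are returned as outputs of feval.
    out = feval(k, out, a, b, int32(n));
end
```

The value 256 is just a common starting point; k.MaxThreadsPerBlock reports the upper limit for a given kernel.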

2 Comments

Thanks a lot Edric. I just wanted to make sure that I have no control over blocks and threads when using gpuArray variables. But do you think it is better to use the CUDA-enabled MATLAB functions, or to write my own kernel to get better performance and optimization?
The built-in gpuArray algorithms should perform well in most circumstances. After that, arrayfun and bsxfun will also perform well on the GPU. You can use CUDAKernel if necessary to handle situations where you still want more performance.
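For the middle option, a minimal sketch of arrayfun and bsxfun on gpuArray data might look like the following, again assuming Parallel Computing Toolbox and a supported GPU. The variable names and the element-wise function are made up for illustration.

```matlab
x = rand(1e6, 1, 'single', 'gpuArray');
y = rand(1e6, 1, 'single', 'gpuArray');

% arrayfun on gpuArray inputs applies the scalar function to each
% element pair on the GPU; no thread or block settings are needed.
z = arrayfun(@(a, b) sqrt(a*a + b*b), x, y);

% bsxfun expands the 1-by-3 row across the rows of M on the GPU.
M = rand(4, 3, 'gpuArray');
v = rand(1, 3, 'gpuArray');
centered = bsxfun(@minus, M, v);
```

If this is still not fast enough after profiling, that is the point where a hand-written CUDAKernel becomes worth considering.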
