Slice into gpuArray and perform functions on the GPU with arrayfun

I would like to know how I can index into a given matrix to make pairwise combinations of column-vectors, and perform operations on these vectors - all on the GPU. So consider the simple function below:
function out = sum2Vecs(in1,in2) %in1 and in2 are (n x 1) vectors.
out = sum(in1,1) + sum(in2,1); %Output is a scalar "double".
end
Quick example: an array such as
fullMatrix = rand(3000,100);
Now I choose all pairwise column-vector combinations of "fullMatrix":
idxArray = nchoosek(1:100,2); %All possible pairwise index combinations of "fullMatrix".
nCombinations = size(idxArray,1); %Number of pairs, i.e. rows of "idxArray".
And a simple for-loop performs the "sum2Vecs" function on each combination of two-column vectors:
outArray = zeros(1,nCombinations); %Preallocate the output.
for idx = 1 : nCombinations
outArray(idx) = sum2Vecs( fullMatrix(:,idxArray(idx,1)) , fullMatrix(:,idxArray(idx,2)) );
end
Also, a parfor-loop with slicing works fine:
parfor idx = 1 : nCombinations,
in1 = fullMatrix(:,idxArray(idx,1));
in2 = fullMatrix(:,idxArray(idx,2));
outArray(idx) = sum2Vecs(in1,in2);
end
My goal is to run this loop on the GPU using, e.g., "arrayfun". I am relatively inexperienced with this, so I would appreciate any helpful pointers. I am particularly interested in learning how to index into an array like "fullMatrix" efficiently and send parts of it to each GPU worker.
Thanks very much. Hamad.

Answers (1)

Matt J
Matt J on 11 Jan 2015
Edited: Matt J on 11 Jan 2015
As generally as you've described it, that computation doesn't look well suited to the GPU. The GPU is for situations where you have lots of parallel tasks, each involving a small chunk of data. The chunks in your example, two 3000x1 vectors, likely wouldn't be small enough unless the operation can be subdivided further.
For that specific example, I would probably try to vectorize on the GPU as follows,
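idxArray = gpuArray( nchoosek(1:100,2).' ); %2 x nCombinations; each column holds one pair.
A = gpuArray(fullMatrix);
m = size(A,1);
%A(:,idxArray) lists the columns pair by pair; reshape stacks each pair into one 2m x 1 column.
outArray = sum( reshape(A(:,idxArray),2*m,[]), 1 ); %1 x nCombinations result on the GPU.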
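As a sanity check on the indexing trick above, the sketch below runs the same expression on the CPU with a small matrix and compares it against the original loop. It also shows a shortcut that applies only because this particular function is a sum of two column sums: each column can be summed once and the results added per pair, which avoids building the large intermediate array. The names small, pairs, loopOut, vecOut and colSumOut are hypothetical, introduced for illustration only.

```matlab
% Small CPU-only check; the sizes are arbitrary and chosen for illustration.
small = rand(5, 4);
pairs = nchoosek(1:size(small, 2), 2).';   % 2-by-nPairs, one pair per column
m = size(small, 1);
nPairs = size(pairs, 2);

% Reference result: the original loop over all pairs.
loopOut = zeros(1, nPairs);
for k = 1:nPairs
    loopOut(k) = sum(small(:, pairs(1, k)), 1) + sum(small(:, pairs(2, k)), 1);
end

% Vectorized form from the answer: stack each pair into one 2m-by-1 column.
vecOut = sum(reshape(small(:, pairs), 2*m, []), 1);

% Shortcut for this function only: sum every column once, then add per pair.
colSums = sum(small, 1);
colSumOut = colSums(pairs(1, :)) + colSums(pairs(2, :));

% Largest disagreement between the loop and each vectorized variant.
disp(max(abs(loopOut - vecOut)));
disp(max(abs(loopOut - colSumOut)));
```

Wrapping small and pairs in gpuArray should let the same lines run on the GPU, assuming Parallel Computing Toolbox and a supported device are available.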
  4 Comments
Joss Knight
Joss Knight on 23 Feb 2015
arrayfun can take a user-defined function, as long as that function carries out only scalar operations. You can also index into arrays inside that function, as long as the array is passed in as an upvalue, i.e. a variable of the parent function that a nested function uses; see, for instance, the Mandelbrot example and the Monte Carlo example.
You need to remember that GPU cores are not like parallel workers. They cannot perform complex vector operations. Taken together, they perform complex vector operations, but not individually. In Parallel Computing Toolbox (PCT), a large number of complex algorithms have been implemented so as to take maximum advantage of the GPU. If you are having trouble formulating your problem in a data-parallel way, then post your real code and we can have a look at whether it is inherently parallelisable. The example you gave, summing vectors, is easily vectorizable, as Matt showed above.
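To make the upvalue pattern concrete, here is a minimal sketch, assuming Parallel Computing Toolbox and a supported GPU. The function names pairwiseSumArrayfun and sumPair are hypothetical, not part of any toolbox. Each GPU thread receives one pair of scalar column indices and loops over the rows of A, which the nested function reads as an upvalue; the row count m is passed as a scalar that arrayfun expands across all elements.

```matlab
function outArray = pairwiseSumArrayfun(fullMatrix)
% Hypothetical helper: sums every pair of columns with gpuArray arrayfun.
    A = gpuArray(fullMatrix);             % upvalue read by the nested function
    m = size(A, 1);
    pairs = nchoosek(1:size(A, 2), 2);    % nPairs-by-2 list of column pairs
    c1 = gpuArray(pairs(:, 1));
    c2 = gpuArray(pairs(:, 2));

    % One thread per pair; the scalar m is expanded to match c1 and c2.
    outArray = arrayfun(@sumPair, c1, c2, m);

    function s = sumPair(j1, j2, nRows)
        % Only scalar operations here: a loop with scalar reads from A.
        s = 0;
        for r = 1:nRows
            s = s + A(r, j1) + A(r, j2);
        end
    end
end
```

A call such as outArray = gather(pairwiseSumArrayfun(rand(3000,100))) would return an nPairs-by-1 vector on the host. Since each thread walks a whole column serially, this is likely slower than the vectorized form for a plain sum; the pattern is mainly of interest when the per-pair operation cannot be vectorized.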
