MATLAB GPU: arrayfun with indexing

Hi
I am new to MATLAB GPU computing and have made some initial tests. Now I am looking to parallelize the following code.
for i=1:n ;where n~1'000'000 and a, b,c of size ~300'000x1
currindices = indices(24,i);
a(currindices ) = a(currindices ) + A(24x24)*(b(currindices )+B(24x24)*c(currindices ));
end
In a test I parallelized this code without any of the indices by using arrayfun and it worked well. That is, I just had the following code in a function that was called by arrayfun:
for i=1:n
a=a+A*(b+B*c)
end
I wonder how to deal with the indexing of the vectors and whether arrayfun still makes sense. The matrices A and B are constant. I read that indexing is rather slow on a GPU.
What would be the best way to parallelize the above code?
Thanks for any help. This whole parallelization does not come naturally to me yet.
BR

6 Comments

Walter Roberson on 22 Oct 2017
Edited: Walter Roberson on 24 Oct 2017
? currindices appears to be unused before you assign to it.
sorry, was a mistake. indexing should happen to currindices. fixed the code in the sample
I'm not sure what language you've written your code in so it's difficult to interpret. What is A(24x24)? And if this were MATLAB code then indices(24,i) would be a scalar. But then your algebra doesn't make sense.
it wasn't meant to be real code. it is just to show that A is of size 24x24 and that for currindices I read 24 values. so currindices is indices(:,i) in MATLAB code and the multiplication with A and B is an ordinary matrix multiplication.
for i=1:n %;where n~1'000'000 and a, b,c of size ~300'000x1
currindices = indices(:,i);
a(currindices ) = a(currindices ) + A*(b(currindices )+B*c(currindices ));
end
well, one of the things I learnt anyway is that I have to use pagefun. the problem is still indexing.
however the main feeling i have is that anyway I have to rewrite the math for an optimal parallelization.
I don't think you need pagefun. Can't you just do this with indexing and matrix multiplication? It seems indices is the correct shape, namely 24-by-n. So b(indices) and c(indices) return 24-by-n, the multiplies return 24-by-n, and the addition works.
a(indices) = a(indices) + A * (b(indices) + B * c(indices));
If the indices repeat this may not work as you intended, because some elements of a will get one of the answers and not another. You might have to use accumarray in that case.
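update = A * (b(indices) + B * c(indices));  % 24-by-n increments, one column per i
a = a + accumarray(indices(:), update(:), size(a));  % accumarray(subs, vals, sz) sums increments that hit the same element of a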
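To see the effect of repeated indices, the accumulation idea can be checked against the original loop on a small problem. The following is only a sketch with made-up sizes and random data (m, n, k, a0 and the variable names are placeholders, not the poster's actual data). It assumes the 24 indices within one column are distinct, and it accumulates only the increment A*(b+B*c), since summing a(indices) itself would count elements of a several times.

```matlab
% Sketch with hypothetical sizes; the real problem has n ~ 1e6 and m ~ 3e5.
rng(1);
k = 24; m = 200; n = 50;
A = rand(k); B = rand(k);
a0 = rand(m,1); b = rand(m,1); c = rand(m,1);

% Build a k-by-n index matrix: distinct within a column, repeated across columns.
indices = zeros(k,n);
for i = 1:n
    indices(:,i) = randperm(m,k);
end

% Reference: the loop from the question.
aLoop = a0;
for i = 1:n
    cur = indices(:,i);
    aLoop(cur) = aLoop(cur) + A*(b(cur) + B*c(cur));
end

% Vectorized: compute all increments at once, then sum those that
% land on the same element of a.
update = A*(b(indices) + B*c(indices));          % k-by-n
aVec = a0 + accumarray(indices(:), update(:), size(a0));

% Plain indexed assignment, for comparison: with repeated indices only
% one of the competing increments survives per element.
aDirect = a0;
aDirect(indices) = aDirect(indices) + update;

errVec    = max(abs(aVec - aLoop));     % expected to be at rounding level
errDirect = max(abs(aDirect - aLoop));  % expected to be large when indices repeat
```

The loop and the accumarray version agree here because the increment depends only on b and c, never on the current value of a, so the order of the updates does not matter.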
got it. at least on CPU the multiplication is 10 times faster than the for loop. anyway I now need to rewrite the code and see how that could work on a GPU.
thanks!
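As a possible starting point for the GPU rewrite, the same vectorized expression can be run on gpuArray data. This is a hedged sketch, not a tuned implementation: it assumes the Parallel Computing Toolbox and a supported GPU are available, it uses made-up sizes and random data, and all variable names (aG, idxG, and so on) are hypothetical. Whether it is actually faster than the CPU version for the real problem size would have to be measured.

```matlab
% Sketch with hypothetical sizes and random data.
rng(0);
k = 24; m = 1000; n = 5000;
A = rand(k); B = rand(k);
a = rand(m,1); b = rand(m,1); c = rand(m,1);
indices = zeros(k,n);
for i = 1:n
    indices(:,i) = randperm(m,k);   % distinct within each column
end

% CPU reference using indexing, matrix multiplication and accumarray.
update = A*(b(indices) + B*c(indices));
aCpu = a + accumarray(indices(:), update(:), size(a));

% Only try the GPU path if the toolbox function exists and a device is found.
hasGpu = ~isempty(which('gpuDeviceCount')) && gpuDeviceCount > 0;
if hasGpu
    % Transfer the data once; A and B are constant, so they stay on the device.
    AG = gpuArray(A);  BG = gpuArray(B);
    aG = gpuArray(a);  bG = gpuArray(b);  cG = gpuArray(c);
    idxG = gpuArray(indices);

    % Same expression as on the CPU, now evaluated on the GPU.
    updateG = AG*(bG(idxG) + BG*cG(idxG));
    aG = aG + accumarray(idxG(:), updateG(:), size(aG));

    % Bring the result back and compare with the CPU reference.
    aGpu = gather(aG);
    maxDiff = max(abs(aGpu - aCpu));
end
```

Small differences between the two results are to be expected, since the GPU may add the increments in a different order.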

Answers (0)

Asked:

on 22 Oct 2017

Commented:

on 31 Oct 2017
