Using GPU on multiple nested loops

Jhinhwan Lee on 26 Aug 2019
Answered: Raunak Gupta on 30 Aug 2019
The following code is slow for large ncnt (typically >2000),
and I want to use my GPU for the outermost (iplane) loop.
Can you give me a hint? (I have an NVIDIA RTX 8000.)
nxs=101;
nys=101;
nzs=101;
ncnt=100;
xnmin=-1.0;
xnmax= 1.0;
ynmin=-1.0;
ynmax= 1.0;
znmin=-1.0;
znmax= 1.0;
coefftmp=complex(rand(1,ncnt));
igalltmp=rand(3,ncnt);
vktmp=rand(1,3);
wfrtmp=complex(zeros(nxs,nys,nzs));
tic
for iplane=1:ncnt % GPU loop
    ee=exp(2.*pi*complex(0.,1.));
    vkg=vktmp+double(igalltmp(:,iplane)');
    ekx=ee^vkg(1);
    eky=ee^vkg(2);
    ekz=ee^vkg(3);
    coefft=coefftmp(iplane);
    for iz=1:nzs
        z=znmin+(znmax-znmin)*double(iz-1)/double(nzs-1);
        ekzz=ekz^z;
        for iy=1:nys
            y=ynmin+(ynmax-ynmin)*double(iy-1)/double(nys-1);
            ekyy=eky^y;
            for ix=1:nxs
                x=xnmin+(xnmax-xnmin)*double(ix-1)/double(nxs-1);
                ekxx=ekx^x;
                wfrtmp(ix,iy,iz)=wfrtmp(ix,iy,iz)+coefft*ekxx*ekyy*ekzz;
            end
        end
    end
end
wfr(ispin,:,:,:)=wfrtmp/sqrt(Vcell); % wfr, ispin and Vcell are not defined in this snippet
toc
  4 Comments
Walter Roberson on 26 Aug 2019
If you want to use GPU, you are going to have to rewrite your code to be vectorized.
I suggest you consider
zvec = linspace(znmin, znmax, nzs);
yvec = linspace(ynmin, ynmax, nys);
xvec = linspace(xnmin, xnmax, nxs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);
before any looping. After that you can do things like
coeff .* ekz.^Z .* eky.^Y .* ekx.^X
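To make the suggestion above concrete, here is a minimal sketch of how the three inner loops might collapse into one elementwise expression while the loop over plane waves stays as it is. The grid sizes and ncnt are deliberately small illustrative values, not the ones from the question, and the power form (ekx.^X and so on) simply mirrors the original loop body.

```matlab
% Sketch: vectorize the iz/iy/ix loops with ndgrid; the iplane loop remains.
nxs = 11; nys = 11; nzs = 11; ncnt = 5;      % small illustrative sizes (assumption)
xnmin = -1; xnmax = 1;
ynmin = -1; ynmax = 1;
znmin = -1; znmax = 1;
coefftmp = complex(rand(1, ncnt));
igalltmp = rand(3, ncnt);
vktmp    = rand(1, 3);

% Build the coordinate grids once, before any looping.
xvec = linspace(xnmin, xnmax, nxs);
yvec = linspace(ynmin, ynmax, nys);
zvec = linspace(znmin, znmax, nzs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);        % each is nxs-by-nys-by-nzs

ee = exp(2*pi*1i);                           % loop-invariant, so computed once
wfrtmp = complex(zeros(nxs, nys, nzs));
for iplane = 1:ncnt
    vkg = vktmp + igalltmp(:, iplane).';
    ekx = ee^vkg(1);
    eky = ee^vkg(2);
    ekz = ee^vkg(3);
    % One elementwise expression replaces the three nested loops.
    wfrtmp = wfrtmp + coefftmp(iplane) .* ekx.^X .* eky.^Y .* ekz.^Z;
end
```

ndgrid is used rather than meshgrid so that X varies along the first dimension, which matches the wfrtmp(ix,iy,iz) indexing in the question.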
Jhinhwan Lee on 26 Aug 2019
Thanks! I did something basically the same and it is more than ten times faster now.
I also found that putting X, Y and Z inside the argument of the exp function is slightly better. For integer kx = 1, 2, 3, ..., ekx=exp(2.*pi*complex(0.,1.)*kx) evaluates to 1, so ekx^x==1 no matter what x is, while exp(2.*pi*complex(0.,1.)*kx*x) depends on x (unless kx=0), as expected. So I now use:
coeff*exp(2.*pi*complex(0.,1.)*(kx*X+ky*Y+kz*Z))
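The difference between the two forms can be checked with a couple of scalars. The sketch below uses kx = 2 and x = 0.3 purely as illustrative values; powerForm and phaseForm are names made up for this example.

```matlab
% Sketch: why ekx^x loses the x-dependence for integer kx.
kx = 2;                          % integer wave-vector component (illustrative)
x  = 0.3;                        % sample coordinate (illustrative)

ekx = exp(2*pi*1i*kx);           % equals 1 up to rounding error
powerForm = ekx^x;               % power of (almost exactly) 1, so still ~1
phaseForm = exp(2*pi*1i*kx*x);   % keeps the phase 2*pi*kx*x

fprintf('power form: %.4f%+.4fi\n', real(powerForm), imag(powerForm));
fprintf('phase form: %.4f%+.4fi\n', real(phaseForm), imag(phaseForm));
```

The phase form evaluates the full exponent before any rounding to 1 can happen, which is why it behaves as expected for integer kx.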

Answers (1)

Raunak Gupta on 30 Aug 2019
Hi,
To speed up the code, first vectorize the three loops inside the main loop, since their iterations are independent of each other. As mentioned in the comments, you can use linspace and ndgrid to build the coordinate arrays and then do the exponentiation for all three variables at once.
That step only vectorizes the code. To actually run it on the GPU, create the initial arrays with gpuArray, which may speed up the code significantly. The functions used in your code are supported for gpuArray; if you want to use any other function, check the list of functions that support gpuArray in the documentation.
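As a rough sketch of the gpuArray step, assuming Parallel Computing Toolbox and a supported GPU are available, the grids and the accumulator can be moved to the device and the rest of the code left unchanged. The sketch uses the phase form exp(2*pi*1i*(kx*X+ky*Y+kz*Z)) from the comments rather than the power form in the question, small illustrative sizes, and canUseGPU (R2019b or later) so that it falls back to the CPU when no GPU is found.

```matlab
% Sketch: same vectorized sum, with the large arrays placed on the GPU.
nxs = 21; nys = 21; nzs = 21; ncnt = 10;     % small illustrative sizes (assumption)
coefftmp = complex(rand(1, ncnt));
igalltmp = rand(3, ncnt);
vktmp    = rand(1, 3);

xvec = linspace(-1, 1, nxs);
yvec = linspace(-1, 1, nys);
zvec = linspace(-1, 1, nzs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);
wfrtmp = complex(zeros(nxs, nys, nzs));

useGPU = canUseGPU();                        % requires R2019b or later
if useGPU
    % Transfer once; later elementwise operations then run on the device.
    X = gpuArray(X);
    Y = gpuArray(Y);
    Z = gpuArray(Z);
    wfrtmp = gpuArray(wfrtmp);
end

for iplane = 1:ncnt
    k = vktmp + igalltmp(:, iplane).';       % [kx ky kz] for this plane wave
    wfrtmp = wfrtmp + coefftmp(iplane) .* exp(2i*pi*(k(1)*X + k(2)*Y + k(3)*Z));
end

if useGPU
    wfrtmp = gather(wfrtmp);                 % copy the result back to host memory
end
```

Whether the GPU version is actually faster depends on the grid size and on the double-precision throughput of the card, so it is worth timing both paths with tic/toc (or gputimeit) on the real problem size.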