Unexpected speed decrease of 2D Fourier Transform on GPU when iFFTed

I am applying a first FFT2 to a stack of images, cropping a part of the result, and applying iFFT2 to that part:
For example on GPU: FFT2(1920*1240*30 (single) ) -> crop to 320*207*30 (single) -> iFFT2(320*207*30 (single) )
1920/6=320
1240/6≈207 (206.67, rounded)
Here is the execution time of each case, normalized to the number of single-precision values processed:
timeexeeval.png
Note that the yellow line (FFT2+crop1/6+iFFT2) is more than an order of magnitude slower than the purple line, which has 36 times more data to process with iFFT2!
Any idea on what is happening here?
Here is the script I have used:
clear
n=10;
cx=1920;
cy=1240;
FPT=2:5:50;
fpt=size(FPT,2);
b=zeros(1,fpt);
for kk=1:8
    for ii=1:fpt
        ii
        I=gpuArray(single(rand(cy,cx,FPT(1,ii))));
        Ia=gpuArray(single(rand(round(cy/6),round(cx/6),FPT(1,ii))+1i.*rand(round(cy/6),round(cx/6),FPT(1,ii))));
        mask=zeros(cy,cx,FPT(1,ii));
        mask(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),round(cx/2)-round(cx/12):round(cx/2)+round(cx/12))...
            =(ones(size(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),2),size(round(cx/2)-round(cx/12):round(cx/2)+round(cx/12),2)));
        mask=gpuArray(single(mask));
        tic
        for jj=1:n
            switch kk
                case 1
                    tic
                    B=fft2(I);
                case 2
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                case 3
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                    D=ifft2(C);
                case 4
                    tic
                    B=fft2(I);
                    C=ifft2(B);
                case 5
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                case 6
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                    E1=imresize(abs(D),1/6);
                    E2=imresize(angle(D),1/6);
                case 7
                    tic
                    C=fft2(I);
                    B=ifft2(Ia);
                case 8
                    tic
                    B=ifft2(Ia);
            end
        end
        b(1,ii)=toc/n; % b is the time of execution normalized to
        %the amount of data and the number of time a case has been evaluated
    end
    hold on
    plot(b)
    clear A B C D I E1 E2
end
b is the variable plotted in the above graphic.
My graphics card is a GeForce RTX 2080 Ti.
Any help will be appreciated.
Thanks,
Tual

Accepted Answer

Joss Knight on 8 Jun 2019
I modified your code, inserting wait(gpuDevice) before each tic and toc, and got a much more sensible graph:
Capture.PNG
The GPU runs asynchronously, so tic and toc on their own often don't measure GPU work reliably. See the documentation here.
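
As an illustration of the synchronization described above, here is a minimal sketch of how the FFT2 -> crop -> iFFT2 case could be timed. It assumes Parallel Computing Toolbox and a supported GPU; the image sizes are taken from the question, while the variable names and the local helper cropSpectrum are hypothetical and not part of the original script. It is not the exact code used to produce the graph above.

```matlab
% Sketch: time fft2 -> crop -> ifft2 on the GPU with explicit synchronization.
cy = 1240; cx = 1920; nFrames = 30;   % sizes from the question
n  = 10;                              % repetitions, as in the question

dev = gpuDevice;
I = rand(cy, cx, nFrames, 'single', 'gpuArray');   % create the data directly on the GPU

% Central crop, same index ranges as in the question's script
rows = (cy/2 - round(cy/12)):(cy/2 + round(cy/12));
cols = (cx/2 - round(cx/12)):(cx/2 + round(cx/12));

wait(dev);                 % make sure earlier GPU work has finished
t = tic;
for jj = 1:n
    B = fft2(I);
    C = B(rows, cols, :);
    D = ifft2(C);
end
wait(dev);                 % block until all queued GPU work has completed
tManual = toc(t)/n;

% Alternative: gputimeit synchronizes and repeats the call itself
pipeline = @() ifft2(cropSpectrum(fft2(I), rows, cols));
tAuto = gputimeit(pipeline);

fprintf('manual: %.4g s, gputimeit: %.4g s\n', tManual, tAuto);

function C = cropSpectrum(B, rows, cols)
    % Hypothetical helper: crop the same region from every frame
    C = B(rows, cols, :);
end
```

Using a single tic/toc pair around the whole loop, with one wait before and one after, avoids the nested tic calls of the original script and measures the complete batch of GPU work.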

More Answers (1)

Bruno Luong on 3 Jun 2019
If you want a fast FFT, make your data length a power of 2 or a product of small integers.
166 is bad since its prime factorization is 2 * 83.
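
To check this for a given length, a small sketch using MATLAB's factor function is shown below. The list of sizes and the helper nextSmoothLength are hypothetical illustrations, not part of the original answer, and the prime limit of 7 is an assumed threshold. How much padding to such a length helps on a particular GPU is something to measure rather than assume.

```matlab
% Sketch: inspect the prime factors of candidate FFT lengths.
sizes = [166 207 320 1240 1920];
for s = sizes
    fprintf('%5d = %s\n', s, strjoin(string(factor(s)), ' * '));
end

% Smallest length >= 207 whose prime factors are all <= 7 (assumed limit)
padded = nextSmoothLength(207, 7);
fprintf('next smooth length after 207: %d\n', padded);

function m = nextSmoothLength(n, maxPrime)
    % Hypothetical helper: step upward until every prime factor is small
    m = n;
    while max(factor(m)) > maxPrime
        m = m + 1;
    end
end
```

The padded length could then be passed as the size arguments of fft2 and ifft2, which zero-pad the input to the requested dimensions.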
  2 Comments
Tutu on 3 Jun 2019
Thank you for the answer, though I already knew this. Besides, on a GPU it doesn't make a big difference whether or not the dimensions are powers of 2 when you process a large amount of data.
