Help with efficient collection of sub-matrices on GPU

The problem I have encountered is that the CPU is much faster than the GPU at grabbing sub-matrices from a larger matrix. The rest of this project is very computation-heavy and runs much more efficiently on the GPU, but that efficiency is lost when transferring data to the GPU.
For each object in a frame I need to collect multiple image patches (sub-matrices) and do things with these patches.
I have created the function below to simulate my issue by timing patch collection on the CPU and on the GPU. If you don't need to see the whole function, just scroll to the results and summary.
FUNCTION
%Timing the CPU and GPU for the retrieval of image patches at random
%locations
function cpuVSgpu(p,obs)
    %the number of patches I want to retrieve
    numPatches = p;
    numObjects = obs;
    patchSize = 71;
    halfPatch = (patchSize-1)/2;
    %resolution
    res = [720 1280];
    %load image on CPU and GPU and add padding so that no patch will go out of
    %bounds
    I = padarray(rgb2gray(imread('mBallTracking\Frame_1.png')),[35 35],255);
    gpuI = gpuArray(padarray(rgb2gray(imread('mBallTracking\Frame_1.png')),[35 35],255));
    %generate random coordinates on CPU and GPU and pad them
    coords = zeros(numPatches, 2, numObjects);
    for i = 1:numObjects
        Y = randi([1,res(1,1)], numPatches, 1)+halfPatch;
        X = randi([1,res(1,2)], numPatches, 1)+halfPatch;
        coords(:,:,i) = [Y X];
    end
    %allocate for number of patches
    patches = zeros(patchSize, patchSize, numPatches, 'uint8');
    gpuPatches = zeros(patchSize, patchSize, numPatches, 'uint8','gpuArray');
    %timing
    t = nan(2,1);
    t(1) = timeit(@() CPU);
    t(2) = gputimeit(@() GPU);
    row = {'CPU Time:', 'GPU Time:'};
    t = table(t, 'RowNames',row);
    function CPU
        %get coordinates for each object
        for o = 1:numObjects
            Y = coords(:,1,o);
            X = coords(:,2,o);
            %get patches with coords as center point
            for n = 1:numPatches
                patches(:,:,n) = I(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
            end
        end
    end
    function GPU
        %get coordinates for each object
        for o = 1:numObjects
            Y = coords(:,1,o);
            X = coords(:,2,o);
            %get patches with coords as center point
            for n = 1:numPatches
                gpuPatches(:,:,n) = gpuI(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
            end
        end
    end
    disp(t);
end
RESULTS
Here are the results for testing the collection of 100 patches with 5, 50, and 500 objects:
>> cpuVSgpu(100,5)
t
________
timeCPU: 0.004164
timeGPU: 0.094262
>> cpuVSgpu(100,50)
t
________
timeCPU: 0.041856
timeGPU: 0.93428
>> cpuVSgpu(100,500)
t
_______
timeCPU: 0.41799
timeGPU: 9.763
SUMMARY
So essentially this is slow on GPU:
for o = 1:numObjects
    Y = coords(:,1,o);
    X = coords(:,2,o);
    %get patches with coords as center point
    for n = 1:numPatches
        gpuPatches(:,:,n) = gpuI(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
    end
end
Is there a way to do this faster?
If I collect the patches on the CPU, I then have to send them to the GPU for computation. Sending data to the GPU every frame kills performance, and doing the computations on the CPU kills performance too. I could use some help, as I'm stuck between a rock and a hard place here.
Thanks in advance!

Answers (1)

Joss Knight on 14 Jul 2016
You need to vectorize your indexing, then it will be efficient on the GPU.
[offsetX, offsetY] = meshgrid(1:(2*halfPatch+1)) - halfPatch;
Y = bsxfun(@plus, offsetY, reshape(coords(:,1,:), 1,1,numPatches,numObjects));
X = bsxfun(@plus, offsetX, reshape(coords(:,2,:), 1,1,numPatches,numObjects));
Patches = reshape( I(Y(:),X(:)), size(X) );
This will give an M-by-N-by-numPatches-by-numObjects array of patches.
  1 Comment
Christopher Desmond on 15 Jul 2016
Edited: Christopher Desmond on 18 Jul 2016
This code doesn't work, though I think I understand what you are going for.
The offsets aren't what they should be, and I'm not sure how to correct them. The two-output meshgrid call can't be combined with the subtraction in a single statement, and that may be the problem.
If I'm not mistaken for a 5x5 matrix offsetX should be:
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
-2 -1 0 1 2
And offsetY should be:
-2 -2 -2 -2 -2
-1 -1 -1 -1 -1
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
Even when given those same offsets, though, it seemed to output a matrix that was far too large.
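One possible way to address both points in this comment is sketched below. It is not confirmed by the answer's author. It assumes the offsets are built with meshgrid(-halfPatch:halfPatch), which produces exactly the offsetX and offsetY matrices listed above. It also assumes the oversized output comes from I(Y(:),X(:)), which pairs every row index with every column index; linear indices pair them element by element instead. The sizes and the synthetic image are made up so the sketch runs without a GPU or an image file.

```matlab
% Small, made-up sizes for illustration (the thread uses patchSize = 71).
halfPatch  = 2;            % gives 5-by-5 patches
numPatches = 3;
numObjects = 2;
imgSize    = [20 30];      % unpadded image size (hypothetical)

% Synthetic padded image: each pixel stores its own linear index, which makes
% the result easy to check. In the thread's code this would be gpuI.
I = reshape(1:prod(imgSize + 2*halfPatch), imgSize + 2*halfPatch);

% Random patch centers shifted by the padding: numPatches-by-2-by-numObjects.
coords = zeros(numPatches, 2, numObjects);
coords(:,1,:) = randi(imgSize(1), numPatches, 1, numObjects) + halfPatch;
coords(:,2,:) = randi(imgSize(2), numPatches, 1, numObjects) + halfPatch;

% Offsets within one patch: offsetX varies along columns, offsetY along rows.
[offsetX, offsetY] = meshgrid(-halfPatch:halfPatch);

% Add the offsets to every center; both results are
% P-by-P-by-numPatches-by-numObjects, with P = 2*halfPatch+1.
Y = bsxfun(@plus, offsetY, reshape(coords(:,1,:), 1, 1, numPatches, numObjects));
X = bsxfun(@plus, offsetX, reshape(coords(:,2,:), 1, 1, numPatches, numObjects));

% Convert (row, column) pairs to linear indices, the same arithmetic that
% sub2ind performs, then gather every patch in one indexing operation.
idx = Y + (X - 1) * size(I, 1);
patches = I(idx);          % same size as idx
```

Because the gather is a single indexing call, the same lines should work with I replaced by a gpuArray, although the timing benefit on the GPU is not measured here.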
