Help with efficient collection of sub-matrices on GPU
The problem I have encountered is that the CPU is much faster than the GPU at grabbing sub-matrices from a larger matrix. The rest of this project is very computation heavy and runs much more efficiently on the GPU, but that efficiency is lost when transferring data to the GPU.
For each object in a frame I need to collect multiple image patches (sub-matrices) and do things with these patches.
I have written the function below to reproduce the issue by timing patch collection on the CPU and on the GPU. If you don't need to see the whole function, just scroll to the results and summary.
%Timing the CPU and GPU for the retrieval of image patches at random
function cpuVSgpu(p,obs)
    %the number of patches I want to retrieve per object, and the number of objects
    numPatches = p;
    numObjects = obs;
    patchSize = 71;
    halfPatch = (patchSize-1)/2;
    res = [720 1280];
    %load image on CPU and GPU and add padding so that no patch will go out of bounds
    I = padarray(rgb2gray(imread('mBallTracking\Frame_1.png')),[halfPatch halfPatch],255);
    gpuI = gpuArray(I);
    %generate random coordinates and shift them to account for the padding
    coords = zeros(numPatches, 2, numObjects);
    for i = 1:numObjects
        Y = randi([1,res(1,1)], numPatches, 1)+halfPatch;
        X = randi([1,res(1,2)], numPatches, 1)+halfPatch;
        coords(:,:,i) = [Y X];
    end
    %allocate for number of patches
    patches = zeros(patchSize, patchSize, numPatches, 'uint8');
    gpuPatches = zeros(patchSize, patchSize, numPatches, 'uint8','gpuArray');
    t = nan(2,1);
    t(1) = timeit(@() CPU);
    t(2) = gputimeit(@() GPU);
    row = {'CPU Time:', 'GPU Time:'};
    t = table(t, 'RowNames',row);

    function CPU
        %get coordinates for each object
        for o = 1:numObjects
            Y = coords(:,1,o);
            X = coords(:,2,o);
            %get patches with coords as center point
            for n = 1:numPatches
                patches(:,:,n) = I(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
            end
        end
    end

    function GPU
        %get coordinates for each object
        for o = 1:numObjects
            Y = coords(:,1,o);
            X = coords(:,2,o);
            %get patches with coords as center point
            for n = 1:numPatches
                gpuPatches(:,:,n) = gpuI(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
            end
        end
    end
end
Here are the results for testing the collection of 100 patches with 5, 50, and 500 objects:
>> cpuVSgpu(100,5)
timeCPU: 0.004164
timeGPU: 0.094262
>> cpuVSgpu(100,50)
timeCPU: 0.041856
timeGPU: 0.93428
>> cpuVSgpu(100,500)
timeCPU: 0.41799
timeGPU: 9.763
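As a rough check on these numbers, the sketch below divides each timing by the number of indexing operations (numPatches times numObjects, one per patch). The variable names `numOps`, `perOpCPU`, `perOpGPU`, and `ratio` are introduced here for illustration only; the timings are copied from the results above.

```matlab
% Timings copied from the results above (seconds)
numPatches = 100;
numObjects = [5 50 500];
timeCPU = [0.004164 0.041856 0.41799];
timeGPU = [0.094262 0.93428 9.763];

numOps   = numPatches * numObjects;   % one indexing operation per patch
perOpCPU = timeCPU ./ numOps;         % seconds per patch on the CPU
perOpGPU = timeGPU ./ numOps;         % seconds per patch on the GPU
ratio    = timeGPU ./ timeCPU;        % how many times slower the GPU is

% fprintf cycles through the matrix column by column, one row of output per run
fprintf('%4d objects: CPU %.1f us, GPU %.1f us per patch, ratio %.1f\n', ...
    [numObjects; perOpCPU*1e6; perOpGPU*1e6; ratio]);
```

The per-patch cost comes out roughly constant across the three runs (about 8 microseconds on the CPU and about 0.19 milliseconds on the GPU), which suggests the GPU time is dominated by a fixed overhead per indexing operation rather than by the amount of data copied.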
So essentially this is slow on GPU:
for o = 1:numObjects
    Y = coords(:,1,o);
    X = coords(:,2,o);
    %get patches with coords as center point
    for n = 1:numPatches
        gpuPatches(:,:,n) = gpuI(Y(n)-halfPatch:Y(n)+halfPatch,X(n)-halfPatch:X(n)+halfPatch);
    end
end
Is there a way to do this faster?
If I collect the patches on the CPU, I then have to send them to the GPU for computation. Sending data to the GPU every frame kills performance, and doing the computations on the CPU kills performance too. I could use some help, as I'm stuck between a rock and a hard place here.
Thanks in advance!
Answers (1)
Joss Knight
on 14 Jul 2016
You need to vectorize your indexing, then it will be efficient on the GPU.
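[offsetX, offsetY] = meshgrid(-halfPatch:halfPatch);
Y = bsxfun(@plus, offsetY, reshape(coords(:,1,:), 1,1,numPatches,numObjects));
X = bsxfun(@plus, offsetX, reshape(coords(:,2,:), 1,1,numPatches,numObjects));
idx = Y + (X - 1) * size(I, 1); % linear indices, equivalent to sub2ind(size(I), Y, X)
Patches = I(idx);
This will give a patchSize-by-patchSize-by-numPatches-by-numObjects array of patches, because indexing with an array of linear indices returns a result with the same shape as the index array. Indexing gpuI the same way keeps the result on the GPU.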
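One way to check the vectorized gather is to compare it against the nested loops on a small synthetic matrix. The sketch below runs on the CPU only and needs no image file. The random matrix, the 5-by-5 patch size, and the names `ref`, `vec`, and `idx` are assumptions made for illustration. It converts the row and column subscripts to linear indices before indexing, since passing two subscript lists as `I(Y(:),X(:))` would select every row/column combination rather than matched pairs.

```matlab
% Small synthetic check on the CPU (no GPU or image file required)
rng(0);
halfPatch  = 2;                       % 5-by-5 patches keep the sketch small
patchSize  = 2*halfPatch + 1;
numPatches = 4;
numObjects = 3;
I = uint8(randi([0 255], 20, 30));    % stand-in for the padded frame

% Patch centers, kept far enough from the border to stay in bounds
coords = zeros(numPatches, 2, numObjects);
coords(:,1,:) = randi([1+halfPatch, size(I,1)-halfPatch], numPatches, 1, numObjects);
coords(:,2,:) = randi([1+halfPatch, size(I,2)-halfPatch], numPatches, 1, numObjects);

% Reference: nested loops, one indexing operation per patch
ref = zeros(patchSize, patchSize, numPatches, numObjects, 'uint8');
for o = 1:numObjects
    for n = 1:numPatches
        r = coords(n,1,o);
        c = coords(n,2,o);
        ref(:,:,n,o) = I(r-halfPatch:r+halfPatch, c-halfPatch:c+halfPatch);
    end
end

% Vectorized: build every row/column subscript, then convert to linear indices
[offsetX, offsetY] = meshgrid(-halfPatch:halfPatch);
Y = bsxfun(@plus, offsetY, reshape(coords(:,1,:), 1, 1, numPatches, numObjects));
X = bsxfun(@plus, offsetX, reshape(coords(:,2,:), 1, 1, numPatches, numObjects));
idx = Y + (X - 1) * size(I, 1);       % same values as sub2ind(size(I), Y, X)
vec = I(idx);                         % result takes the shape of idx

assert(isequal(vec, ref));            % both approaches gather identical patches
```

Note that `Y`, `X`, and `idx` each hold one double per gathered pixel. At the sizes in the question (71-by-71 patches, 100 patches, 500 objects) that is roughly 2 GB per array, so it may be necessary to process the objects in smaller batches.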