Sum of squares profiling on GPU
I was profiling some code that runs on my GPU and came across something puzzling that I haven't been able to sort out. Maybe it has something to do with the way the profiler interacts with the GPU, so I also ran the same code on the CPU and got very different results. Here is the code:
clear all
g = gpuArray.rand(600, 600, 400, 'single');   % 600x600x400 single array on the GPU (~550 MB)
for i = 1:100
    x = sum(g, 3)/400;      % mean of g over the third dimension
    gSq = g.^2;             % elementwise square
    y = sum(gSq, 3)/400;    % mean of the squares over the third dimension
    g = g + .01;
end
This code is just an example that reproduces the problem, not the actual code I am running, so don't ask why anybody would do this...
On the GPU, the profiler shows basically ALL of the time being spent on the line
y = sum(gSq, 3)/400;
On the CPU, the profiler shows most of the time being spent on
g = g+.01;
and the remainder of the time is evenly distributed among the other lines.
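In case the profiler is simply attributing asynchronous GPU work to whichever line happens to force a synchronization, I imagine the numbers could be cross-checked by timing each operation in isolation with gputimeit, which (as far as I know) synchronizes the device around each measured call. This is only a sketch of what I mean, not something I have run as-is:
g   = gpuArray.rand(600, 600, 400, 'single');
gSq = g.^2;
tSumG   = gputimeit(@() sum(g, 3)/400);     % the line that produces x
tSquare = gputimeit(@() g.^2);              % the line that produces gSq
tSumGSq = gputimeit(@() sum(gSq, 3)/400);   % the line that produces y
tAdd    = gputimeit(@() g + .01);           % the g = g+.01 line
fprintf('sum(g): %.4f s, g.^2: %.4f s, sum(gSq): %.4f s, g+.01: %.4f s\n', ...
    tSumG, tSquare, tSumGSq, tAdd);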
Why is summing the gSq array so expensive on the GPU relative to summing g (the line that produces x)? The two arrays are the same size... I don't think it is a memory issue, since my GPU has 4 GB of memory and almost 3 GB is still available with g, x, gSq and y held on the device.
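For reference, the free-memory figure came from gpuDevice, roughly like this (the property is AvailableMemory in my release; I believe older releases call it FreeMemory):
d = gpuDevice;
fprintf('Total: %.2f GB, available: %.2f GB\n', ...
    d.TotalMemory/2^30, d.AvailableMemory/2^30);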
Any ideas?