- The second "wait(gpu)" inside your tight loop is not needed and will affect the results. Memory transfers from device to host (i.e. "gather") are always synchronous.
- You are measuring the speed of transferring data to/from the GPU (i.e. the speed of the PCI bus). This is not the same as the GPU memory bandwidth (as suggested by the question title), which is much, much higher (>90GB/sec for your GPU and even higher for a recent GPU).
- It is nearly impossible to accurately measure the transfer bandwidth from within MATLAB. What you are actually timing here is the time taken to allocate some space (on the GPU in the first case, in host memory in the second), to perform the data transfer and to assign a MATLAB variable. These extra steps take some (hopefully small) amount of time, which lowers the measured bandwidth.
- Some of the variability may come from other processes using the PCI bus. Running your OS in a highly stripped-down mode with no network etc. might help.
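The points above can be sketched as a revised timing loop. This is only a rough sketch, not a definitive benchmark: it assumes Parallel Computing Toolbox and a supported GPU, and the names nRuns, sizeGB, sendTimes and gatherTimes are made up for illustration. It drops the wait after "gather", preallocates the result arrays, and reports both the best and the median rate so that runs disturbed by other bus traffic stand out less.

```matlab
% Sketch only: needs Parallel Computing Toolbox and a supported GPU.
gpu    = gpuDevice();
N      = 8192;                 % 8192 x 8192 doubles = 512 MB per transfer
nRuns  = 20;                   % hypothetical repeat count
data   = rand(N, N);
sizeGB = N^2 * 8 / 1024^3;     % payload size, same formula as in the question

sendTimes   = zeros(1, nRuns); % preallocate so array growth is not timed
gatherTimes = zeros(1, nRuns);

for k = 1:nRuns
    % Host -> device: keep wait(gpu), as in the original code, so the
    % timer stops only once the device has finished its work.
    t = tic;
    gdata = gpuArray(data);
    wait(gpu);
    sendTimes(k) = toc(t);

    % Device -> host: gather is synchronous, so no wait(gpu) here.
    t = tic;
    data2 = gather(gdata);
    gatherTimes(k) = toc(t);
end

% Allocation and variable assignment are still inside the timed region,
% so these figures are a lower bound on the raw transfer rate.
fprintf('send:   %.2f GB/s (best), %.2f GB/s (median)\n', ...
    sizeGB / min(sendTimes), sizeGB / median(sendTimes));
fprintf('gather: %.2f GB/s (best), %.2f GB/s (median)\n', ...
    sizeGB / min(gatherTimes), sizeGB / median(gatherTimes));
```

Comparing the best and median figures gives a feel for how much of the run-to-run spread comes from outside interference rather than from the transfer itself.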
How to measure GPU memory bandwidth?
I have a Tesla C1060 with 4 GB of memory. I am running MATLAB R2012b and using the following code to measure the memory bandwidth between host and device.
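gpu = gpuDevice()
N = 8192; data = rand(N,N); % 512 MB of doubles
for k = 1:100 % 100 timed repetitions (count assumed)
    tic; gdata = gpuArray(data); wait(gpu);
    CPU2GPU(k) = N^2*8/1024^3/toc; % host to device, GB/s
    tic; data2 = gather(gdata); wait(gpu);
    GPU2CPU(k) = N^2*8/1024^3/toc; % device to host, GB/s
end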
I measured less than 1.5 GB/s from GPU to CPU and less than 3.0 GB/s from CPU to GPU (averaging 100 values, excluding the very first ones). 1) Why are the measured values so far from the expected 8 GB/s? The 100 values also vary from one run to another by a factor of almost 2. 2) Why is the behavior of this code so poorly reproducible?
Thanks for your help.
Ben Tordoff on 15 Apr 2013
You might like to have a look at the following article:
In those results, the achieved transfer bandwidth tops out at about 5.7 GB/sec (send) and 4.0 GB/sec (gather). Whilst I can't give you a definitive answer as to why your measured transfer rates are so low and unreliable, here are a couple of points to consider:
If you try the code from the article and still see much lower results, let me know. Note, however, that you are not really measuring your GPU here, you are simply measuring how busy your PCI bus is and how well MATLAB can throw data at it. It's an important measure, but it's not usually the most important one, so long as you do plenty of calculations with your data once you've put it on the GPU. If you want to know more about your GPU's calculation performance, you might like to take GPUBench for a spin: