Why does my CUDA code run slower on Linux than on Windows?
I am doing 3D non-local means for image stack denoising.
In principle, the code works like this: 1) copy the data from host memory to device memory; 2) do all the necessary computation on the GPU; 3) copy the results back from device memory to host memory. My code does not use any built-in MATLAB functions; it is only interfaced to MATLAB through mexFunction.
Exactly the same code works fine under both Windows and Linux. The final results are very similar, but not identical. I think the per-pixel differences probably come from floating-point rounding: for example, the pixel values are about 1 to 5000, and the differences are between 1e-6 and 1e-5.
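One way to judge whether differences of this size are plain rounding noise is to express them in units of the local floating-point spacing. The sketch below assumes the outputs are stored in single precision (if they are double, the spacing is far smaller) and uses synthetic stand-ins, outWin and outLinux, which are hypothetical names for the volumes saved on each OS. For reference, the single-precision spacing is about 1.19e-7 at a value of 1 and about 4.88e-4 at a value of 5000.

```matlab
% Hypothetical stand-ins: in practice, load the volumes saved on each OS.
rng(0);                                          % reproducible synthetic data
outWin   = single(1 + 4999 * rand(64, 64, 8));   % values in roughly 1..5000
outLinux = outWin + eps(outWin);                 % perturb every voxel by one spacing step

% Absolute difference, and the same difference in units of the local spacing.
absDiff = abs(double(outLinux) - double(outWin));
ulpDiff = absDiff ./ double(eps(outWin));

fprintf('max abs diff:         %.3g\n', max(absDiff(:)));
fprintf('max diff in spacings: %.3g\n', max(ulpDiff(:)));
fprintf('single spacing at 1: %.3g, at 5000: %.3g\n', ...
        eps(single(1)), eps(single(5000)));
```

Differences of a few spacing steps are typical when the order of floating-point operations differs between builds. Much larger values would suggest a real discrepancy rather than rounding.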
However, the time needed to process the same dataset differs between Windows 10 and Ubuntu Linux 20.04 LTS. On Linux, the CUDA/NVIDIA GPU computation takes twice as long as on Windows. In contrast, the same C code executed on the CPU is a little faster on Ubuntu Linux than on Windows, by roughly 10 to 30%. I can understand that C code on the CPU may be faster on Linux, because I think Windows probably adds a little more abstraction.
Does anyone have similar experience? Is it possible to resolve the CUDA speed difference?
The GPU is not in "TCC" mode. Under both Windows and Linux, the GPU is also used to display the GUI, because it is the only graphics device.
Thanks!
Qinghai
4 Comments
Hamza Butt
on 17 Dec 2021
Hi Qinghai,
Is it safe to assume that you are dual booting the machine and the hardware is the same? Can you let me know what you are using to benchmark execution times? The best way to time operations on the GPU is to use gputimeit, for example:
A = gpuArray.ones(1000);
gputimeit( @() A*A)
Also, what is the absolute difference in timings for the GPU benchmarks between the two? Is it large enough to not be considered noise?
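To judge whether a timing gap is larger than run-to-run noise, it can help to repeat the call several times and look at the spread. The following is a minimal sketch, not the original benchmark: nlmDenoiseMex is a hypothetical name for the mex entry point, replaced here by a trivial stand-in so the script runs anywhere. It assumes a MATLAB release that provides canUseGPU (R2019b or later).

```matlab
% nlmDenoiseMex is hypothetical; a trivial stand-in keeps the sketch runnable.
denoise = @(v) v .* 2;                 % replace with @(v) nlmDenoiseMex(v)
vol = rand(64, 64, 16, 'single');      % small synthetic image stack

hasGPU = canUseGPU();                  % true only if a supported GPU is usable
if hasGPU
    dev = gpuDevice;                   % handle used to synchronise below
end

nRuns = 10;
t = zeros(nRuns, 1);
out = denoise(vol);                    % warm-up run, excluded from the timings
for k = 1:nRuns
    tStart = tic;
    out = denoise(vol);
    if hasGPU
        wait(dev);                     % ensure queued GPU work has finished
    end
    t(k) = toc(tStart);
end

fprintf('min %.4g s, median %.4g s, max %.4g s\n', min(t), median(t), max(t));
```

If the min-to-max range on each OS is small compared with the gap between the two medians, the difference is unlikely to be noise.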
Because this is mex code, it might be slightly tricky to figure out what could be happening, but we can try a few things nonetheless. I assume the mex creation script is the same for both OSes (which would rule out differences in compilation arguments such as optimisation flags) and that you are using mexcuda. It may be that some background process is using the GPU on Linux. You can check GPU utilization from outside MATLAB by running the command:
nvidia-smi
You can then observe the output to see what processes are using the GPU. Are all of the entries in the table expected?
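The same check can also be scripted from inside MATLAB with system, which makes it easier to log the GPU state next to each benchmark. This is only a sketch, assuming nvidia-smi is on the system path. The queried fields (performance state, SM clock, utilization, memory) are an addition for illustration. The second query lists compute processes only, so display processes such as the desktop server will still only appear in the plain nvidia-smi table.

```matlab
% Query overall GPU state in CSV form (assumes nvidia-smi is on the path).
gpuQuery = ['nvidia-smi --query-gpu=name,pstate,clocks.sm,' ...
            'utilization.gpu,memory.used --format=csv'];
[status, out] = system(gpuQuery);
if status == 0
    disp(out);
else
    fprintf('nvidia-smi not available or failed (status %d).\n', status);
end

% List compute processes currently holding GPU memory.
appQuery = ['nvidia-smi --query-compute-apps=pid,process_name,used_memory ' ...
            '--format=csv'];
[appStatus, appOut] = system(appQuery);
if appStatus == 0
    disp(appOut);
end
```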
You could also try some profiling with Nsight Compute. Perhaps the profiler shows that a CUDA library call takes longer on Linux than Windows. I can imagine that drivers can cause such discrepancies in even seemingly simple operations like memory allocations.
Nsight Compute comes with CUDA but a quicker (albeit primitive) way of checking things could also be to first compare a benchmark of empty mex files between the OSes. Then, gradually add CUDA code to the mex file and benchmark until you start seeing measurable differences. This will help you isolate the problematic function (if there is one).
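The incremental approach could be organised as below. This is a sketch under stated assumptions: each function handle is a stand-in for one hypothetical mex variant (an empty mex, one that only copies data, one that also runs the kernel), and the stage names are made up for illustration. It also assumes each mex copies its result back to the host before returning, so the GPU work is finished by the time toc is called.

```matlab
vol = rand(64, 64, 16, 'single');      % small synthetic image stack

% Stand-ins for successive mex builds; swap in the real variants.
f1 = @() 0;                            % "empty mex"
f2 = @() vol + 0;                      % "copy only"
f3 = @() sqrt(vol) .* 2;               % "copy + compute"
stageNames = {'empty mex', 'copy only', 'copy + compute'};
stageFcns  = {f1, f2, f3};

nRuns  = 20;
tStage = zeros(1, numel(stageFcns));
for k = 1:numel(stageFcns)
    f = stageFcns{k};
    f();                               % warm-up run
    runs = zeros(1, nRuns);
    for r = 1:nRuns
        tStart = tic;
        f();
        runs(r) = toc(tStart);
    end
    tStage(k) = median(runs);          % median is robust to outliers
end

% Time added by each stage relative to the previous one.
added = [tStage(1), diff(tStage)];
for k = 1:numel(stageFcns)
    fprintf('%-16s %10.3g s (added %+.3g s)\n', ...
            stageNames{k}, tStage(k), added(k));
end
```

Running the same script on both OSes and comparing the "added" column stage by stage should point to the step where the timings start to diverge.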
Debugging performance issues can be tricky but I hope this helps.
Answers (0)