Initializing GPU on multiple workers causes an unknown error
I've noticed that the following simple code results in a weird error if I use R2016b on a machine with two GTX1080Ti and one K2200:
% start a _new_ Matlab instance first!
parpool(16);
fetchOutputs( parfevalOnAll(@() gather(gpuArray(1)),1) )
The error message I get:
Error using parallel.FevalOnAllFuture/fetchOutputs (line 69)
One or more futures resulted in an error.
Caused by:
Error using parallel.internal.pool.deserialize>@()gather(gpuArray(1))
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
<-- repeated multiple times -->
After that, all GPU functionality gets completely broken:
>> a=gpuArray(1)
Error using gpuArray
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
Even re-starting Matlab won't help. The fix is to clear the CUDA JIT cache folder, "%USERPROFILE%\AppData\Roaming\NVIDIA\ComputeCache".
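For reference, a minimal sketch of clearing that folder from MATLAB itself. It assumes the default cache location quoted above (a custom CUDA_CACHE_PATH would change it) and that no other MATLAB instance or pool worker is currently using the GPU; run it in a fresh session before the first GPU call.

```matlab
% Sketch: empty the NVIDIA JIT cache folder at its default Windows location.
% Assumption: nothing else is using the GPU while this runs.
cacheDir = fullfile(getenv('USERPROFILE'), 'AppData', 'Roaming', ...
                    'NVIDIA', 'ComputeCache');

if exist(cacheDir, 'dir') == 7
    % The 's' flag removes the folder together with all of its contents.
    [ok, msg] = rmdir(cacheDir, 's');
    if ~ok
        warning('Could not remove %s: %s', cacheDir, msg);
    end
else
    fprintf('No cache folder found at %s\n', cacheDir);
end
```

The folder should be repopulated the next time a GPU function triggers the JIT.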
However, the following "longer pre-initialization" works OK for me:
% start a _new_ Matlab instance first and clear CUDA JIT cache if there was an error.
gpuDevice(1)
gather(gpuArray(1))
parpool();
fetchOutputs( parfevalOnAll(@() gpuDevice(1),1) )
fetchOutputs(parfevalOnAll(@() gather(gpuArray(1)),1))
AFAIU:
- Matlab R2016b, which I use here, was designed for CUDA 7.5, and it ships no binaries for CUDA Compute Capability 6.1.
- That's why Matlab uses the CUDA JIT to recompile a ton (~400 MB) of stuff when the user calls any GPU-related function for the first time (which also causes many "gpuDevice() is slow" questions).
- There's something wrong with that JIT, if combined with parpool (a race condition?).
My system is: Windows 10, CUDA 8.0 (cuda_8.0.61_win10) with patch 2 (cuda_8.0.61.2_windows), nvidia driver r384.94. The CUDA_CACHE_MAXSIZE environment variable is set to 2147483647.
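To see how much the JIT has actually written, the cache folder size can be measured from MATLAB. This is only a sketch: it assumes the default cache location under %USERPROFILE% and walks the folder tree by hand so that it also runs on R2016b.

```matlab
% Sketch: report the size of the CUDA JIT cache and the configured limit.
cacheDir = fullfile(getenv('USERPROFILE'), 'AppData', 'Roaming', ...
                    'NVIDIA', 'ComputeCache');

totalBytes = 0;
pending = {cacheDir};                 % folders still to visit
while ~isempty(pending)
    current = pending{end};
    pending(end) = [];
    entries = dir(current);           % empty if the folder does not exist
    for k = 1:numel(entries)
        name = entries(k).name;
        if entries(k).isdir
            % Skip the '.' and '..' pseudo-entries, queue real subfolders.
            if ~strcmp(name, '.') && ~strcmp(name, '..')
                pending{end+1} = fullfile(current, name); %#ok<AGROW>
            end
        else
            totalBytes = totalBytes + entries(k).bytes;
        end
    end
end

fprintf('ComputeCache size: %.1f MB\n', totalBytes / 2^20);
fprintf('CUDA_CACHE_MAXSIZE = "%s"\n', getenv('CUDA_CACHE_MAXSIZE'));
```

Running it before and after the first GPU call shows whether the cache was populated or stayed nearly empty.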
My questions:
- Is my "longer pre-initialization" workaround actually "safe"? Is it a real workaround for that "race condition"? Or is it only as good as the original (it might be stable on my specific system, but is likely to fail on some other)? Assume I have to stay with R2016b for now, targeting CUDA 8.0 and Pascal GPUs (building a DLL).
- The same code works OK in R2017b-R2018a and above. Is that just because they don't use CUDA JIT here? Or is the real underlying issue actually fixed? (I don't have a device with compute capability >6.x at hand, so I'm unable to check that.) R2017a behaves like R2016b here, even though it claims CUDA 8.0 support: it still writes something (but just ~40 MB) to the CUDA JIT cache, fails in test #1 and works in test #2.
10 Comments
Joss Knight
on 27 Jun 2018
JITted code from R2016b has been known in the past not to work on a 1080 Ti due to bugs in the driver optimizer. However, this may not be your problem because you can't even seem to select the device. Your best place to start is to ensure that your code runs on each of your GPUs. Select each GPU in turn (on your client MATLAB) and ensure you can use it.
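A minimal sketch of that per-device check on the client, with no pool open. The trivial add-and-gather is just a placeholder workload; any small GPU computation would do.

```matlab
% Sketch: select each GPU in turn on the client and run a trivial computation.
for idx = 1:gpuDeviceCount
    try
        dev = gpuDevice(idx);             % select device idx
        val = gather(gpuArray(1) + 1);    % tiny round trip through the GPU
        fprintf('Device %d (%s, CC %s): OK, result %g\n', ...
                idx, dev.Name, dev.ComputeCapability, val);
    catch err
        fprintf('Device %d: FAILED - %s\n', idx, err.message);
    end
end
```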
Secondly, you are trying to use each GPU from multiple processes. Check whether everything works when you only have three workers in your pool - the same as the number of GPUs.
Finally - why are you running GPU code on a pool of 16 workers? Generally, if most of your computation is on the GPU, you should not have more workers than you have GPUs.
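A sketch of the one-worker-per-GPU setup described above. It assumes the local pool profile allows at least as many workers as there are GPUs, and it maps worker k to device k, which is one possible assignment rather than the only one.

```matlab
% Sketch: open a pool with as many workers as GPUs and give each its own device.
nGpus = gpuDeviceCount;
if nGpus > 0
    delete(gcp('nocreate'));         % close any existing pool first
    parpool(nGpus);                  % one worker per GPU
    spmd
        gpuDevice(labindex);         % worker k uses device k
        result = gather(gpuArray(labindex) * 2);
    end
    % result is a Composite; read each worker's value on the client.
    for k = 1:nGpus
        fprintf('Worker %d returned %g\n', k, result{k});
    end
end
```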
Igor Varfolomeev
on 28 Jun 2018
Joss Knight
on 28 Jun 2018
Can you check that this problem is only with your Pascal cards? Exclude one of the Pascal cards from the pool and try again.
parpool(2);
spmd
gpuDevice(labindex+1);
gather(gpuArray(1));
end
Now try only the two Pascal cards and see if the problem recurs. If this is what is happening perhaps there is some issue with two processes reading from the JIT cache. However, this does seem like an extremely unusual thing for NVIDIA to have got wrong - the system is designed to have multiple processes using the same GPU.
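For the "two Pascal cards only" test, the device indices depend on how the driver enumerates the cards, so one option is to pick them by compute capability instead of hard-coding them. This is a sketch under the assumption that compute capability 6.x identifies the Pascal cards on this machine; pascalIdx is a name introduced here for illustration.

```matlab
% Sketch: find the devices with compute capability 6.x and use only those.
pascalIdx = [];
for idx = 1:gpuDeviceCount
    dev = parallel.gpu.GPUDevice.getDevice(idx);   % query without selecting
    if strncmp(dev.ComputeCapability, '6.', 2)
        pascalIdx(end+1) = idx; %#ok<AGROW>
    end
end

if ~isempty(pascalIdx)
    delete(gcp('nocreate'));              % close any existing pool first
    parpool(numel(pascalIdx));            % one worker per Pascal card
    spmd
        gpuDevice(pascalIdx(labindex));   % worker k -> k-th Pascal card
        gather(gpuArray(1));
    end
end
```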
I don't know why your JIT cache would be changing after it has been populated, since you are not generating any new kernels, just JITting the ones in the CUDA libraries and PCT libraries.
Oh, and one final thing to try is running three MATLAB instances (instead of a pool), use different cards, and check everything works. Ultimately, that's all a pool is, it's a communicating set of MATLAB instances.
Igor Varfolomeev
on 29 Jun 2018
Edited: Igor Varfolomeev
on 29 Jun 2018
Joss Knight
on 2 Jul 2018
Have you tried clearing the cache, then running gpuArray(1) on one Pascal card, waiting for the JIT cache to be populated fully, and only then opening a pool and running on multiple cards?
I'd be very surprised if there's a general issue, since we regularly run multi-GPU code on dual Pascal cards. Nonetheless, I'll requisition one and check your code in 16b.
Is this Windows, and is one of your Pascal cards driving the display? If so, you should switch to driving the display from the K2200; that may help.
Igor Varfolomeev
on 3 Jul 2018
Joss Knight
on 3 Jul 2018
I don't really have any more ideas I'm afraid, and since this works fine in later versions of MATLAB it's difficult to say much more than "upgrade MATLAB". You could check that it is indeed just the JIT by running a later MATLAB and setting the environment variable CUDA_FORCE_PTX_JIT to 1. I suspect it is but really, CUDA 7.5 is pretty old now and NVIDIA don't worry themselves too much about supporting the JIT pipeline for older cards, so there could be an issue in your driver that won't be fixed, or will never work because the PTX itself is faulty.
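A sketch of the forced-JIT test. It assumes the variable is read when CUDA initializes in the process, so it has to be set in a fresh MATLAB session before any GPU call; whether a value set with setenv is picked up by the driver, and whether pool workers inherit it, are both assumptions here. Setting the variable in the operating system before launching MATLAB is the more conservative route.

```matlab
% Sketch: force PTX JIT compilation, then time the first GPU call.
% Must run in a fresh session, before anything touches the GPU.
setenv('CUDA_FORCE_PTX_JIT', '1');
fprintf('CUDA_FORCE_PTX_JIT = %s\n', getenv('CUDA_FORCE_PTX_JIT'));

if gpuDeviceCount > 0
    tic;
    gpuDevice(1);                  % first selection triggers initialization
    gather(gpuArray(1));
    fprintf('First GPU call took %.1f s\n', toc);
end
```

A noticeably long first call, together with a growing ComputeCache folder, would suggest the JIT path is really being exercised.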
Try downloading a newer driver, then try downloading an OLDER driver. The next step would be to get some standalone CUDA code running by compiling and running some of the CUDA toolkit samples. You would run them simultaneously on different devices. Then if it still broke you could contact NVIDIA to get it looked into.
Igor Varfolomeev
on 3 Jul 2018
Joss Knight
on 4 Jul 2018
I had a colleague check their dual GTX 1080 system and they saw no issues, with 16b or with the current version with a forced JIT.
Sounds interesting... But this does not give me the same behaviour - the ComputeCache is still almost empty after running those commands, only a few KB. It looks like files are being added and instantly erased. Hmm... Could you please advise - am I doing something wrong here? Were you able to make it populate the ComputeCache?
This works for me but ... possibly only when your card's architecture is the maximum supported or higher, because if it were lower there would be no compatible PTX in the libraries. So you'll need to run R2017a or R2017b for your Pascal card.
It would be good to establish why upgrading MATLAB is not an option for you.
Igor Varfolomeev
on 8 Jul 2018
Edited: Igor Varfolomeev
on 8 Jul 2018