Reproducibility of convolutional neural network training with GPU
2 votes
Hello,
I am training a CNN on my local GPU (to speed up training) for a classification problem, and would like to try different parameterizations. To avoid variability due to different data and/or weight initialization, I reset the random seeds each time before training:
% Initialize random seed (thus same dataset on same architecture would lead
% to predictable result)
rng(0);
%parallel.gpu.rng(0, 'CombRecursive');
randStream = parallel.gpu.RandStream('CombRecursive', 'Seed', 0);
parallel.gpu.RandStream.setGlobalStream(randStream);
% Train the CNN network
net = trainNetwork(TR.data,TR.reference,layers,options);
The problem is that when using the GPU I get different results on each execution, even when initializing the GPU random seed to the same value. The strange thing is that if I use the CPU instead, I do get reproducible results. Am I doing something wrong with the GPU random seed initialization? Is there a known problem with this situation, or something I am missing?
Thanks beforehand.
PS: I am using MATLAB R2017b.
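[Editor's note] Since the questioner observes that CPU runs are reproducible, one workaround is to force training onto the CPU via `trainingOptions`. A minimal sketch, with illustrative option values (solver and epoch count are assumptions, not from the original post):

```matlab
% Hypothetical sketch: force training onto the CPU so runs are
% bit-reproducible, at the cost of speed.
rng(0);                                    % reset the CPU random stream
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'cpu', ...     % avoid non-deterministic GPU kernels
    'Shuffle', 'never', ...                % keep data order fixed between runs
    'MaxEpochs', 10);
net = trainNetwork(TR.data, TR.reference, layers, options);
```

With `'ExecutionEnvironment'` set to `'cpu'` and the seed reset before each run, two identical runs should produce identical weights.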
Accepted Answer
Joss Knight
on 20 Sep 2018
1 vote
Use of the GPU has non-deterministic behaviour. You cannot guarantee identical results when training your network, because the result depends on floating-point rounding combined with parallel computation: in floating point, (a + b) + c ~= a + (b + c), and the order in which parallel reductions accumulate their terms can vary between runs.
Most of our GPU algorithms are in fact deterministic but a few are not, for instance, backward convolution.
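[Editor's note] The non-associativity Joss describes is easy to demonstrate in single precision; the values below are chosen purely to make the rounding visible:

```matlab
% Single-precision addition is not associative: the rounding error depends
% on the order of operations, which a parallel GPU reduction does not fix.
a = single(1e8);
b = single(-1e8);
c = single(1);
left  = (a + b) + c;   % exact cancellation first, then + 1  -> 1
right = a + (b + c);   % b + c rounds back to -1e8           -> 0
fprintf('left = %g, right = %g\n', left, right);
```

At magnitude 1e8 the spacing between adjacent single-precision values is 8, so `b + c` rounds back to `-1e8` and the two summation orders differ by 1. When thousands of GPU threads sum partial results in a run-dependent order, such differences accumulate.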
14 Comments
Diego Alvarez Estevez
on 21 Sep 2018
Very interesting and good to know! Thank you.
Eric
on 6 Jun 2020
I am encountering the same issue, and I am very surprised and, I should say, very disappointed by MathWorks: as a MATLAB user since version 3.5, I cannot imagine that people developing software can accept their code not being reproducible. It's a joke! MathWorks has to correct this bug or propose a solution to customers: what about moving single-precision GPU code to double precision, now that this is available? (since you claim it comes from the whims of floating-point precision)
Joss Knight
on 8 Jun 2020
Can you let us know what non-deterministic behaviour it is that you're experiencing, specifically? As far as I'm aware, deep learning training is the only place this happens, and that particular behaviour is true across all the deep learning frameworks, because they use the same underlying NVIDIA library that has this behaviour. Maybe there is some randomness in your particular application that we're missing?
Hello,
@Joss Knight (or any other MathWorks staff member), my colleague referred to this link and said that it is now possible to achieve deterministic results in TensorFlow for deep learning algorithms on the GPU.
Is this something that MATLAB will be / is able to implement in the near future?
Thanks,
Barry
Joss Knight
on 3 Sep 2020
Edited: Joss Knight
on 3 Sep 2020
I believe we have a plan to add support for deterministic training in a future release. As I say, as far as I know backward convolution and backward max-pooling are the only sources of indeterminism (other than certain kinds of parallel training) which means the problem is limited to training a deep network. If you know of other sources let me know.
Dammak
on 6 Jan 2021
@Joss Knight Repeatability and reproducibility are extremely important. How can someone even consider using MATLAB deep learning software for serious science if repeating the experiment yields slightly different results every time? I hope the plan to add deterministic behaviour to future releases happens sooner rather than later. It's unfortunate that this was not made a priority in the 2021 release.
Joss Knight
on 7 Jan 2021
People use TensorFlow and PyTorch all the time for serious science, and they have the exact same issue, so I guess people don't consider it that bad a problem. You should only see this non-determinism during training, which is typically initialized with random numbers anyway.
Aled Catherall
on 4 Feb 2022
Edited: Aled Catherall
on 4 Feb 2022
@Joss Knight - Has progress been made on fixing the issue? Lack of deterministic and repeatable training is proving to be quite a problem for some applications. For example, when I make a small change to the input data or the network, I want to know that differences in my results are due to the changes I have made, and not the vagaries of non-deterministic floating-point arithmetic. An update on this issue would be welcome, thanks.
Also, please note that you shouldn't be using the term "random numbers" but rather "pseudorandom numbers", since they are generated by MATLAB from a deterministic algorithm and not a stochastic process (like nuclear decay).
Joss Knight
on 4 Feb 2022
We are working on a solution and will let you know when it lands!
Van Vy
on 17 Mar 2022
@Joss Knight: I'm looking forward to seeing it soon. Please hurry!
mkoh
on 30 Jan 2023
@Joss Knight, can you perhaps link some references that say that backward convolution and backward max pooling are non-deterministic?
Hamza
on 20 Nov 2023
@Joss Knight have you found a solution?
Mesho
on 24 May 2024
I am also facing the same problem
More Answers (0)