Why do I get the error "CUDNN_STATUS_EXECUTION_FAILED" when training a neural network on a GPU on a server?

When training a neural network on a GPU on a server, it usually fails after some time with the following error message:
Error using trainNetwork (line 154)
Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED.
Caused by:
Error using nnet.internal.cnngpu.lstmForwardTrain
Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED.
This generally happens when someone else launches another program on the same GPU at the same time.

Accepted Answer

MathWorks Support Team
MathWorks Support Team on 25 Jul 2022
Edited: MathWorks Support Team on 25 Sep 2022
In general, it is not a good idea to share a single GPU for computations across different programs or users. Doing so is very likely to cause kernel execution timeouts, out-of-memory errors, and other failures.
Please change the GPU's compute mode to "Exclusive Process" so that no other process can grab the GPU while MATLAB is performing computations. Please see the following link for more information: 
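On Linux servers, the compute mode is typically set with NVIDIA's nvidia-smi tool (administrator rights are required, and the GPU index 0 below is an assumption for illustration). A minimal sketch:

```shell
# Query the current compute mode of GPU 0 (assumed index; adjust for your server)
nvidia-smi --query-gpu=compute_mode --format=csv -i 0

# Set GPU 0 to exclusive-process mode, so only one process
# (e.g. the MATLAB session) can create a context on it at a time
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Revert to the default (shared) compute mode when done
sudo nvidia-smi -i 0 -c DEFAULT
```

Note that this setting usually does not survive a reboot unless the driver's persistence mode is enabled, so it may need to be reapplied or scripted at startup.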

More Answers (0)

Release

R2018a
