MDCE Causing Blue Screen of Death (Clock Watchdog Timeout)

We have a 24-core computer with an MDCE license on it, currently configured with 23 workers. I can successfully submit complex Simulink runs to the machine and retrieve the data when they finish. However, once I submit enough jobs that some wait in the queue, the machine crashes with a BSoD error of "Clock Watchdog Timeout". I am not sure exactly how many jobs it takes to cause the error; once I had 24 jobs going and they all finished without a problem.
  2 Comments
Ming Yue on 7 Dec 2018
It would be good to know whether the same crash happens without Simulink. To diagnose, you can submit some non-Simulink jobs (enough to have some jobs in the queue). The jobs could just be some long MATLAB function calls.
Also, you could check the MDCE log files for signs of out-of-memory errors. If you are using Windows, the log files are located here:
<TEMP>\MDCE\Log where <TEMP> refers to the system TEMP variable. By default, it is in the directory: C:\TEMP\MDCE\Log
Since you are using one computer with 24 cores, the parallel computing could be done with PCT as well. Do these jobs crash if you use PCT instead? For PCT, you need to select the "local" profile when you create the cluster: https://www.mathworks.com/help/distcomp/submit.html. The steps to submit a job are then the same as when submitting to MDCS.
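A minimal sketch of the non-Simulink queue test described above, assuming the default "local" PCT profile is available. The workload (an SVD of a random matrix), the matrix size, and the number of extra jobs are placeholders chosen for illustration, not values from this thread; adjust them so each job runs long enough for a queue to build up.

```matlab
% Sketch only: queue more plain MATLAB jobs than there are workers.
c = parcluster('local');        % PCT local profile
numJobs = c.NumWorkers + 4;     % assumption: 4 extra jobs so some wait in the queue

% Hypothetical long-running, non-Simulink workload.
work = @(n) sum(svd(rand(n)));

jobs = cell(1, numJobs);
for k = 1:numJobs
    % batch(cluster, function, number of outputs, input arguments)
    jobs{k} = batch(c, work, 1, {2000});
end

% Wait for each job in turn, report its final state, then clean up.
for k = 1:numJobs
    wait(jobs{k});
    out = fetchOutputs(jobs{k});
    fprintf('Job %d: state = %s, result = %g\n', k, jobs{k}.State, out{1});
    delete(jobs{k});
end
```

To run the same test against the MDCS cluster, pass the name of that cluster's profile to parcluster instead of 'local'. If the crash shows up with these jobs too, Simulink is probably not the cause.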
Jason on 12 Dec 2018
Ming Yue,
Based on your suggestion to look at the MDCE log files, I found that the worker logs were updating every 30 seconds. I realized this was because I had the "admincenter" set to update every 30 seconds. I turned off automatic updates in the admincenter and sent a bunch of sim runs to the workers. They all finished with no issues and I was able to retrieve the data. It sounds like constantly pinging the workers is not a good idea and can lead to the blue screen error. I haven't had a problem since I turned off updating in the admincenter. Thanks for your reply!


Accepted Answer

Jason on 12 Dec 2018
Solved. In the "admincenter", turn off automatic updates. I have had no more "Clock Watchdog Timeout" blue screen of death errors since.

Release

R2015b
