Neural net training is crashing my Windows 10 computer completely

I'm very new to neural nets, but I've been trying to follow this tutorial (and others) to load a pretrained network (VGG19) and edit the output layers so I can use it for regression rather than classification. The training and validation data are labeled 224x224 images, loaded as 4-D arrays trn_4d and val_4d as specified in the tutorial. The variables trn_L and val_L are the normalized label vectors.
When I run this code, my computer spontaneously reboots itself immediately after the neural net training window opens. I can't find any error log for Matlab, and my PC system logger doesn't show anything crazy happening. This is making it very difficult for me to track down the problem. I'm hoping my mistake is obvious to someone else.
% Load pretrained convnet
net = vgg19;
layers = net.Layers;
% Delete the output layers (fc8, prob, output), keeping layers 1 to 44
layers = layers(1:44);
% Add in new output layers for regression
layers = [layers
    fullyConnectedLayer(1,"Name","fc8","WeightL2Factor",0)
    regressionLayer("Name","regressionoutput")];
% Convnet training settings
miniBatchSize = 4;
% Validate about once per epoch (iterations per epoch = N/miniBatchSize)
validationFrequency = floor(numel(trn_L)/miniBatchSize);
options = trainingOptions('sgdm', ...
    'MiniBatchSize',miniBatchSize, ...
    'MaxEpochs',30, ...
    'InitialLearnRate',1e-5, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.1, ...
    'LearnRateDropPeriod',20, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{val_4d, val_L}, ...
    'ValidationFrequency',validationFrequency, ...
    'Plots','training-progress', ...
    'Verbose',false);
% Train the convnet
net = trainNetwork(trn_4d,trn_L,layers,options);
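Before suspecting the hardware, it may be worth ruling out a data-shape problem. The following is a minimal sketch, not part of the original post, that checks the array layout trainNetwork expects for image regression (height-by-width-by-channels-by-N images, one response row per image). The small zero/random arrays are hypothetical stand-ins so the snippet runs on its own; with real data, the four variables from the question would be used instead.

```matlab
% Hypothetical stand-ins so this sketch runs on its own; replace them
% with the real trn_4d / trn_L / val_4d / val_L from the workspace.
trn_4d = zeros(224,224,3,8,'single');
trn_L  = rand(8,1);
val_4d = zeros(224,224,3,2,'single');
val_L  = rand(2,1);

miniBatchSize = 4;

% Images must be height-by-width-by-channels-by-N.
szTrn = size(trn_4d);
szVal = size(val_4d);
assert(isequal(szTrn(1:3),[224 224 3]),'Training images must be 224x224x3.');
assert(isequal(szVal(1:3),[224 224 3]),'Validation images must be 224x224x3.');

% Responses must have one row per image (N-by-1 for a single output).
assert(size(trn_L,1) == size(trn_4d,4),'Need one label row per training image.');
assert(size(val_L,1) == size(val_4d,4),'Need one label row per validation image.');
assert(all(isfinite(trn_L(:))),'Training labels contain NaN or Inf.');

% Iterations per epoch, used as the validation frequency in the question.
validationFrequency = floor(numel(trn_L)/miniBatchSize);

% Size of one mini-batch of single-precision input images, in MiB.
% This counts the inputs only, not activations or network weights.
batchMB = 224*224*3*miniBatchSize*4/2^20;
fprintf('Validate every %d iterations; input mini-batch is %.2f MiB.\n', ...
    validationFrequency,batchMB);
```

If all assertions pass, the inputs are at least laid out the way trainNetwork expects, which makes a software-side data error a less likely explanation for the reboot.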
For reference, here are the original 47 layers for VGG-19:
layers = [
imageInputLayer([224 224 3],"Name","input")
convolution2dLayer([3 3],64,"Name","conv1_1","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu1_1")
convolution2dLayer([3 3],64,"Name","conv1_2","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu1_2")
maxPooling2dLayer([2 2],"Name","pool1","Stride",[2 2])
convolution2dLayer([3 3],128,"Name","conv2_1","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu2_1")
convolution2dLayer([3 3],128,"Name","conv2_2","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu2_2")
maxPooling2dLayer([2 2],"Name","pool2","Stride",[2 2])
convolution2dLayer([3 3],256,"Name","conv3_1","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu3_1")
convolution2dLayer([3 3],256,"Name","conv3_2","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu3_2")
convolution2dLayer([3 3],256,"Name","conv3_3","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu3_3")
convolution2dLayer([3 3],256,"Name","conv3_4","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu3_4")
maxPooling2dLayer([2 2],"Name","pool3","Stride",[2 2])
convolution2dLayer([3 3],512,"Name","conv4_1","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu4_1")
convolution2dLayer([3 3],512,"Name","conv4_2","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu4_2")
convolution2dLayer([3 3],512,"Name","conv4_3","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu4_3")
convolution2dLayer([3 3],512,"Name","conv4_4","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu4_4")
maxPooling2dLayer([2 2],"Name","pool4","Stride",[2 2])
convolution2dLayer([3 3],512,"Name","conv5_1","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu5_1")
convolution2dLayer([3 3],512,"Name","conv5_2","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu5_2")
convolution2dLayer([3 3],512,"Name","conv5_3","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu5_3")
convolution2dLayer([3 3],512,"Name","conv5_4","Padding",[1 1 1 1],"WeightL2Factor",0)
reluLayer("Name","relu5_4")
maxPooling2dLayer([2 2],"Name","pool5","Stride",[2 2])
fullyConnectedLayer(4096,"Name","fc6","WeightL2Factor",0)
reluLayer("Name","relu6")
dropoutLayer(0.5,"Name","drop6")
fullyConnectedLayer(4096,"Name","fc7","WeightL2Factor",0)
reluLayer("Name","relu7")
dropoutLayer(0.5,"Name","drop7")
fullyConnectedLayer(1000,"Name","fc8","WeightL2Factor",0)
softmaxLayer("Name","prob")
classificationLayer("Name","output")];
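In the listing above, fc8 sits at position 45, which is why layers(1:44) keeps everything up to drop7. As a hedged alternative to the hard-coded index, the cut point can be found by layer name. The sketch below rebuilds only the last six layers of the listing as a stand-in (named tail here, a hypothetical variable) so that it runs without the VGG-19 support package; with the real network, net.Layers would take its place.

```matlab
% Stand-in for the end of the VGG-19 layer array (hypothetical variable;
% with the real network this would be net.Layers).
tail = [
    fullyConnectedLayer(4096,"Name","fc7")
    reluLayer("Name","relu7")
    dropoutLayer(0.5,"Name","drop7")
    fullyConnectedLayer(1000,"Name","fc8")
    softmaxLayer("Name","prob")
    classificationLayer("Name","output")];

% Find the first classification-specific layer by name and cut before it.
names = string({tail.Name});
cut = find(names == "fc8",1) - 1;
kept = tail(1:cut);

% Append the regression head used in the question.
newLayers = [kept
    fullyConnectedLayer(1,"Name","fc8","WeightL2Factor",0)
    regressionLayer("Name","regressionoutput")];

disp(string({newLayers.Name}));
```

Looking the layer up by name avoids an off-by-one mistake if the same code is later pointed at a network with a different number of layers.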
  7 Comments
cp-424 on 14 Oct 2020
Edited: cp-424 on 14 Oct 2020
When I ran vgg19, the crash occurred at the very start of training. When I get crashes with alexnet, they happen after several minutes of training have elapsed. I get the same results with ExecutionEnvironment specified.
Given that my computer is going through a hard reset and Matlab is not generating any error logs, I'm guessing that this is a hardware issue that shows up when the GPU is pushed to full capacity. Thanks for the help, Uday.
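One way to probe the GPU hypothesis, sketched here under assumptions rather than taken from the thread, is to report what MATLAB sees on the GPU and then run a single short epoch forced onto the CPU. If the CPU run survives while the GPU run reboots the machine, that points toward the GPU, its driver, or the power supply. The option values below are illustrative; canUseGPU requires R2019b or later, and the GPU properties require Parallel Computing Toolbox.

```matlab
% Report the GPU MATLAB would use, if a supported one is available.
if canUseGPU
    g = gpuDevice;
    fprintf('%s: %.1f GiB free of %.1f GiB\n', ...
        g.Name,g.AvailableMemory/2^30,g.TotalMemory/2^30);
else
    fprintf('No usable GPU detected; training would run on the CPU anyway.\n');
end

% Short diagnostic run forced onto the CPU (illustrative settings).
% Verbose output is enabled so progress stays visible in the Command Window.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',4, ...
    'MaxEpochs',1, ...
    'InitialLearnRate',1e-5, ...
    'ExecutionEnvironment','cpu', ...
    'Plots','none', ...
    'Verbose',true);

% With the data and layers from the question in the workspace:
% net = trainNetwork(trn_4d,trn_L,layers,options);
```

CPU training of VGG-19 is slow, so a single epoch on a small subset of the images is probably enough for this comparison.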
Uday Pradhan on 15 Oct 2020
You may still keep learning about deep learning using shallower networks (~10 - 12 layers). Feel free to visit the documentation pages for more information.
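For illustration, a shallower regression network of about 12 layers might look like the sketch below. The filter counts, dropout rate, and layer names are assumptions chosen for the example, not values recommended in this thread; only the 224x224x3 input size and the single regression output are taken from the question.

```matlab
% A small 12-layer CNN for single-output regression on 224x224x3 images.
% Filter counts and the dropout rate are illustrative assumptions.
layers = [
    imageInputLayer([224 224 3],"Name","input")
    convolution2dLayer(3,16,"Padding","same","Name","conv1")
    batchNormalizationLayer("Name","bn1")
    reluLayer("Name","relu1")
    maxPooling2dLayer(2,"Stride",2,"Name","pool1")
    convolution2dLayer(3,32,"Padding","same","Name","conv2")
    batchNormalizationLayer("Name","bn2")
    reluLayer("Name","relu2")
    maxPooling2dLayer(2,"Stride",2,"Name","pool2")
    dropoutLayer(0.2,"Name","drop")
    fullyConnectedLayer(1,"Name","fc")
    regressionLayer("Name","regressionoutput")];

fprintf('Network has %d layers.\n',numel(layers));
```

A network of this size has far fewer weights than VGG-19, so it places much less load on the GPU while still exercising the same trainNetwork workflow.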

Answers (2)

HayderMU on 15 Jan 2022
Hi,
I am having the same problem as you. The computer suddenly reboots during training, and I could not find any log file in Matlab or Windows. I am using Matlab 2020b on Windows 10 (32 GB RAM and a GPU with 8 GB of memory). I installed a GPU and CPU temperature logger, which did not record anything unusual. I think it's something related to Windows 10. I have another PC with lower specifications (Matlab 2020b, 4 GB GPU, and Win7); I tried the training process there and it ran without any errors. The problem occurred with several different networks, not just one. I still have no reasonable explanation for it. For now, I have to train the networks that crash on my old PC.
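Since neither Matlab nor Windows leaves a log behind, one option is to have the training loop write its own. The sketch below is an assumption-laden illustration, not something from this thread: it uses the OutputFcn training option to append one line per iteration to a text file, so that after a reboot the file shows roughly how far training got. The helper name logIteration and the log path are hypothetical. The file is closed after every write, which hands the data to the operating system, although a hard reset can still lose the last line or two.

```matlab
% Hypothetical log location; any writable path works.
logFile = fullfile(tempdir,'training_progress.log');

% Illustrative options; the remaining settings from the question can be added.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',4, ...
    'OutputFcn',@(info) logIteration(info,logFile), ...
    'Verbose',false);

function stop = logIteration(info,logFile)
    % Append one line per call, then close the file so little is buffered.
    stop = false;                      % never request early stopping
    loss = info.TrainingLoss;
    if isempty(loss)
        loss = NaN;                    % no loss is available at the "start" state
    end
    fid = fopen(logFile,'a');
    if fid == -1
        return;                        % could not open the log; keep training
    end
    closer = onCleanup(@() fclose(fid));
    fprintf(fid,'%s state=%s epoch=%d iter=%d loss=%g\n', ...
        datestr(now,31),info.State,info.Epoch,info.Iteration,loss);
end
```

Comparing the timestamp of the last logged iteration with the reboot time in the Windows event viewer may help show whether the crash is tied to a particular phase of training, such as the first validation pass.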

Stephen Wolstenholme on 4 Mar 2024
Multiple copies of my main application, EasyNN-plus, have been running without any problems for decades on Windows 7 onwards. I expected it to work on Windows 11, but it crashed. After a lot of work it became obvious that the fault was in Windows 11. I installed the latest release, 23H2, and the fault disappeared. I think it could have been a threading problem, because EasyNN-plus uses many threads and winds down to 1 thread once the target error is in range. On 23H2 there is no problem.

Release

R2020a