Recognize overfitting in retraining

I wrote the following code, inspired by the examples in the Neural Network Toolbox manual, to retrain a network:
load dati_MRTA.mat % IN_MRTA is 13x49 double, TARGET_MRTA is 1x49 double
Q=size(IN_MRTA,2);
Q1=floor(Q*0.9);
Q2=Q-Q1;
ind=randperm(Q);
ind1=ind(1:Q1);
ind2=ind(Q1+(1:Q2));
x1=IN_MRTA(:,ind1);
t1=TARGET_MRTA(:,ind1);
x2=IN_MRTA(:,ind2);
t2=TARGET_MRTA(:,ind2);
net=feedforwardnet(13,'trainlm');
numNN=10;
NN=cell(1,numNN);
tr=cell(1,numNN);
perfs=zeros(3,numNN);
for i=1:numNN
disp(['Training ' num2str(i) '/' num2str(numNN)])
[NN{i},tr{i}]=train(net,x1,t1);
y2=NN{i}(x2);
perfs(1,i)=sqrt(tr{i}.best_perf);
perfs(2,i)=sqrt(tr{i}.best_vperf);
perfs(3,i)=sqrt(mse(net,t2,y2));
end
The best results I've obtained during a single run are RMSEtraining = 4.8730, RMSEvalidation = 7.8195, RMSEtest = 10.3158; the corresponding performance plot is attached.
Does this represent a good result, or is it an indication of possible overfitting?
  1 Comment
Greg Heath
Greg Heath on 26 Sep 2015
Either post your data or choose an example from MATLAB's NN examples.
help nndatasets
and
doc nndatasets
Hope this helps.
Greg


Accepted Answer

Greg Heath
Greg Heath on 27 Sep 2015
(Quoting the question "Recognize overfitting in retraining", asked by Federico Ambrogio on 25 Sep 2015.)
1. OVERFITTING and OVERTRAINING
When
[ I N ] = size(input)  % [ 13 49 ]
[ O N ] = size(target) % [ 1 49 ]
and H is the number of hidden nodes (here H = 13), the number of unknown weights that have to be estimated is
Nw = (I+1)*H + (H+1)*O = 14*13 + 14*1 = 196
OVERFITTING occurs when there are more unknown weights than there are training equations:
Nw = 196 > 49 = N*O >= Ntrn*O = Ntrneq
Problems occur when you OVERTRAIN an OVERFIT net. Three methods of preventing overtraining are
a. Do not overfit: choose H <= Hub (upper bound), where, using the MATLAB default Ntrn ~ 0.7*N,
Hub = floor((Ntrn*O - O)/(I+O+1)) = floor((0.7*N*O - O)/(I+O+1)) = 2
b. Validation Stopping
c. Regularization using MSEREG or TRAINBR.
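For this dataset the counts above can be checked with a few lines (a sketch; `Nw`, `Ntrneq`, and `Hub` are the quantities defined above, and 0.7 is MATLAB's default training ratio):

```matlab
% Dimensions for IN_MRTA (13x49) and TARGET_MRTA (1x49)
I = 13; O = 1; N = 49; H = 13;

% Number of unknown weights for an I-H-O net
Nw = (I+1)*H + (H+1)*O            % 196

% Training equations with the default 0.7 training ratio
Ntrn   = floor(0.7*N);            % 34
Ntrneq = Ntrn*O;                  % 34 << Nw = 196, so the net is overfit

% Upper bound on H that avoids overfitting
Hub = floor((Ntrneq - O)/(I+O+1)) % 2
```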
2. Train automatically divides the data into three subsets trn/val/tst with ratios 0.7/0.15/0.15.
Therefore, either accept this or EXPLICITLY override it with net.divideParam.trainRatio, etc. to get the three unit-sum ratios. There is no need for you to explicitly divide the data !
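Overriding the default split explicitly might look like this (a sketch; the three ratios must sum to 1):

```matlab
net = fitnet(2);                     % H = 2, within the Hub bound above
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
```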
3. When training in a loop, the net must be reconfigured with function CONFIGURE at the beginning of each pass through the loop. Otherwise you will just continue to train the same net with initial weights equal to the final weights obtained from the previous pass.
4. Initialize the RNG before using the 1st random number so you can reproduce your results, which depend on both random data division and random weight initialization.
5. Use the regression function fitnet instead of the general function feedforwardnet.
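Points 3-5 together might look like this in the retraining loop (a sketch under the assumptions above; `IN_MRTA` and `TARGET_MRTA` are the 13x49 input and 1x49 target from the question):

```matlab
rng('default')                   % point 4: reproducible division and weights
x = IN_MRTA; t = TARGET_MRTA;
net = fitnet(2);                 % point 5: fitnet with H = 2 <= Hub
numNN = 10;
NN = cell(1,numNN); tr = cell(1,numNN);
for i = 1:numNN
    net = configure(net,x,t);    % point 3: reinitialize weights each pass
    [NN{i},tr{i}] = train(net,x,t);
end
```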
6. You can obtain zillions of examples by searching both the NEWSGROUP and ANSWERS using the search words
greg fitnet tutorial
greg fitnet Hub
greg fitnet Ntrials
Hope this helps.
Thank you for formally accepting my answer
Greg
  1 Comment
Greg Heath
Greg Heath on 27 Sep 2015
For regression, the best measures of performance are the normalized MSE
NMSE = mse(error)/mean(var(target',1))
and the coefficient of determination (a.k.a. the R-squared)
Rsq = 1 - NMSE
which is interpreted as the fraction of target variance that is modeled by the net.
For regression I set my training goal so that NMSE <= 0.01, i.e., Rsq >= 0.99.
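With the test targets `t2` and test outputs `y2` from the question's code, the two measures can be computed as (a sketch; `mean(e.^2)` stands in for `mse(error)` since `t2` is a 1xN row):

```matlab
e    = t2 - y2;                    % test-set errors
NMSE = mean(e.^2) / mean(var(t2',1)); % normalized MSE
Rsq  = 1 - NMSE;                   % fraction of target variance modeled
```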
Hope this helps.
Greg

