Finding best neural network structure using optimization algorithms and cross-validation

Question

Jack on 31 Aug 2014


Answered: orlem lima dos santos on 26 Jan 2018

Hi.

I'm using an optimization algorithm to find the best structure and inputs of a 'patternnet' neural network in MATLAB R2014a, using 5-fold cross-validation. Where should I initialize the weights of my neural network?

 % Position_1 (for weight initialization)
 for i = 1:num_of_loops
     % Position_2 (for weight initialization)
     % - repeat the cross-validation
     for k = 1:num_of_kfolds
         % Position_3 (for weight initialization)
         % - cross-validation loop
     end
 end

I'm repeating the 5-fold cross-validation (because the cross-validation folds are selected randomly) to get more reliable outputs (the average of the neural network outputs). Which position is better for weight initialization (Position_1, Position_2 or Position_3), and why?

Thanks.


Answer 1

Greg Heath on 31 Aug 2014


To simplify the explanation, I will assume Nval = Ntst = 0. Search the NEWSGROUP and ANSWERS for examples where they are nonzero.

To design a typical I-H-O net with Ntrn training examples, try not to let the number of unknown weights

Nw = (I+1)*H+(H+1)*O

exceed the number of training equations

Ntrneq = Ntrn*O

This holds as long as H <= Hub, where Hub is the upper bound

Hub = -1+ceil( (Ntrneq-O) / (I+O+1) )
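
As a quick check of the bound above, here is a minimal worked sketch. The sizes (I = 4 inputs, O = 3 outputs, Ntrn = 150 training examples) are assumed purely for illustration and do not come from the thread.

```matlab
% Hypothetical sizes, chosen only for illustration
I = 4;  O = 3;  Ntrn = 150;

Ntrneq = Ntrn*O;                          % number of training equations: 450
Hub    = -1 + ceil((Ntrneq-O)/(I+O+1));   % upper bound on hidden nodes: 55

% Number of unknown weights as a function of H
Nw = @(H) (I+1)*H + (H+1)*O;

NwAtHub = Nw(Hub);      % 443, does not exceed Ntrneq
NwAbove = Nw(Hub+1);    % 451, exceeds Ntrneq
```

With these numbers, H = 55 is the largest hidden layer for which the weights do not outnumber the training equations.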

Based on Ntrneq and Hub, I decide on a set of numH candidate values for H

 0 <= Hmin:dH:Hmax <= Hub
 numH = numel(Hmin:dH:Hmax)

and on the number of weight initializations for each value of H, e.g.,

Ntrials = 10

If the training target is ttrn = t(:,indtrn), the mean-square-error of a naïve constant-output net (one whose output is independent of the input) is

MSEtrn00 = mean(var(ttrn',1));
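
To see why this is the right reference, the sketch below compares MSEtrn00 with the MSE of a model that always outputs the mean of each target row. The small target matrix is invented for illustration only.

```matlab
% Hypothetical O-by-Ntrn training target (O = 2 outputs, Ntrn = 4 examples)
ttrn = [1 2 3 4; 0 0 1 1];
[O, Ntrn] = size(ttrn);

% Reference MSE as defined above (variance normalized by Ntrn, not Ntrn-1)
MSEtrn00 = mean(var(ttrn',1));

% Naive constant-output model: always predict the mean of each target row
y00 = repmat(mean(ttrn,2), 1, Ntrn);
MSEconst = mean((ttrn(:) - y00(:)).^2);

% The two values agree, so the constant model has R2 = 0
R2const = 1 - MSEconst/MSEtrn00;
```

Any net that does better than the constant model therefore gets an R2 above zero, and a perfect fit gets R2 = 1.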

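Then, using MSEtrn00 as a normalization reference, I use the following double-loop format

 rng(0)                               % seed the RNG once, before any loop
 j = 0;
 for h = Hmin:dH:Hmax                 % loop over candidate numbers of hidden nodes
     j = j+1;
     net = ...
     net.divideFcn = 'dividetrain';   % all data used for training (Nval = Ntst = 0)
          ...
     for i = 1:Ntrials                % loop over random weight initializations
         net = configure(net,x,t);    % reset the net before each call of train
         ...
         [net, tr, y, e] = train(net,x,t);
          ...
         R2(i,j) = 1-mse(e)/MSEtrn00; % fraction of target variance modeled
     end
 end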
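
The elided parts of the outline are not specified in the answer. The following is only one possible way to fill them in, assuming `patternnet`, the toolbox sample data `iris_dataset`, and illustrative values for Hmin, dH, Hmax and Ntrials. The explicit `init` call and the hand-computed MSE are choices made for this sketch, not part of the original outline.

```matlab
% Sketch only: requires the Neural Network (Deep Learning) Toolbox.
[x, t] = iris_dataset;               % assumed sample data: 4x150 inputs, 3x150 targets
MSEtrn00 = mean(var(t',1));          % reference MSE; all data is used for training

Hmin = 1;  dH = 2;  Hmax = 9;        % assumed candidate grid
Ntrials = 5;                         % assumed number of weight initializations
Hvals = Hmin:dH:Hmax;
R2 = zeros(Ntrials, numel(Hvals));   % preallocate the result matrix

rng(0)                               % seed once, before the loops
for j = 1:numel(Hvals)
    net = patternnet(Hvals(j));
    net.divideFcn = 'dividetrain';   % no validation or test subsets
    net.trainParam.showWindow = false;
    for i = 1:Ntrials
        net = configure(net, x, t);
        net = init(net);             % explicit weight reset, added for this sketch
        net = train(net, x, t);
        e = t - net(x);              % training errors
        R2(i,j) = 1 - mean(e(:).^2)/MSEtrn00;
    end
end
disp(R2)                             % Ntrials-by-numH matrix of training R2 values
```

Each column of R2 then corresponds to one candidate H and each row to one random initialization.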

For details, search for one of my designs using the search word greg and a subset of the variables I always use (e.g., Nw, Ntrneq, Hub, Ntrials ...)

Hope this helps

Thank you for formally accepting my answer

Greg


Jack on 1 Sep 2014

Edited: Jack on 1 Sep 2014


Thank you for the answer, Greg. I find your answer a little complicated, so I have some questions.

- As you said, I think I should use `position 3` and initialize the weights on every call of the neural network. Is that right? With this structure, can we compare the outputs of every call of the optimization algorithm?

- At present I compute the cross-validation accuracy, sensitivity, specificity and ROC output and average them using (sum(accuracy)/num_of_loops), but you used:

MSEtrn00 = mean(var(ttrn',1));
R2(i,j) = 1-mse(e)/MSEtrn00;

Can you explain this further? I don't see how to do this in my structure (cross-validation + loops).

Thanks.

P.S.

My optimization algorithm searches for the best neural network structure and the best inputs. I only use this structure to obtain a more reliable average accuracy for the cost function of the optimization algorithm, and finally to find the global minimum (cost) of the neural network (best structure + best features).

+ So, in my structure, the optimization algorithm searches the space using the output cost of the structure above. The only problem is the position of the weight initialization. For example, with position 1 we generate weights only once per call of this cost function (for specific inputs, number of layers (maximum 2) and number of neurons in each layer), so that we can focus on the best network structure.

+ After finding the best structure (for example, with 5 folds and 5 loops), I will combine these models (25 models), feed the out-of-sample data to each of these neural networks and average the outputs (real use of the system).

+ As you know, the optimization algorithm compares the cost outputs in every iteration, so we should select the best position for weight initialization because of its effect on the cost. Somebody told me that we can't compare costs if we select position 3 and that we should select position 1.


Answer 2

Greg Heath on 2 Sep 2014


The outline I gave was NOT the standard k-fold cross-validation where, for each H candidate, the data is divided into k indivisible subsets, of which 1 is used for validation, 1 for testing and k-2 for training. The final estimate is the average of the k test set performances.

The technique I use randomly divides the data and randomly chooses the initial weights for EACH of numH*Ntrials (~100) designs. The results can be displayed in 3 (trn/val/tst) Ntrials-by-numH Rsquared matrices. For robustness, I choose the smallest value of H whose column has m or more Rsquared validation values above a specified threshold (typically 0.99). The value of m depends on the problem. I then obtain summary statistics of the remaining Ntrials-m R^2 values.

Although I do have a few strict k-fold design postings, I don't think the results are as representative as those of my approach.

I will search for the few k-fold designs I have posted.

P.S. Look up (e.g., Wikipedia) coefficient-of-determination and/or Rsquared, the fraction of target variance that is accounted for by the model.

Greg


Jack on 2 Sep 2014

Edited: Jack on 2 Sep 2014

Thank you for the answer, Greg. Now suppose that the sample size is small, so we can't increase the number of folds in your cross-validation, and we want to repeat that loop to get more reliable results. In this situation, should we initialize the weights in the same position you mentioned (in the inner k-fold loop, position 3), or in another position?

+ Can we use 'classperf' in your proposed code to calculate accuracy, sensitivity and specificity?

+ Can I use this structure without 'MSEtrn00' or other calculations like `net.trainParam.goal = MSEgoal` and `net.trainParam.min_grad = MinGrad`, as you mentioned in previous posts? Can I keep the default values of the neural network for these parameters? Why did you change them?

- In the neural network field, the validation set is usually smaller than the test set. In your structure these sizes are equal. Is that right?

- What is the purpose of the 'rng(0)' call, and what does the zero mean?

- I don't know why, but when I move the weight initialization from position 1 to position 3, the improvement (based on accuracy) in the optimization algorithm is lower than with position 1. I will average the accuracies. Thanks.

Greg Heath on 2 Sep 2014


% Thank you for the answer, Greg. Now suppose that the sample size is small, so we can't increase the number of folds in your cross-validation, and we want to repeat that loop to get more reliable results. In this situation, should we initialize the weights in the same position you mentioned (in the inner k-fold loop, position 3), or in another position?

If Ntrneq >> Nw does not hold for the smallest acceptable value of H, use a validation set and/or regularization (e.g., msereg) to prevent overtraining an overfit net. Or use trainbr (UNFORTUNATELY, MATLAB does not allow a val set with trainbr).

% + Can we use 'classperf' in your proposed code to calculate accuracy, sensitivity and specificity?

Not familiar with classperf: What Toolbox?

 >> help classperf
   classperf not found.

% + Can I use this structure without 'MSEtrn00'

If you wish. However, NMSE = MSE/MSE00 and R2 = 1-NMSE make the results scale invariant and R2 interpretable as the fraction of target variance that is modeled by the net. See Wikipedia: R-squared.
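
The scale invariance mentioned here can be checked with a few lines. The target and output values below are made up for illustration; the point is only that multiplying both by the same factor changes the MSE but leaves R2 unchanged.

```matlab
% Hypothetical single-output targets and model outputs
t = [1 2 3 4 5];
y = [1.1 1.9 3.2 3.8 5.1];

% R2 = 1 - NMSE, with NMSE = MSE/MSE00 and MSE00 = mean(var(t',1))
R2fun  = @(t,y) 1 - mean((t(:)-y(:)).^2) / mean(var(t',1));
MSEfun = @(t,y) mean((t(:)-y(:)).^2);

r_original = R2fun(t, y);                % about 0.989
r_scaled   = R2fun(1000*t, 1000*y);      % same value after rescaling

mse_original = MSEfun(t, y);             % about 0.022
mse_scaled   = MSEfun(1000*t, 1000*y);   % 1e6 times larger
```

This is why R2 values from differently scaled problems can be compared against a common threshold, while raw MSE values cannot.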

% `net.trainParam.goal = MSEgoal` and `net.trainParam.min_grad = MinGrad`, as you mentioned in previous posts? Can I keep the default values of the neural network for these parameters? Why did you change them?

I don't need/want R2trna > 0.99 or MinGrad < MSE/100. This shortens training time and also helps prevent overtraining an overfit net.

% - In the neural network field, the validation set is usually smaller than the test set. In your structure these sizes are equal. Is that right?

MATLAB and I prefer them equal. The val set helps prevent overtraining an overfit net.

% - What is the purpose of the 'rng(0)' call, and what does the zero mean?

What does your computer say:

 help rng
 doc rng
 type rng

% - I don't know why, but when I move the weight initialization from position 1 to position 3, the improvement (based on accuracy) in the optimization algorithm is lower than with position 1. I will average the accuracies. Thanks.

In position 1 the different designs are not independent. This especially hurts if the nets are overtrained.

In position 3 with configure, both data division and weight initialization are random. Therefore, more of parameter space is searched for good combinations and error bars are more reliable.

Hope this helps

Greg

Greg

Jack on 4 Sep 2014

Edited: Jack on 5 Sep 2014


Thank you so much for your help, Greg.

'classperf' is a function from the Bioinformatics Toolbox.

First question: I read the `rng` documentation, but I don't understand the purpose of using 'rng' before the cross-validation loops. You said:

 In position 3 with configure, both data division and weight initialization are random. Therefore, more of parameter space is searched for good combinations and error bars are more reliable.

Using 'rng' is equivalent to initializing the net in position 1! With rng(0) we have the same weight initialization in the cross-validation loop. Is that right? So, in my structure, should I use rng(0) in position 1, position 2 or position 3? And where should I configure the net? As you can see, in my structure the optimization algorithm searches for the best combination of inputs, number of neurons and number of layers, and this code is the cost function of the optimization algorithm. As you said, we should initialize new weights in every cross-validation loop to search the space better (besides the number of neurons and layers, the combination of inputs changes on every call of the cost function by the optimization algorithm). Is that right?

+ With `rng` we have the same `randperm` indices (train, test, val) in all loops. When I call the cost function of the optimization algorithm, MATLAB calls 'rng' again and changes the indices. So I have different indices and weights on every call (but equal within the loops). Is that right (on every call I have different inputs, number of layers and neurons)? And why should I have the same indices in the loops? Should I have the same indices and initial weights in all cost function calls of the optimization algorithm? (On every call we are evaluating a neural network with different inputs, number of neurons and layers.) Aren't we limiting the search space in which the optimization algorithm looks for the best model structure? (All assumptions are based on your proposed cross-validation indices, not `crossvalind`.) - I think that if we use the same indices (or weights) in all calls of the cost function, we can't search the whole space to find the best structure with the optimization algorithm. In your structure you only change the number of neurons (and layers), but in my structure the combination of inputs changes as well. What do you think? When I set the same 'rng' for all iterations of the optimization algorithm, the improvement is very low. The classification accuracy increases in the first iteration and then stays at one value.

Second question: Overall, do you think R-squared is better than binary classification accuracy as the cost function of my optimization algorithm?

Third question: Finally, after finding the best model structure, I will use all of the best selected neural networks and average their outputs for out-of-sample data. Is this a good approach? I'm using the outer loop ( i=1:num_of_loops ) to get more reliable outputs. After finding the best model, I will use all neural networks of the best model structure (num_of_loops*num_of_kfolds trained models), feed the out-of-sample data to them and average the outputs. (Before averaging, I will remove outlier outputs with something like X-mean(X)>=2*std(X).) So my system is searching for the best pack of trained models, not only the best neural network structure. What do you think, and what do you recommend?
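
The outlier-trimmed averaging described in the third question could be sketched as follows. The ten model outputs are invented, and the use of `abs` (so that outliers on both sides are removed) is an assumption made for this sketch; the expression in the question is one-sided.

```matlab
% Hypothetical outputs of 10 trained nets for one out-of-sample input
X = [0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.1];

% Keep outputs that lie within 2 standard deviations of the mean
keep = abs(X - mean(X)) < 2*std(X);

nKept = nnz(keep);        % 9: the 0.1 output is discarded
yavg  = mean(X(keep));    % average of the remaining outputs, 0.9
```

Note that a 2*std rule can only flag an output when the ensemble is large enough: with n models, no value can lie more than (n-1)/sqrt(n) sample standard deviations from the mean, which is below 2 for n = 5.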

Fourth question: You used something like this in your code:

 M = floor(N/k)           % length(valind & tstind)
 Ntrn = N-2*M             % length(trnind)
 Ntrneq = Ntrn*O          % No. training equations
 H = 10                   % default No. hidden nodes
 Nw = (I+1)*H+(H+1)*O     % No. unknown weights
 Ndof = Ntrneq-Nw         % No. of estimation degrees of freedom
 MSEgoal = 0.01*Ndof*MSE00a/Ntrneq;
 MinGrad = MSEgoal/10;

Can you revise it for a neural network with two hidden layers? `MinGrad` is equal to `MSEgoal/10`. Is 10 the number of neurons? Should we only use `net.numWeightElements` for `Nw`?

Fifth question: Where should I use `randperm` for cross-validation? In which loop of my structure? It should be after `rng`, but where?

P.S. When I watch the neural network training process, in all trainings early stopping (maximum = 6, the MATLAB default) stops the training after 15~40 iterations.

Overall, I think we have these options:

- put 'rng' in the outer loop
- put 'rng' before the outer loop

(In both options above, we have different random generation on every call of the cost function, but the same within the loops.)

- put 'rng' in the outer loop with the same generation on every call of the cost function.
- put 'rng' before the outer loop with the same generation on every call of the cost function.

(In both options above, we have the same random generation on every call of the cost function.)

- Remove 'rng' from the code and initialize weights in position 3
- Remove 'rng' from the code and initialize weights in position 2
- Remove 'rng' from the code and initialize weights in position 1

(+ different positions (1, 2, 3) for the `randperm` of the indices (train/test/validation)).

Sorry for these long questions. You are an expert in this area. I'm very thankful for your help, and your comments are improving my system, dear Greg.

Greg Heath on 6 Sep 2014


 % 'classperf' is a function from the Bioinformatics Toolbox.
 % First question: I read the `rng` documentation, but I don't understand
 % the purpose of using 'rng' before the cross-validation loops.

To prevent biasing designs and error estimation

 1. The data should be randomized before it is divided into the k folds.
 2. The initial weights for each design should be random.
 3. In order to duplicate the design process, the RNG should be initialized once, and only once (therefore, not in a loop), before data division and weight initialization.
 4. There are different RNGs you can use. See help rng and doc rng for
 details. I have been using the single initialization command rng(0) for
 eons. However, the zero can be replaced by any positive integer less
 than 2^32. Sometimes I use 1492 (Christopher
 Columbus) or 4151941 (a birthday). I think the latest MATLAB improvements recommend rng('default').
 % Where should I configure the net?
 5. When designing in nested loops, the nets should be configured just
 before each call of train. Otherwise train will begin with the final weights of the previous design.

% As you can see, in my structure the optimization algorithm searches for the best combination of inputs, number of neurons and number of layers, and this code is the cost function of the optimization algorithm. As you said, we should initialize new weights in every cross-validation loop to search the space better (besides the number of neurons and layers, the combination of inputs changes on every call of the cost function by the optimization algorithm). Is that right?

Not sure what you mean.

6. I have found no reason to use more than 1 hidden layer.

7. I usually perform input variable selection only after a best design is chosen. However, in extreme cases where there are huge numbers of input variables, I tend to use linear models (e.g., regress for regression and PLS for classification) to reduce the number of variables to a more manageable number.

8. I choose timeseries delays using auto and cross correlation functions before concentrating on the multiple loop search for number of hidden nodes and weights.

9. rng is used once and only once before the outer loop.

10. The 1st loop changes trn/val/tst subsets k times

11. The 2nd loop changes the number of hidden nodes numH times

12. The 3rd loop changes the trained weights Ntrials times
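
A minimal sketch of the data division in the first (outer) loop, using only base MATLAB, might look like this. The values of N and k are hypothetical, and the cyclic choice of which block is the test subset is one possible convention rather than the one used in the original designs.

```matlab
rng(0)                       % seed once, before any random function is used
N = 23;  k = 5;              % hypothetical number of examples and folds
M = floor(N/k);              % size of each validation/test subset
ind0 = randperm(N);          % shuffle the data indices once

Ntrn = zeros(1,k);           % training set size in each fold
tstcount = zeros(1,N);       % how often each example lands in a test subset
for fold = 1:k
    valind = ind0((fold-1)*M + (1:M));        % fold-th block validates
    tstind = ind0(mod(fold,k)*M + (1:M));     % next block (cyclically) tests
    trnind = setdiff(ind0, [valind tstind]);  % everything else trains
    Ntrn(fold) = numel(trnind);
    tstcount(tstind) = tstcount(tstind) + 1;
    % ... the loops over H and over weight initializations would go here ...
end
```

Each of the k blocks serves as the test subset exactly once, and any examples left over when N is not a multiple of k stay in the training subset of every fold.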

% When I call the cost function of the optimization algorithm, MATLAB calls 'rng' again and changes the indices. So I have different indices and weights on every call (but equal within the loops). Is that right (on every call I have different inputs, number of layers and neurons)? And why should I have the same indices in the loops? Should I have the same indices and initial weights in all cost function calls of the optimization algorithm? (On every call we are evaluating a neural network with different inputs, number of neurons and layers.) Aren't we limiting the search space in which the optimization algorithm looks for the best model structure? (All assumptions are based on your proposed cross-validation indices, not `crossvalind`.) - I think that if we use the same indices (or weights) in all calls of the cost function, we can't search the whole space to find the best structure with the optimization algorithm. In your structure you only change the number of neurons (and layers), but in my structure the combination of inputs changes as well. What do you think? When I set the same 'rng' for all iterations of the optimization algorithm, the improvement is very low. The classification accuracy increases in the first iteration and then stays at one value.

13. I think adding the simultaneous selection of input variable subsets and number of hidden layers adds more complexity than it's worth.

% Second question: Overall, do you think R-squared is better than binary classification accuracy as the cost function of my optimization algorithm?

14. Since this is classification, the cost function should be crossentropy, the patternnet default.
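
To illustrate the difference between the two measures, the sketch below computes a plain textbook cross-entropy and the classification accuracy for a tiny made-up example. This is not the toolbox `crossentropy` function, whose normalization options may give a different number; it only shows that cross-entropy still changes with output confidence when accuracy is already at 100%.

```matlab
% Hypothetical one-hot targets and softmax-like outputs (2 classes, 2 examples)
t = [1 0; 0 1];
y = [0.8 0.4; 0.2 0.6];

% Textbook cross-entropy, averaged over examples (assumed definition)
ce = -mean(sum(t.*log(y), 1));

% Classification accuracy from the largest output in each column
[~, predicted] = max(y, [], 1);
[~, actual]    = max(t, [], 1);
acc = mean(predicted == actual);   % both examples are classified correctly
```

Because accuracy is piecewise constant in the weights while cross-entropy is smooth, the latter gives a gradient-based trainer something to follow.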

% Third question: Finally, after finding the best model structure, I will use all of the best selected neural networks and average their outputs for out-of-sample data. Is this a good approach? I'm using the outer loop ( i=1:num_of_loops ) to get more reliable outputs. After finding the best model, I will use all neural networks of the best model structure (num_of_loops*num_of_kfolds trained models), feed the out-of-sample data to them and average the outputs. (Before averaging, I will remove outlier outputs with something like X-mean(X)>=2*std(X).) So my system is searching for the best pack of trained models, not only the best neural network structure. What do you think, and what do you recommend?

15. Use validation subset performance to select the best nets.

16. Use test subset performance to obtain UNBIASED predictions of performance on unseen data.

% Fourth question: You used something like this in your code:

% M = floor(N/k)           % length(valind & tstind)
% Ntrn = N-2*M             % length(trnind)
% Ntrneq = Ntrn*O          % No. training equations
% H = 10                   % default No. hidden nodes
% Nw = (I+1)*H+(H+1)*O     % No. unknown weights
% Ndof = Ntrneq-Nw         % No. of estimation degrees of freedom
% MSEgoal = 0.01*Ndof*MSE00a/Ntrneq;
% MinGrad = MSEgoal/10;    % Is 10 the number of neurons?

No. Make that MSEgoal/20 for minimizing MSE.

WHOOPS! The performance measure for patternnet is crossentropy, not MSE! Not sure what to do about that yet!

% Can you revise it for a neural network with two hidden layers?

Nw = (I+1)*H1+(H1+1)*H2+(H2+1)*O

 % Should we only use `net.numWeightElements` for `Nw`?
 You can use either. They should be the same.
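
As a worked check of the two-hidden-layer formula, the sketch below evaluates it for hypothetical sizes and compares it with a generic layer-by-layer count. The sizes are assumptions for illustration; on a configured net, the result could be compared against `net.numWeightElements`.

```matlab
% Hypothetical sizes: 4 inputs, hidden layers of 10 and 5 nodes, 3 outputs
I = 4;  H1 = 10;  H2 = 5;  O = 3;

% Formula from the answer: each layer has (fan-in + 1 bias) weights per node
Nw = (I+1)*H1 + (H1+1)*H2 + (H2+1)*O;            % 50 + 55 + 18 = 123

% Generic version for any list of layer sizes
layerSizes = [I H1 H2 O];
NwGeneric = sum((layerSizes(1:end-1) + 1) .* layerSizes(2:end));

% Single-hidden-layer case reduces to (I+1)*H + (H+1)*O
NwOneLayer = sum(([I H1] + 1) .* [H1 O]);        % 50 + 33 = 83
```

The generic form makes it easy to recompute Nw inside a structure search where the number of layers also varies.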

% Fifth question: Where should I use `randperm` for cross-validation? In which loop of my structure? It should be after `rng`, but where?

Immediately after is ok.

% P.S. When I watch the neural network training process, in all trainings early stopping (maximum = 6, the MATLAB default) stops the training after 15~40 iterations.

That is what it is supposed to do. The validation performance is getting worse, indicating that the net is probably losing its ability to perform well on unseen data.

% Overall, I think we have these options:

% - put 'rng' in the outer loop - put 'rng' before the outer loop (In both options above, we have different random generation on every call of the cost function, but the same within the loops.) - put 'rng' in the outer loop with the same generation on every call of the cost function. - put 'rng' before the outer loop with the same generation on every call of the cost function. (In both options above, we have the same random generation on every call of the cost function.) - Remove 'rng' from the code and initialize weights in position 3 - Remove 'rng' from the code and initialize weights in position 2 - Remove 'rng' from the code and initialize weights in position 1 (+ different positions (1, 2, 3) for the `randperm` of the indices (train/test/validation)).

I didn't even read what you wrote above because the answer is very simple:

Initialize the RNG once and only once before any loops and the first use of any random function.
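
The effect of this rule can be seen with `rand` alone, as in the small base-MATLAB sketch below (variable names are illustrative). Seeding once gives different draws within a run and an identical sequence on a rerun; seeding inside a loop makes every pass produce the same numbers.

```matlab
% Seeding once: successive draws differ, and the whole run is repeatable
rng(0)
a1 = rand(1,3);
a2 = rand(1,3);

rng(0)                      % stands in for rerunning the same script later
b1 = rand(1,3);
b2 = rand(1,3);

repeatable = isequal([a1 a2], [b1 b2]);   % true: same seed, same sequence
distinct   = ~isequal(a1, a2);            % true: draws within a run differ

% Seeding inside the loop: every pass sees the same "random" numbers
w = zeros(3,3);
for i = 1:3
    rng(0)
    w(i,:) = rand(1,3);
end
allSame = isequal(w(1,:), w(2,:), w(3,:));  % true: no variation across passes
```

In a design loop, the second pattern would give every fold or trial the same data split and the same initial weights, which defeats the purpose of repeating the design.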

% Sorry for these long questions.

You shouldn't be. If it were not for you, I would not have realized that specifying MSEgoal should not do anything if the performance function is Xent.

Greg

Jack on 7 Sep 2014

Thank you so much for your help, Greg.


Answer 3

orlem lima dos santos on 26 Jan 2018


There is an algorithm known as grid search that can find the solution you are looking for.

You can find an implementation by caghangir at the link below:

https://www.mathworks.com/matlabcentral/fileexchange/63132-grid-search-function-for-neural-networks

This algorithm performs a 10-fold cross-validation.
