# Finding best neural network structure using optimization algorithms and cross-validation

Jack on 31 Aug 2014
Hi.
I'm using an optimization algorithm to find the best structure and inputs of a 'patternnet' neural network in MATLAB R2014a, using 5-fold cross-validation. Where should I initialize the weights of my neural network?
*Position_1 (for weight initialization)*
for i=1:num_of_loops
    *Position_2 (for weight initialization)*
    - repeating cross validation
    for i=1:num_of_kfolds
        *Position_3 (for weight initialization)*
        - Cross validation loop
    end
end
I'm repeating the 5-fold cross-validation (because the folds are selected randomly) to get more reliable outputs (the average of the neural network outputs). Which position is better for weight initialization (Position_1, Position_2 or Position_3), and why?
Thanks.

Greg Heath on 31 Aug 2014
To help understanding, I will assume Nval = Ntst = 0. Search for the nonzero examples in the NEWSGROUP and ANSWERS.
To design a typical I-H-O net with Ntrn training examples, try to not let the number of unknown weights
Nw = (I+1)*H+(H+1)*O
exceed the number of training equations
Ntrneq = Ntrn*O
This will hold as long as H <= Hub, where Hub is the upper bound
Hub = -1+ceil( (Ntrneq-O) / (I+O+1) )
Based on Ntrneq and Hub I decide on a set of numH candidate values for H
0 <= Hmin:dH:Hmax <= Hub
numH = numel(Hmin:dH:Hmax)
and the number of weight initializations for each value of H, e.g.,
Ntrials = 10
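As a minimal sketch of the sizing rule above, the following computes Ntrneq, Hub and a candidate grid for H. The problem sizes (I, O, Ntrn) and the grid values (Hmin, dH) are hypothetical and chosen only for illustration; only the formulas come from the answer.

```matlab
% Hypothetical problem sizes, chosen only for illustration
I = 4;  O = 3;  Ntrn = 105;

Ntrneq = Ntrn*O;                        % number of training equations
Hub = -1 + ceil((Ntrneq-O)/(I+O+1));    % upper bound on H, as given above

% Hypothetical candidate grid; Hmax is capped at Hub
Hmin = 2;  dH = 4;  Hmax = Hub;
Hvals = Hmin:dH:Hmax;
numH = numel(Hvals);

% Number of unknown weights for each candidate H
Nw = (I+1)*Hvals + (Hvals+1)*O;
```

With these sizes every candidate satisfies Nw <= Ntrneq, which is the condition the bound is meant to guarantee.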
If the training target is ttrn = t(indtrn), the mean-square-error of a naïve constant output net (independent of the input) is
MSEtrn00 = mean(var(ttrn',1));
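To see why this expression is the naive reference, the sketch below checks it against an explicit constant-output model and then forms the normalized R2 used in the answer. The target matrix `t` and the model output `y` are made-up values, not data from this thread.

```matlab
% Hypothetical O-by-N target matrix (O = 2 classes, N = 6 samples)
t = [1 0 0 1 1 0;
     0 1 1 0 0 1];
N = size(t, 2);

% Naive model: always output the mean target, whatever the input
y00 = repmat(mean(t, 2), 1, N);
MSE00direct = mean(mean((t - y00).^2));

% Expression from the answer: biased variance of each target row, averaged
MSE00 = mean(var(t', 1));

% Hypothetical model output and its normalized performance
y = 0.8*t + 0.1;
R2 = 1 - mean(mean((t - y).^2))/MSE00;
```

R2 is 0 for a model no better than the constant output and approaches 1 as the error vanishes, which makes values comparable across data sets with different target variance.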
Then using MSEtrn00 as a normalization reference, I use the following double loop format
rng(0)
j = 0
for h = Hmin:dH:Hmax
    j = j+1
    net = ...
    net.divideFcn = 'dividetrain';
    ...
    for i = 1:Ntrials
        net = configure(net,x,t);
        ...
        [ net tr y e ] = train(net,x,t);
        ...
        R2(i,j) = 1-mse(e)/MSEtrn00;
    end
end
For details, search for one of my designs using the search word greg and a subset of the variables I always use (e.g., Nw, Ntrneq, Hub, Ntrials, ...).
Hope this helps
Thank you for formally accepting my answer
Greg
Jack on 1 Sep 2014
Edited: Jack on 1 Sep 2014
Thank you for your answer, Greg. I find it a little complicated, so I have some questions.
- From what you said, I think I should use `position 3` and initialize the weights on every call of the neural network. Is that right? With this structure, can we compare the outputs across calls of the optimization algorithm?
- At the moment I compute cross-validation accuracy, sensitivity, specificity and ROC output and average them using (sum(accuracy)/num_of_loops), but you used:
MSEtrn00 = mean(var(ttrn',1));
R2(i,j) = 1-mse(e)/MSEtrn00;
Can you explain this further? I don't see how to do it in my structure (cross-validation + loops).
Thanks.
Ps.
My optimization algorithm searches for the best neural network structure and the best inputs. I only use this structure to get a more reliable average accuracy for the cost function of the optimization algorithm, and finally to find the global minimum (cost) of the neural network (best structure + best features).
+ So in my structure the optimization algorithm searches the space using the output cost of the structure above. The only problem is the position of the weight initialization. For example, if we put it in position 1, we generate weights only once per call of this cost function (for specific inputs, number of layers (maximum 2) and number of neurons in every layer). So we want to focus on the best network structure.
+ After finding the best structure (for example with 5 folds and 5 loops), I will put these models together (25 models), feed out-of-sample data to each of these neural networks and average the outputs (real use of the system).
+ As you know, the optimization algorithm compares cost outputs in every iteration, so we should select the best position for weight initialization because of its effect on the cost. Somebody told me that we can't compare costs if we select position 3 and that we should select position 1.

Greg Heath on 2 Sep 2014
The outline I gave was NOT the standard k-fold cross-validation where, for each H candidate, the data is divided into k indivisible subsets where 1 subset is used for validation, 1 for testing and k-2 for training. The final estimate is the average of the k test set performances.
The technique I use randomly divides the data and randomly chooses the initial weights for EACH of numH*Ntrials (~100) designs. The results can be displayed in 3 (trn/val/tst) Ntrials-by-numH Rsquared matrices. For robustness, I choose the smallest value of H whose column has m or more Rsquared validation values above a specified threshold (typically 0.99). The value of m depends on the problem. I then obtain summary statistics of the remaining Ntrials-m R^2 values.
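The column-selection rule described above can be sketched as follows. The Rsquared matrix, the candidate values of H, and the choice m = 3 are all hypothetical; only the rule (smallest H with at least m validation values above the threshold) comes from the post.

```matlab
% Hypothetical validation Rsquared matrix: Ntrials = 5 rows, numH = 4 columns
Hvals = [2 4 6 8];
R2val = [0.90 0.991 0.995 0.999;
         0.95 0.980 0.992 0.998;
         0.97 0.995 0.993 0.997;
         0.98 0.970 0.980 0.996;
         0.96 0.960 0.999 0.995];

threshold = 0.99;   % threshold quoted in the post
m = 3;              % hypothetical; the post says m depends on the problem

% Count, per column, how many trials exceed the threshold
counts = sum(R2val > threshold, 1);

% Smallest H whose column has m or more successful trials
jbest = find(counts >= m, 1, 'first');
Hbest = Hvals(jbest);
```

If no column reaches m successes, `jbest` is empty, so a real script would need to handle that case, for example by widening the H range or increasing Ntrials.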
Although I do have a few strict k-fold design postings, I don't think the results are as representative as my approach.
I will search for the few k-fold designs I have posted.
P.S. Look up (e.g., Wikipedia) the coefficient of determination, i.e., Rsquared, the fraction of target variance that is accounted for by the model.
Greg
Greg Heath on 6 Sep 2014
% 'classperf' is a function from bioinformatics toolbox.
% First question: I read about `rng`, but I don't get the purpose of using 'rng'
% before the cross-validation loops.
To prevent biasing designs and error estimation
1. The data should be randomized before it is divided into the k-folds.
2. The initial weights for each design should be random.
3. In order to duplicate the design process, the RNG should be initialized once, and only once (therefore, not in a loop) before data division and weight initialization.
4. There are different RNGs you can use. See help rng and doc rng for details. I have been using the single initialization command rng(0) for eons. However, the zero can be replaced by any positive integer less than 2^m (I forgot what m is). Sometimes I use 1492 (Christopher Columbus) or 4151941 (a birthday). I think the latest MATLAB improvements recommend rng('default').
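A small sketch of points 1 to 4, assuming nothing beyond base MATLAB: seeding once makes the whole sequence of random calls (data shuffling and stand-in "weights") repeatable, while seeding inside a loop makes every iteration draw the same numbers. The variable names are illustrative only. For reference, MATLAB's `rng` documentation gives the seed range as nonnegative integers below 2^32.

```matlab
% Run 1: seed once, then shuffle data indices and draw stand-in weights
rng(0)
ind1 = randperm(10);
w1 = rand(3, 1);

% Run 2: same seed, same sequence of calls, so the results are duplicated
rng(0)
ind2 = randperm(10);
w2 = rand(3, 1);
sameRun = isequal(ind1, ind2) && isequal(w1, w2);

% Pitfall: seeding inside the loop repeats the same draw every iteration
draws = zeros(1, 3);
for i = 1:3
    rng(0)
    draws(i) = rand;
end
allSame = all(draws == draws(1));
```

The second part shows why the seed belongs outside the loops: with `rng(0)` inside, every trial would start from identical "random" weights and the multiple trials would add no information.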
% Where should I configure the net?
5. When designing in nested loops, the nets should be configured just before each call of train. Otherwise train will begin with the final weights of the previous design.
% As you can see in my structure, the optimization algorithm searches for the best combination of inputs, number of neurons and number of layers, and this code is the cost function of that optimization algorithm. As you said, we should initialize new weights in every cross-validation loop to search the space better (besides the number of neurons and layers, the combination of inputs changes in every call of the cost function by the optimization algorithm). Is that right?
Not sure what you mean.
6. I have found no reason to use more than 1 hidden layer.
7. I usually perform input variable selection only after a best design is chosen. However, in extreme cases where there are huge numbers of input variables, I tend to use linear models (e.g., Regress for regression and PLS for classification) to reduce the number of variables to more manageable numbers.
8. I choose timeseries delays using auto and cross correlation functions before concentrating on the multiple loop search for number of hidden nodes and weights.
9. rng is used once and only once before the outer loop.
10. The 1st loop changes trn/val/tst subsets k times
11. The 2nd loop changes the number of hidden nodes numH times
12. The 3rd loop changes the trained weights Ntrials times
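Points 9 to 12 can be laid out as the loop skeleton below. It is only a sketch: the sample count, fold count, H candidates and Ntrials are hypothetical, the fold layout (one validation fold, the next fold as test, the rest as training) is one possible reading of the description in this thread, and the actual `configure`/`train` calls are left as a comment.

```matlab
rng(0)                          % point 9: seed once, before the outer loop

N = 103;  k = 5;                % hypothetical sample count and fold count
M = floor(N/k);                 % size of each validation and test subset
ind0 = randperm(N);             % shuffle the data once, right after rng

Hvals = 2:4:10;                 % hypothetical candidates for H
Ntrials = 3;                    % hypothetical number of weight initializations
count = 0;                      % number of designs visited

for fold = 1:k                                  % point 10: trn/val/tst subsets
    valind = ind0((fold-1)*M + (1:M));
    nxt = mod(fold, k) + 1;                     % next fold, wrapping around
    tstind = ind0((nxt-1)*M + (1:M));
    trnind = setdiff(ind0, [valind tstind], 'stable');
    for h = Hvals                               % point 11: hidden nodes
        for trial = 1:Ntrials                   % point 12: initial weights
            % configure a net with h hidden nodes, then train it here
            count = count + 1;
        end
    end
end
```

Because N is not a multiple of k here, the leftover samples stay in the training subset, so the training size is N-2*M.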
% When I call the cost function of the optimization algorithm, MATLAB calls 'rng' again and changes the indexes. So I have different indexes and weights in every call (but the same within the loops). Is that right (in every call I have different inputs, numbers of layers and neurons)? And why should I have the same indexes in the loops?
% Should I have the same indexes and initial weights in all cost-function calls of the optimization algorithm? (In every call we are evaluating a neural network with different inputs, numbers of neurons and layers.) Aren't we limiting the search space for finding the best model structure? (All assumptions are based on your proposed cross-validation indexes, not `crossvalind`.)
% I think that if we use the same indexes (or weights) in all calls of the cost function, we can't search the whole space to find the best structure with the optimization algorithm. In your structure you only change the number of neurons (and layers), but in my structure the combination of inputs changes as well. What do you think?
% When I set the same 'rng' for all iterations of the optimization algorithm, the improvement is very low: the classification accuracy increases in the first iteration and then stalls at one value.
13. I think adding the simultaneous selection of input variable subsets and number of hidden layers adds more complexity than it's worth.
% Second question: Overall, do you think R-squared is better than binary classification accuracy as the cost function of my optimization algorithm?
14. Since this is classification, the cost function should be crossentropy, the patternnet default.
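To make the difference between the two candidate cost functions concrete, here is a sketch that computes a generic cross-entropy and the classification accuracy for the same outputs. The targets and outputs are made up, and the formula is the textbook multi-class cross-entropy; it is an assumption that this matches the toolbox's `crossentropy` only up to normalization.

```matlab
% Hypothetical one-hot targets and network outputs, O-by-N (2 classes, 2 samples)
t = [1 0;
     0 1];
y = [0.9 0.2;
     0.1 0.8];

% Generic cross-entropy: average over samples of -sum(t .* log(y))
xent = -mean(sum(t .* log(y), 1));

% Classification accuracy from the largest output per sample
[~, predicted] = max(y, [], 1);
[~, actual] = max(t, [], 1);
accuracy = mean(predicted == actual);
```

Both samples are classified correctly, so accuracy is already at its maximum, yet the cross-entropy is still positive and would keep decreasing as the outputs become more confident. That continuous behavior is what makes it usable as a training objective.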
% Third question: Finally, after finding the best model structure, I will use all of the best selected neural networks and average their outputs for out-of-sample data. Is this a good approach?
% I'm using the outer loop (i=1:num_of_loops) to get more reliable outputs. After finding the best model I will use all neural networks with the best model structure (num_of_loops*num_of_kfolds trained models), feed out-of-sample data to them and average the outputs. (Before averaging I will remove outlier outputs with something like X-mean(X)>=2*std(X).)
% So my system is searching for the best pack of trained models, not only the best neural network structure. What do you think, and what do you recommend?
15. Use validation subset performance to select the best nets.
16. Use test subset performance to obtain UNBIASED predictions of performance on unseen data.
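A tiny sketch of points 15 and 16, with made-up performance values for three candidate nets: the choice is driven only by the validation scores, and the test score of the chosen net is what gets reported.

```matlab
% Hypothetical per-net performance (higher is better), e.g. Rsquared
perfVal = [0.91 0.95 0.93];   % validation subset
perfTst = [0.90 0.92 0.94];   % test subset

% Point 15: select using validation performance only
[~, ibest] = max(perfVal);

% Point 16: report the test performance of the selected net
reportedTst = perfTst(ibest);
```

Here the third net has the highest test score, but picking it on that basis would make the reported test number optimistically biased, which is why the test subset plays no part in the selection.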
% Fourth question : You used something like this in your codes :
% M = floor(N/k) % length(valind & tstind)
% Ntrn = N-2*M % length(trnind)
% Ntrneq = Ntrn*O % No. training equations
% H = 10 % default No. hidden nodes
% Nw = (I+1)*H+(H+1)*O % No. unknown weights
% Ndof = Ntrneq-Nw % No. of estimation degrees of freedom
% MSEgoal = 0.01*Ndof*MSE00a/Ntrneq;
% MinGrad = MSEgoal/10; % 10 is number of neurons?
No. Make that MSEgoal/20 for minimizing MSE.
WHOOPS! The performance measure for patternnet is crossentropy, not MSE! Not sure what to do about that yet!
% Can you revise it for a two-layer neural network?
Nw = (I+1)*H1+(H1+1)*H2+(H2+1)*O
% Should we only use `net.numWeightElements` for `Nw`?
You can use either. They should be the same.
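The one-layer and two-layer weight counts above follow the same pattern, so they can be written once for any list of layer sizes. In this sketch the sizes are hypothetical, and `layerSizes` is an illustrative variable, not a toolbox property; comparing the result against `net.numWeightElements` on a configured net is left to the reader.

```matlab
% Hypothetical sizes in the order [I H1 H2 O]
layerSizes = [4 5 3 2];

% Each layer contributes (inputs + 1 bias) * outputs weights
Nw = sum((layerSizes(1:end-1) + 1) .* layerSizes(2:end));

% Same count from the explicit two-layer formula in this thread
I = 4;  H1 = 5;  H2 = 3;  O = 2;
NwExplicit = (I+1)*H1 + (H1+1)*H2 + (H2+1)*O;

% The one-hidden-layer case is the same expression with [I H O]
oneLayer = [4 10 3];
Nw1 = sum((oneLayer(1:end-1) + 1) .* oneLayer(2:end));
```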
% Fifth question: Where should I use `randperm` for cross-validation? In which loop of my structure? It should be after `rng`, but where?
Immediately after is ok.
% P.S. When I watch the neural network training process, in all trainings early stopping (maximum = 6, the MATLAB default) stops the training after 15~40 iterations.
That is what it is supposed to do. The validation performance is getting worse, indicating that the net is probably losing its ability to perform well on unseen data.
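The stopping rule discussed here can be illustrated with a generic validation-stopping loop. The validation error sequence is made up, and the logic (stop after 6 consecutive epochs without a new best validation error) is a simplified assumption about how the toolbox's `net.trainParam.max_fail` setting behaves, not a copy of its implementation.

```matlab
% Hypothetical validation error per epoch (lower is better)
valPerf = [0.90 0.70 0.60 0.55 0.56 0.57 0.58 0.60 0.61 0.62 0.63];
maxFail = 6;                  % the default limit mentioned in the question

best = inf;  bestEpoch = 0;  nFail = 0;
stopEpoch = numel(valPerf);   % used if the limit is never reached

for ep = 1:numel(valPerf)
    if valPerf(ep) < best
        best = valPerf(ep);   % new best: remember it and reset the counter
        bestEpoch = ep;
        nFail = 0;
    else
        nFail = nFail + 1;    % validation error did not improve
    end
    if nFail >= maxFail
        stopEpoch = ep;       % stop; the weights from bestEpoch are kept
        break
    end
end
```

In this sequence the best validation error occurs at epoch 4 and training stops at epoch 10, after six epochs without improvement.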
% Overall, I think we have these options:
% - put 'rng' in the outer loop
% - put 'rng' before the outer loop
% (We have a different random generation in every call of the cost function in both options above, but the same within the loops.)
%
% - put 'rng' in the outer loop with the same generation in every call of the cost function
% - put 'rng' before the outer loop with the same generation in every call of the cost function
% (We have the same random generation in every call of the cost function in both options above.)
%
% - remove 'rng' from the code and initialize weights in position 3
% - remove 'rng' from the code and initialize weights in position 2
% - remove 'rng' from the code and initialize weights in position 1
% (+ different positions (1, 2, 3) for the `randperm` of the train/test/validation indexes)
I didn't even read what you wrote above because the answer is very simple:
Initialize the RNG once and only once before any loops and the first use of any random function.
% Sorry for these long questions.
You shouldn't be. If it were not for you, I would not have realized that specifying MSEgoal should not do anything if the performance function is Xent.
Greg
Jack on 7 Sep 2014
Thank you so much for your help, Greg.

orlem lima dos santos on 26 Jan 2018
There is an algorithm known as grid search that can find the solution you want.
You can find an implementation by caghangir at the link below:
https://www.mathworks.com/matlabcentral/fileexchange/63132-grid-search-function-for-neural-networks
This algorithm performs a 10-fold cross-validation.
