Scaled Conjugate Gradient - NN toolbox

Hi,
I have used MATLAB's 'trainscg' and NETLAB's 'scg', both with 'mse' as the performance function, on the same training data set, yet I still don't obtain the same generalisation on a set of other data files I have.
I have used the same Nguyen-Widrow initialisation method for the weights and biases, and the same 'dividerand' method to split the data into training, validation and testing sets.
I know the difference could lie in the various parameters used. In the original paper, http://www.sciencedirect.com/science/article/pii/S0893608005800565, the lambda values are specified not as exact values but as inequalities. I have used values that don't violate the rules laid down by the author.
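For reference, here is roughly how I set the two up (a sketch, not my exact script; x and t are my input and target matrices with samples in columns, and the sigma/lambda values are just my choices within Moller's inequalities):
% MATLAB side
H = 1;                                   % illustrative hidden-layer size
net = feedforwardnet(H, 'trainscg');
net.divideFcn = 'dividerand';            % random train/val/test split
net.layers{1}.initFcn = 'initnw';        % Nguyen-Widrow initialisation
net.layers{2}.initFcn = 'initnw';
net.layers{2}.transferFcn = 'logsig';    % log-sigmoid output for 2 classes
net.performFcn = 'mse';
net.trainParam.sigma  = 5e-5;            % my choice, within 0 < sigma <= 1e-4
net.trainParam.lambda = 5e-7;            % my choice, within 0 < lambda <= 1e-6
[net, tr] = train(net, x, t);
% NETLAB side (NETLAB expects samples in rows, hence the transposes)
net2 = mlp(size(x,1), H, size(t,1), 'logistic');
options = zeros(1, 18);                  % standard NETLAB options vector
options(14) = 1000;                      % maximum number of iterations
net2 = netopt(net2, options, x', t', 'scg');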
Also, one thing that seems a bit bizarre to me: MATLAB stops the learning in just 23 epochs, but NETLAB runs until it exceeds the maximum number of iterations. I understand the stopping criteria may be different.
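A quick way to see which criterion fired on the MATLAB side is the training record (assuming the usual two-output call to train):
[net, tr] = train(net, x, t);
tr.stop          % reason training stopped, e.g. 'Validation stop.'
tr.num_epochs    % number of epochs actually run (23 in my case)
tr.best_epoch    % epoch with the lowest validation error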
Is there anyone here who has worked with both of these toolboxes and found a way of obtaining the same results from both? I would welcome general ideas and tips for making NETLAB's SCG give results similar to MATLAB's TRAINSCG.
Any help or advice will be greatly appreciated.
Thank you. Pooja
  1 Comment
Pooja Narayan on 12 Aug 2014
About the network I'm trying to learn itself: input layers are 1039 in number. I use only one hidden layer and one output layer. I have two biases, one for the hidden layer and one for the output layer, and one other weight for the only layer I have. Thus I'm trying to learn 1042 weights. I use a tangent sigmoid as my hidden-layer TF and a log sigmoid as my output-layer TF. I have only two classes to classify into.


Accepted Answer

Greg Heath on 12 Aug 2014
Your description is incorrect and confusing.
[I N ] = size(input) % = ?
[ O N ] = size(target) % = ?
Ntrn = ? % Matlab default = N-2*round(0.15*N)
Ntrneq = Ntrn*O % Number of training equations
For an I-H-O net, the number of unknown weights to be estimated is
Nw = (I+1)*H+(H+1)*O % The "1s" are for biases
To prevent overfitting, choose H so that Ntrneq >= Nw.
To prevent nonrobustness w.r.t. noise and interference, choose Ntrneq >> Nw.
Otherwise, use regularization (trainbr or msereg) or validation-subset stopping.
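For example (a sketch; the msereg ratio field is from the older NN Toolbox documentation):
% Option 1: Bayesian regularization
net.trainFcn = 'trainbr';
% Option 2: regularized performance function (older toolbox versions)
net.performFcn = 'msereg';
net.performParam.ratio = 0.9;   % perf = ratio*mse + (1-ratio)*msw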
Nw can be lowered by removing input and/or hidden nodes.
I assume you mean you have 1039 input NODES. I doubt you need that many. You should probably use input-variable reduction (e.g., help PLSREGRESS) to obtain a more reasonable number.
Ntrneq >> Nw <==> H << Hub, where
Hub = -1 + ceil((Ntrneq-O)/(I+O+1))
Need to know N, Ntrn and H. Need to reduce I.
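For illustration, suppose N = 5000 and O = 1 with the default 0.70/0.15/0.15 split (numbers assumed for the example):
I = 1039; O = 1; N = 5000;
Ntrn   = N - 2*round(0.15*N)            % 3500
Ntrneq = Ntrn*O                         % 3500
Hub    = -1 + ceil((Ntrneq-O)/(I+O+1))  % 3, i.e. H <= 3 just for Ntrneq >= Nw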
Hope this helps.
Thank you for formally accepting my answer
Greg
  1 Comment
Pooja Narayan on 7 May 2015
Sorry for posting a confusing description, sir.
Yes, you are right: I have 1039 inputs, derived from the spectral densities of various signals, and a 1039-1-1 network. I thus have 1039 input weights, one bias added at the hidden layer, one layer weight for the output layer, and a final bias added to the output before I take the log sigmoid.
I have an input matrix of 5000 samples, arranged in columns, so I have 5000 targets.
SCG is a batch training algorithm, so for, say, 1000 cycles I span the entire data set of 5000 samples in each cycle. At the hidden layer I compute hout = tansig(inputweights * features + bias1), and at the output layer finalout = logsig(w2 * hout + bias2). I don't see anything wrong in this.
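In MATLAB terms, my forward pass looks like this (a sketch; the variable names are mine):
% Forward pass of the 1039-1-1 net; samples are columns of 'features'
% IW: 1 x 1039 input weights, b1: hidden bias
% LW: 1 x 1 layer weight,     b2: output bias
hout     = tansig(IW*features + b1);   % 1 x 5000 hidden activations
finalout = logsig(LW*hout + b2);       % 1 x 5000 outputs in (0,1)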
I'm interested in the "under the hood" operations MATLAB might be doing to improve generalisation performance.
I have implemented SCG based on Moller's original paper; however, I never get the generalisation that MATLAB's 'trainscg' gives.
Yes, as you have mentioned, I stop the learning when the validation error starts increasing. I'm not using a minimum-gradient criterion, while MATLAB may be using one.
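To rule out the stopping criteria, one thing I plan to try is making MATLAB rely on validation stopping alone (a sketch; these are trainscg's trainParam fields):
net.trainParam.min_grad = 0;     % effectively disable the gradient stop
net.trainParam.goal     = 0;     % no performance goal
net.trainParam.epochs   = 1000;  % generous epoch budget, same as NETLAB
[net, tr] = train(net, x, t);    % tr.stop should then be 'Validation stop.'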


More Answers (1)

saba momeni on 1 Feb 2019
Hi everyone,
I am training my feedforward neural network with scaled conjugate gradient.
I am not sure whether scaled conjugate gradient does its optimization in full batch or with mini-batch training?
I just specify the lambda and the sigma for it, no batch size.
I appreciate your answer.
Cheers
S
