How to change a subset of ANN weights while keeping the other weights unchanged?

Hello folks,
I am using the Neural Network Toolbox (R2012a) in my project. I have created a feedforward net with 2 layers (inputs are not counted as a layer, following the convention in the user's guide), and I want to update some of the input weights (IW{1,1}) while keeping the other input weights in IW{1,1} and the first-to-second-layer weights (LW{2,1}) fixed. In short, I want to change a subset of IW{1,1} while all the other weights remain fixed. Let me refer to this as my optimal goal.
If the optimal goal is impossible, a sub-optimal goal is also acceptable: update the entire IW{1,1} and keep the whole LW{2,1} fixed.
I have already figured out how to achieve the sub-optimal goal. My solution is to use the command 'adapt' and set the learning rate to 0 for LW{2,1}. But I do not like this solution, since 'adapt' is an over-simplified function that lacks the parameters and features (e.g. min_grad, plotperform, etc.) of other training functions/algorithms (e.g. trainlm, traingd, etc.). It is therefore harder to control the training process and check the results.
So, first, I want to know if there is a way to achieve the optimal goal rather than the sub-optimal.
Second, if the optimal goal is not possible (short of writing everything from scratch), I wonder if I can achieve the sub-optimal goal by taking advantage of some training functions instead of using 'adapt'. I have already looked through 'trainlm' and 'traingd', but I do not think they are helpful for either of my goals.
I will really appreciate it if anyone can help me with this issue.
Jason Lee

Accepted Answer

Greg Heath on 9 Nov 2012
First, let me clarify my train of thought. I was comparing training continuously using net.trainParam.epochs = 100 with training 10 consecutive times in a loop using net.trainParam.epochs = 10 (or, say, 100 consecutive times in a loop using net.trainParam.epochs = 1). To eliminate complications, do not train with a validation set. For example, train candidates using net.divideFcn = ''. Then use a holdout validation set to choose the best designs.
There is a way to obtain the same result (I am pretty sure that I did it years ago with the 2004 MATLAB 6.5 version of NEWFF). Given the same initial weights at epoch 0, the results will be the same at epoch 10. However, when the second example starts the 11th epoch, it has to call TRAIN again. When TRAIN starts again, it is not in the same state that it would have been in at the 11th epoch of the continuous training example.
The task then is to quantify the state of TRAIN at the close of epoch 10 and to guarantee that it is in that state after it is called at the beginning of epoch 11.
Extending this strategy you can interrupt training at any time and assign your specified weights. However, now I understand that you would like some of those weights to remain fixed throughout further training.
Currently, the only way to do that is to keep assigning that same fixed weight throughout training. Whether the assignments are made every epoch or every few epochs would have to be determined by trial and error.
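The reassignment strategy described above could look roughly like the sketch below. It is only an illustration under assumptions: the mask fixedMask, the stored copy IW0 and the loop length maxEpochs are hypothetical names and values, and the 1-4-1 fitnet on simplefit_dataset is borrowed from the experiments in this thread. Because every call to train restarts the trainer's internal state, the result is not expected to match continuous training.

```matlab
% Hold selected elements of IW{1,1} fixed by restoring them after every
% single-epoch call to train. fixedMask, IW0 and maxEpochs are illustrative.
[x, t] = simplefit_dataset;          % toolbox example data (1 input, 1 output)
net = fitnet(4);                     % 1-4-1 net
net.divideFcn = '';                  % no validation/test split
net.trainParam.showWindow = false;   % suppress the training GUI
net = configure(net, x, t);          % create weights so they can be read/assigned

IW0 = net.IW{1,1};                   % 4x1 matrix holding the values to keep
fixedMask = logical([1; 0; 1; 0]);   % hypothetical choice: elements 1 and 3 fixed

net.trainParam.epochs = 1;           % at most one epoch per call
maxEpochs = 100;
for k = 1:maxEpochs
    net = train(net, x, t);          % updates every weight and bias
    IW = net.IW{1,1};
    IW(fixedMask) = IW0(fixedMask);  % restore the fixed elements
    net.IW{1,1} = IW;
end
```

Restoring every epoch versus every few epochs is a one-line change to the loop, which makes the trial-and-error comparison straightforward.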
I have performed 40 experiments using MATLAB's simplefit_dataset. There were 10 random weight initializations of 1-4-1 nets for each of the following 4 scenarios:
1. NEWFIT (calls NEWFF) continuous training with the default net.trainParam.epochs = 1000
2. NEWFIT WHILE-LOOP training with net.trainParam.epochs = 1
3. FITNET (calls FEEDFORWARDNET) continuous.
4. FITNET WHILE-LOOP
The 4 MSE results for each of the 10 random weight initializations were in agreement. However, I have not yet compared final weights.
In order to further understand the problem I may obtain 1-3-1 designs to get a wider scatter of results.
Hope this helps.
Thank you for formally accepting my answer.
Greg
  3 Comments
jason on 9 Nov 2012
Thank you Greg.
"Why would you want to do what you propose?" Well, this is part of my master's thesis and it would take too long to explain all the details. In short, some of the weights are actually design parameters (I know this may look weird; imagine that in circuit design these weights would be the circuit elements to be tuned). In a design task, we may want to fix some of the parameters while tuning others to meet a specific requirement. That's why I want to do what I proposed.
"I guess another possibility is to write your own weight update code that uses different learning rates for different weights." Actually I have done this already, but it turns out to be very slow when the problem size is big. This is why I want to find a better way to do it. I think that if I can take more advantage of the built-in functions, my program is likely to be faster, since those built-in functions are written by expert programmers, while I am only a hobbyist programmer.
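For reference, a minimal sketch of the "different learning rates for different weights" idea is shown below. It is not the poster's code. It is a plain-MATLAB, vectorized batch gradient descent for a 1-H-1 tanh network, with made-up data and hypothetical names (lrIW, lr, IW0). A learning rate of 0 for an element of IW leaves that element unchanged.

```matlab
% Masked batch gradient descent for a 1-H-1 tanh network (no toolbox calls).
% All names and values here are illustrative.
rng(0);
x = linspace(-1, 1, 50);             % 1 x N inputs
t = sin(pi * x);                     % 1 x N targets
N = numel(x);
H = 4;                               % number of hidden nodes

IW = randn(H, 1);  b1 = randn(H, 1); % input weights and hidden biases
LW = randn(1, H);  b2 = randn;       % layer weights and output bias

lrIW = 0.05 * [1; 0; 1; 0];          % per-weight rates: 0 freezes elements 2 and 4
lr   = 0.05;                         % rate for all other parameters
IW0  = IW;                           % copy kept only to check the result

for epoch = 1:2000
    h = tanh(IW * x + b1 * ones(1, N));   % H x N hidden outputs
    y = LW * h + b2;                      % 1 x N network outputs
    e = y - t;                            % 1 x N errors

    % Gradients of half the mean squared error
    dLW = (e * h') / N;                   % 1 x H
    db2 = mean(e);
    dz  = (LW' * e) .* (1 - h.^2);        % H x N, backpropagated through tanh
    dIW = (dz * x') / N;                  % H x 1
    db1 = mean(dz, 2);                    % H x 1

    IW = IW - lrIW .* dIW;                % zero rate leaves an element untouched
    b1 = b1 - lr * db1;
    LW = LW - lr * dLW;
    b2 = b2 - lr * db2;
end
```

Keeping the whole batch in matrix form, rather than looping over samples or weights, is usually what matters most for speed in hand-written MATLAB training code.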
"Currently, the only way to do that is to keep assigning that same fixed weight throughout training." Actually, I found out that I can assign different learning rates for weights in different layers if I use the command "adapt" instead of "train". But it takes way longer than I expected; it took almost the same amount of time as the code I wrote from scratch. I have also tried the "assigning that same fixed weight every few epochs" idea, but the error is not good at all. I am going to try "make the assignment every single epoch" now. For the BP algorithm I think it will work, but I need to try it to see how efficient it can be. For other algorithms, like LM etc., to be honest I do not know how they work in detail, so I cannot do much analytical work beyond trial and error.
"In order to further understand the problem I may obtain 1-3-1 designs to get a wider scatter of results." My apology here, I said "2 layer network" at first but I think it is better to think of a 1-3-3-1 network. Because theoretically if the weights between 1-3 are fixed, it is still possible to train the network to get arbitrarily small error.
Greg
jason on 9 Nov 2012
I mistook myself for you, perhaps because I quoted too many of your words. lol


More Answers (7)

Jai on 7 Jul 2016
You can use net.biases{i}.learn = 0 and net.inputWeights{i,j}.learn = 0 to fix some of the weights.
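A minimal sketch of this suggestion, applied to the sub-optimal goal from the question (train IW{1,1}, keep LW{2,1} fixed), might look like the following. It assumes a toolbox release in which train honors the learn property. The property belongs to a whole weight matrix or bias vector, so it does not by itself freeze individual elements of IW{1,1}. The variable LW0 is only there to check the outcome.

```matlab
% Freeze the layer-1-to-layer-2 weights and train everything else.
[x, t] = simplefit_dataset;
net = feedforwardnet(4);
net = configure(net, x, t);            % sizes the net and creates the weights
net.divideFcn = '';                    % train on all data, no validation stop
net.trainParam.showWindow = false;

net.layerWeights{2,1}.learn = false;   % applies to all of LW{2,1}
LW0 = net.LW{2,1};                     % copy for comparison after training

net = train(net, x, t);
unchanged = isequal(net.LW{2,1}, LW0); % expected true if learn is honored
```

Because train is still used, options such as net.trainParam.min_grad and the performance plots remain available, unlike with the adapt-based workaround.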

Greg Heath on 1 Nov 2012
You can directly assign any combination of weights that you want after calling the obsolete functions newpr, newfit or newff. However, if you use the updated functions patternnet, fitnet or feedforwardnet, you have to first call configure, init or train.
net.IW{:,:} = IW;
net.LW{:,:} = LW;
net.b{:,:} = b;
Hope this helps.
Thank you for formally accepting my answer.
Greg

jason on 6 Nov 2012
Hi Greg,
Thank you very much for your answer.
I use ANN for function fitting rather than pattern recognition, and I prefer 'feedforwardnet' and 'train' because I know how to use these commands at least in the common ways.
Unfortunately, from your answer I still cannot see how to update a set of the weights while fixing others during training. Let's say, if I use 'feedforwardnet' and 'train', then configure would be unnecessary. Besides, even if I do configure and assign net.IW{:,:} net.LW{:,:} and net.b{:,:} first, then 'train' will update all the weights and biases which is not what I want. It is also possible that 'feedforwardnet' and 'train' are not suitable for this task, but I don't know what other commands I can use now.
So could you please make it clearer? Show me how to use 'feedforwardnet' and 'train' for my task, or other commands. Just one way to do it would be sufficient.
Best,
Jason
  1 Comment
Greg Heath on 7 Nov 2012
PATTERNNET was explicitly designed for classification and pattern recognition.
FITNET was explicitly designed for regression and curvefitting.
BOTH call FEEDFORWARDNET.
If you compare source codes via
type fitnet
type feedforwardnet
you will see that the only difference is that fitnet automatically uses PLOTFIT whereas feedforwardnet does not.
So, if you want to use feedforwardnet, you have to explicitly call plotfit afterward as demonstrated in
help plotfit
Greg



Greg Heath on 8 Nov 2012
I guess I do not understand exactly what you want to do. My original point was that if you use one of the obsolete functions, you can change the value of any combination of weights before or during training and then continue training.
However, if you use one of the current functions,
1. You have to use configure or init if you want to use a specific subset of initial weights. Direct assignment is not allowed before configure, init or train is called.
2. If you want to interrupt training, specify a specific subset of weights and then continue training, training will not continue smoothly from where you interrupted. Instead, your training parameters will be automatically reinitialized.
To make it clearer, suppose you wanted to interrupt training, then do nothing before continuing to train. You will end up with a different result than if you trained continuously. In particular
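net.trainParam.epochs = 10;          % maximum number of epochs per call to train
rng(0)
for i = 1:10
    [net, tr] = train(net, x, t);    % each call restarts the state of TRAIN
end
will have a different result than
net.trainParam.epochs = 100;         % one continuous run of up to 100 epochs
rng(0)
[net, tr] = train(net, x, t);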
If you can figure out how to obtain the same results, it is worth starting a new thread to share the discovery.
Greg
  1 Comment
jason on 9 Nov 2012
I am sorry I made it confusing to you.
Please allow me to clarify my problem here by using your example. I want to train a network with inputs x and targets t, but I want to keep some of the network weights(imagine the input weights) unchanged while updating others(imagine weights between layers). Note that the ultimate trained network should be able to represent the input-output relationship, which is the mapping from x to t here.
As for your example, I think the result will be different because 'net.trainParam.epochs' is not the exact number of epochs the program will run; instead, it is the maximum number of epochs, used to control the termination of the training. That is, if you set net.trainParam.epochs = 10, the program will run at most 10 epochs of training, but if the error is small enough at, say, epoch 6, then the training will be terminated at epoch 6 instead of 10. Therefore, in your example, chances are that the two cases have run a different number of epochs and thus have different results.
But I cannot see how this is related to my problem.
If I just write [net tr ] = train( net, x, t); then all the weights of my network will be updated, which is obviously not what I want. I guess your point is that, after training ([net tr ] = train( net, x, t);), I can simply change back the weights which I want to fix to their original values (e.g. if I want to fix the input weights, then after training I should write net.IW{1,1} = original_value; where original_value has stored the original values before training). However I do not think this strategy will work. Because afterward, given x as inputs, the net's output might be very far from t.
Hopefully I have made it clear this time; if not, please feel free to let me know. I look forward to your response and I really appreciate your help.
Thank you
Jason



Greg Heath on 10 Nov 2012
I don't know how many times you want to change the first layer weights during training. However, if you have 2 hidden layers and want to fix the first layer of weights, you can switch between that net and a double net configuration:
1. Use [x;t] to train net1 I-H1-H2-O
h1 = ...
h2 = ...
y = ...
2. Initialize net2 I-H1 and net3 H1-H2-O with weights from net1
3. Use net2 to create the new input matrix h1 = tansig(b1+IW*x)
4. Use [h1;t] to train net3. Since it has a hidden layer, it is a universal approximator.
5. Use the weights from net3 to initialize the last 2 layers of net1
6. etc
The fact that retraining net1 and/or net3 reinitializes the state of TRAIN is not a problem.
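The steps above can be sketched as follows. This is a rough illustration under several assumptions that are not in the original answer: a 1-3-3-1 feedforwardnet on simplefit_dataset, input processing removed so that h1 can be computed directly from the raw x, and the default tansig hidden layers. The elided expressions in step 1 are not filled in here.

```matlab
% Step 1: train the full net I-H1-H2-O.
[x, t] = simplefit_dataset;
net1 = feedforwardnet([3 3]);
net1.inputs{1}.processFcns = {};     % assumption: no input scaling, so h1 uses raw x
net1.divideFcn = '';
net1.trainParam.showWindow = false;
net1 = train(net1, x, t);

% Step 3: outputs of the (now fixed) first hidden layer.
N  = size(x, 2);
h1 = tansig(net1.IW{1,1} * x + net1.b{1} * ones(1, N));

% Steps 2 and 4: net3 is H1-H2-O, started from net1's last two layers.
net3 = feedforwardnet(3);
net3.inputs{1}.processFcns = {};
net3.divideFcn = '';
net3.trainParam.showWindow = false;
net3 = configure(net3, h1, t);
net3.IW{1,1} = net1.LW{2,1};  net3.b{1} = net1.b{2};
net3.LW{2,1} = net1.LW{3,2};  net3.b{2} = net1.b{3};
net3 = train(net3, h1, t);           % first-layer weights of net1 are not involved

% Step 5: copy the retrained layers back into net1.
net1.LW{2,1} = net3.IW{1,1};  net1.b{2} = net3.b{1};
net1.LW{3,2} = net3.LW{2,1};  net1.b{3} = net3.b{2};
```

Both nets are configured with the same targets t, so their default output processing settings should agree, which is what makes copying the weights back meaningful in this sketch.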
If your data set is not large, your toughest problem may be choosing a suitable pair of values for the number of hidden nodes H1 and H2 to prevent overtraining an overfit net ( Number of training equations is not sufficiently larger than the number of unknown weights).
Hope this helps.
Thank you for formally accepting my answer.
Greg
  5 Comments
Greg Heath on 13 Nov 2012
>Thank you Greg. This is a good trick. But this cannot be used in the 2 cases below.
Easily Modified:
>1. If I do not want to fix the first layer weights completely, but only some of them. That is, I also want to fix some elements of IW{1,1} while changing other elements of IW{1,1} and elements of LW.
This can be accomplished by changing the weights of the first net and generating a new h1.
>2. If I want to do the opposite, that is, fix the layer weights LW while changing the input weights IW{1,1}.
In this case you can use pseudo-inversion to obtain h2 from b2+LW2*h2 = t



Greg Heath on 13 Nov 2012
I have performed 40 experiments using MATLAB's simplefit_dataset. There were 10 random weight initializations of 1-4-1 nets for each of the following 4 scenarios:
1. NEWFIT (calls NEWFF) continuous training with the default net.trainParam.epochs = 1000
2. NEWFIT WHILE-LOOP training with net.trainParam.epochs = 1
3. FITNET (calls FEEDFORWARDNET) continuous.
4. FITNET WHILE-LOOP
The 4 MSE results for each of the 10 random weight initializations were in agreement.
This is because the first 9 initializations achieved the training goal of R2trna >= 0.99, where R2trna is the adjusted coefficient of determination (AKA degree-of-freedom adjusted R^2 ... see Wikipedia). The last initialization terminated before reaching the goal because the specified minimum gradient of MSE (1e-10) was reached.
However, when the weights of the continuous and interrupted training designs are compared, only 50% of the designs achieved the same weights.
I do not intend to pursue the reason why the other 50% did not beyond looking at the 1-3-1 case where many of the designs did not achieve the training goal.

Greg Heath on 13 Nov 2012
These are the results using FITNET for the 1-3-1 design.
1. All 20 cases terminated via tr.stop = 'Minimum gradient reached.' before achieving the goal of R2trna >= 0.99.
2. Continuous training took 6.8 sec, interrupted training took 32.0 sec
3. The differences in tr.mu were either 1e-3 or 0.9999e-3.
4. The differences in R2trn and R2trna were less than 1e-8.
5. The differences in R2val and R2tst were less than 1e-5.
6. The differences in number of epochs were
dNepochs = -5 0 -5 -4 0 -3 -3 0 -6 -5
7. Nevertheless, in both cases runs 2-5 and 7-9 obtained the EXACT same set of weights. In run 1 there was a sign change in IW(2), b1(2) and LW(2) which caused no change in output because the hidden node activation has odd parity. Adjusting for these 3 sign changes (*), the differences between the continuous and interrupted training weight estimates were
dWB(:, [1 2 6 10]) =
     1        [2-5,7-9]      6          10
=============================================
  0.0017     -0.0000       0         -0.0007
 *0.0001     -0.0000       0.0001    -0.0022
 -0.0074      0.0414       1.0558    -0.0015
  0.0014      0.0000       0          0.0003
 *0.0000      0.0000      -0.0000    -0.0011
 -0.0071      0.0387       0.7752    -0.0013
  0.0002      0.0000       0.2949    -0.0000
 *0.0000     -0.0000       0.0000    -0.0000
  0.0000     -0.0002      -0.5898     0.0001
 -0.0002     -0.0002      -0.2949     0.0001
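
The sign symmetry mentioned in item 7 can be checked numerically with a few lines of plain MATLAB. The sketch below uses random illustrative weights for a 1-3-1 net with tanh hidden nodes (mathematically the same function as the toolbox default tansig). Flipping the sign of IW(2), b1(2) and LW(2) together leaves the output unchanged because tanh(-z) = -tanh(z).

```matlab
% Sign-flip symmetry of a 1-3-1 tanh network (illustrative random weights).
rng(0);
x  = linspace(-1, 1, 11);            % 1 x N inputs
N  = numel(x);
IW = randn(3, 1);  b1 = randn(3, 1); % input weights and hidden biases
LW = randn(1, 3);  b2 = randn;       % layer weights and output bias

y1 = LW * tanh(IW * x + b1 * ones(1, N)) + b2;

% Flip the sign of everything attached to hidden node 2
IW(2) = -IW(2);  b1(2) = -b1(2);  LW(2) = -LW(2);
y2 = LW * tanh(IW * x + b1 * ones(1, N)) + b2;

maxDiff = max(abs(y1 - y2));         % zero up to floating-point rounding
```

This is why two weight sets that differ only by such sign flips should be treated as the same design when comparing continuous and interrupted training.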
