How to change a subset of ANN weights while keeping the other weights unchanged?

Question

0 votes

Hello folks,

I am using the Neural Network Toolbox (R2012a) in my project. I have created a feedforward net with 2 layers (inputs are not counted as a layer, following the convention in the user's guide), and I want to update some of the input weights (IW{1,1}) while keeping the other input weights in IW{1,1} and the first-to-second-layer weights (LW{2,1}) fixed. In short, I want to change a subset of IW{1,1} while all the other weights remain fixed. Let me refer to this as my optimal goal.

If the optimal goal is impossible, a sub-optimal goal is also acceptable: update the entire IW{1,1} and keep the whole LW{2,1} fixed.

I already figured out how to achieve the sub-optimal goal. My solution is to use the command 'adapt' and set the learning rate to 0 for LW{2,1}. But I do not like this solution, since 'adapt' is an over-simplified function lacking the parameters and features (e.g. min_grad, plotperform, etc.) of other training functions/algorithms (e.g. trainlm, traingd, etc.). Therefore it is harder to control the training process and check the results.

So, first, I want to know if there is a way to achieve the optimal goal rather than the sub-optimal.

Second, if the optimal goal is not possible (short of writing everything from scratch), I wonder if I can achieve the sub-optimal goal by taking advantage of some training functions instead of using 'adapt'. I have already looked through 'trainlm' and 'traingd', but I do not think they help with either of my goals.

I will really appreciate it if anyone can help me with this issue.

Jason Lee

Answer 1

Greg Heath on 9 Nov 2012

0 votes

First, let me clarify my train of thought. I was comparing training continuously using net.trainParam.epochs = 100 with training 10 consecutive times in a loop using net.trainParam.epochs = 10 ( or, say, 100 consecutive times in a loop using net.trainParam.epochs = 1). To eliminate complications, do not train with a validation set. For example, train candidates using net.divideFcn = ''. Then use a holdout validation set to choose the best designs.

There is a way to obtain the same result ( I am pretty sure that I did it yrs ago with the 2004 MATLAB 6.5 version of NEWFF). Given the same initial weights at epoch 0, the results will be the same at epoch 10. However, when the second example starts the 11th epoch, it has to call TRAIN again. When TRAIN starts again, it is not in the same state that it would have been in the 11th epoch of the continuous training example.

The task then is to quantify the state of TRAIN at the close of epoch 10 and to guarantee that it is in that state after it is called at the beginning of epoch 11.

Extending this strategy you can interrupt training at any time and assign your specified weights. However, now I understand that you would like some of those weights to remain fixed throughout further training.

Currently, the only way to do that is to keep assigning that same fixed weight throughout training. Whether the assignments are made every epoch or every few epochs would have to be determined by trial and error.
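A minimal sketch of the re-assignment idea, assuming the Neural Network Toolbox is installed. The mask frozen, the stored copy IW0, the network size and the number of passes are illustrative choices, not part of the original answer. It uses traingd with one epoch per call; with other training functions each call to train restarts the internal training state, as discussed in this thread.

```matlab
% Requires the Neural Network Toolbox (an assumption of this sketch).
[x, t] = simplefit_dataset;            % sample data shipped with the toolbox
net = feedforwardnet(3);
net = configure(net, x, t);            % size and initialize the weights
net.trainFcn = 'traingd';              % plain gradient descent
net.divideFcn = '';                    % no validation/test split
net.trainParam.epochs = 1;             % at most one epoch per call to train
net.trainParam.showWindow = false;     % suppress the training GUI

frozen = logical([1; 0; 1]);           % hypothetical mask: entries of IW{1,1} to hold fixed
IW0 = net.IW{1,1};                     % values to restore after every pass

for k = 1:50
    net = train(net, x, t);
    IW = net.IW{1,1};
    IW(frozen) = IW0(frozen);          % undo the update on the frozen entries
    net.IW{1,1} = IW;
end
```

After the loop, the frozen entries of net.IW{1,1} equal their initial values, while the remaining weights and biases have been trained.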

I have performed 40 experiments using MATLAB's simplefit_dataset. There were 10 random weight initializations of 1-4-1 nets for each of the following 4 scenarios:

1. NEWFIT (calls NEWFF) continuous training with the default net.trainParam.epochs = 1000

2. NEWFIT WHILE-LOOP training with net.trainParam.epochs = 1

3. FITNET (calls FEEDFORWARDNET) continuous.

4. FITNET WHILE-LOOP

The 4 MSE results for each of the 10 random weight initializations were in agreement. However, I have not yet compared final weights.

In order to further understand the problem I may obtain 1-3-1 designs to get a wider scatter of results.

Hope this helps.

Thank you for formally accepting my answer.

Greg

3 Comments

jason on 9 Nov 2012

Thank you Greg.

"Why would you want to do what you propose?" Well, this is part of my master's thesis and it would take too long to explain the details. In short, some of the weights are actually design parameters (I know this may look weird; imagine that in circuit design these weights would be the circuit elements to be tuned). In a design task, we may want to fix some of the parameters while tuning others to meet a specific requirement. That's why I want to do what I proposed.

"I guess another possibility is to write your own weight update code that uses different learning rates for different weights." Actually I have done this already, but it turns out to be very slow when the problem size is big. This is why I want to find a better way to do it. I think if I can take more advantage of the built-in functions, then my program is more likely to be faster, since those built-in functions are written by expert programmers, while I am only a hobbyist programmer.

"Currently, the only way to do that is to keep assigning that same fixed weight throughout training." Actually, I found out that I can assign different learning rates for weights in different layers if I use the command "adapt" instead of "train". But it takes way longer than I expected; it took almost the same amount of time as the code I wrote from scratch. I have also tried the idea of assigning the same fixed weights every few epochs during training, but the error is not good at all. I am going to try making the assignment every single epoch now. For the BP algorithm I think it will work, but I need to try it to see how efficient it can be. For other algorithms, like LM etc., to be honest I do not know how they work in detail, so I cannot do much analytical work beyond trial and error.

"In order to further understand the problem I may obtain 1-3-1 designs to get a wider scatter of results." My apologies here: I said "2 layer network" at first, but I think it is better to think of a 1-3-3-1 network, because theoretically, if the weights between 1-3 are fixed, it is still possible to train the network to get arbitrarily small error.

Greg
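The hand-written update code with different learning rates for different weights, mentioned in the comment above, can be sketched with vectorized matrix operations and no toolbox. This is only an illustration under stated assumptions: a toy 1-3-1 tanh network, plain batch gradient descent on the MSE, invented data and starting weights, and hypothetical multiplier arrays rateIW and rateLW (1 = trainable, 0 = frozen). For plain gradient descent, a zero multiplier has the same effect as re-assigning the frozen value after every epoch.

```matlab
% Toy 1-3-1 network: y = LW*tanh(IW*x + b1) + b2 (all values illustrative)
x = linspace(-1, 1, 21);  t = sin(pi * x);  N = numel(x);
IW = [0.5; -1.0; 1.5];    b1 = [0.1; 0.0; -0.1];
LW = [0.3, -0.2, 0.1];    b2 = 0;

rateIW = [1; 0; 1];       % hypothetical per-weight multipliers: IW(2) is frozen
rateLW = [0, 0, 0];       % all of LW is frozen
lr = 0.05;                % base learning rate

IW0 = IW;  LW0 = LW;
mseFcn = @(IW, b1, LW, b2) mean((LW * tanh(IW * x + b1 * ones(1, N)) + b2 - t).^2);
mse0 = mseFcn(IW, b1, LW, b2);

for epoch = 1:200
    % forward pass
    h = tanh(IW * x + b1 * ones(1, N));   % 3-by-N hidden outputs
    e = LW * h + b2 - t;                  % 1-by-N errors
    % backward pass for MSE = mean(e.^2)
    dy  = 2 * e / N;
    gLW = dy * h';
    gb2 = sum(dy);
    dh  = (LW' * dy) .* (1 - h.^2);
    gIW = dh * x';
    gb1 = sum(dh, 2);
    % update: a zero multiplier leaves that weight exactly unchanged
    IW = IW - lr * (rateIW .* gIW);
    LW = LW - lr * (rateLW .* gLW);
    b1 = b1 - lr * gb1;
    b2 = b2 - lr * gb2;
end
mseEnd = mseFcn(IW, b1, LW, b2);
```

Comparing mse0 with mseEnd shows how much error reduction is still achievable with the frozen weights held in place.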

jason on 9 Nov 2012

I mistook myself as you perhaps because I quoted too many of your words.lol

Answer 2

Jai on 7 Jul 2016

3 votes

You can use net.biases{i}.learn = 0 and net.inputWeights{i,j}.learn = 0 to fix some of the weights.
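A minimal sketch of this approach, assuming the Neural Network Toolbox is installed. net.layerWeights{i,j}.learn is the corresponding property for layer weights. Each learn flag applies to a whole weight matrix or bias vector, so this addresses the sub-optimal goal (freezing all of LW{2,1}) rather than freezing individual elements of IW{1,1}. The dataset, network size and the variable LW0 are illustrative choices, and the behavior is worth verifying on your own release.

```matlab
% Requires the Neural Network Toolbox (an assumption of this sketch).
[x, t] = simplefit_dataset;            % sample data shipped with the toolbox
net = feedforwardnet(3);
net = configure(net, x, t);            % size and initialize the weights
net.divideFcn = '';                    % train on all data, as elsewhere in this thread
net.trainParam.showWindow = false;     % suppress the training GUI

net.layerWeights{2,1}.learn = false;   % freeze all of LW{2,1}
net.biases{2}.learn = false;           % freeze b{2} as well

LW0 = net.LW{2,1};                     % copy for comparison after training
net = train(net, x, t);
unchanged = isequal(net.LW{2,1}, LW0); % true if LW{2,1} was left untouched
```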

Answer 3

Greg Heath on 1 Nov 2012

0 votes

You can directly assign any combination of weights that you want after calling the obsolete functions newpr, newfit or newff. However, if you use the updated functions patternnet, fitnet or feedforwardnet, you have to first call configure, init or train.

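net.IW{1,1} = IW;   % input weights (2-layer net as in the question)

net.LW{2,1} = LW;   % layer 1 to layer 2 weights

net.b{1} = b1;      % layer 1 biases

net.b{2} = b2;      % layer 2 biases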

Hope this helps.

Thank you for formally accepting my answer.

Greg

Answer 4

jason on 6 Nov 2012

0 votes

Hi Greg,

Thank you very much for your answer.

I use ANN for function fitting rather than pattern recognition, and I prefer 'feedforwardnet' and 'train' because I know how to use these commands at least in the common ways.

Unfortunately, from your answer I still cannot see how to update a set of the weights while fixing others during training. Let's say, if I use 'feedforwardnet' and 'train', then configure would be unnecessary. Besides, even if I do configure and assign net.IW{:,:} net.LW{:,:} and net.b{:,:} first, then 'train' will update all the weights and biases which is not what I want. It is also possible that 'feedforwardnet' and 'train' are not suitable for this task, but I don't know what other commands I can use now.

So could you please make it clearer? Show me how to use 'feedforwardnet' and 'train' for my task, or other commands. Just one way to do it would be sufficient.

Best,

Jason

1 Comment

Greg Heath on 7 Nov 2012

PATTERNNET was explicitly designed for classification and pattern recognition.

FITNET was explicitly designed for regression and curvefitting.

BOTH call FEEDFORWARDNET.

If you compare source codes via

type fitnet

type feedforwardnet

you will see that the only difference is that fitnet automatically uses PLOTFIT whereas feedforwardnet does not.

So, if you want to use feedforwardnet, you have to explicitly call plotfit afterward as demonstrated in

help plotfit

Greg

Answer 5

Greg Heath on 8 Nov 2012

0 votes

I guess I do not understand exactly what you want to do. My original point was that if you use one of the obsolete functions, you can change the value of any combination of weights before or during training and then continue training.

However, if you use one of the current functions,

1. You have to use configure or init if you want to use a specific subset of initial weights. Direct assignment is not allowed before configure, init or train is called.

2. If you want to interrupt training, specify a specific subset of weights and then continue training, training will not continue smoothly from where you interrupted. Instead, your training parameters will be automatically reinitialized.

To make it clearer, suppose you wanted to interrupt training, then do nothing before continuing to train. You will end up with a different result than if you trained continuously. In particular

net.trainParam.epochs = 10;

rng(0)

for i = 1:10

[net tr ] = train(net,x,t);

end

will have a different result than

net.trainParam.epochs = 100;

rng(0)

[net tr ] = train( net, x, t);

If you can figure out how to obtain the same results, it is worth starting a new thread to share the discovery.

Greg

1 Comment

jason on 9 Nov 2012

I am sorry I made it confusing to you.

Please allow me to clarify my problem here by using your example. I want to train a network with inputs x and targets t, but I want to keep some of the network weights(imagine the input weights) unchanged while updating others(imagine weights between layers). Note that the ultimate trained network should be able to represent the input-output relationship, which is the mapping from x to t here.

As for your example, I think the result will be different because 'net.trainParam.epochs' is not the exact number of epochs the program will run; it is the maximum number of epochs, used to control the termination of training. That is, if you set net.trainParam.epochs = 10, the program will run at most 10 epochs of training, but if the error is small enough at, say, epoch 6, then training will be terminated at epoch 6 instead of 10. Therefore, in your example, chances are that the two cases ran a different number of epochs and thus have different results.

But I cannot see how this is related to my problem.

If I just write [net tr ] = train( net, x, t); then all the weights of my network will be updated, which is obviously not what I want. I guess your point is that, after training ([net tr ] = train( net, x, t);), I can simply change back the weights which I want to fix to their original values (e.g. if I want to fix the input weights, then after training I should write net.IW{1,1} = original_value; where original_value has stored the original values before training). However I do not think this strategy will work. Because afterward, given x as inputs, the net's output might be very far from t.

Hopefully I have made it clear this time; if not, please feel free to let me know. I look forward to your response and I really appreciate your help.

Thank you

Jason

Answer 6

Greg Heath on 10 Nov 2012

0 votes

I don't know how many times you want to change the first layer weights during training. However, if you have 2 hidden layers and want to fix the first layer of weights, you can switch between that net and a double net configuration:

1. Use [x;t] to train net1 I-H1-H2-O

h1 = ...

h2 = ...

y = ...

2. Initialize net2 I-H1 and net3 H1-H2-O with weights from net1

3. Use net2 to create the new input matrix h1 = tansig(b1+IW*x)

4. Use [h1;t] to train net3. Since it has a hidden layer, it is a universal approximator.

5. Use the weights from net3 to initialize the last 2 layers of net1

6. etc

The fact that retraining net1 and/or net3 reinitializes the state of TRAIN is not a problem.
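Step 3 above can be sketched without the toolbox as follows. All weight values and sizes are invented for illustration, and tansigFcn and layer are hypothetical helper names. The sketch only checks that feeding h1 from the fixed first layer (net2) into the remaining layers (net3) reproduces the end-to-end output of net1, which is what makes the split valid. With feedforwardnet, default input/output processing (e.g. mapminmax) is applied as well, so there x and t would be the processed values; that is not modeled here.

```matlab
x   = linspace(-1, 1, 5);                          % 1-by-N inputs (illustrative)
IW  = [0.5; -1.0; 1.5];     b1 = [0.1; 0; -0.1];   % fixed I-H1 layer (net2)
LW2 = [0.2 -0.4 0.1; 0.3 0.1 -0.2];  b2 = [0; 0.1]; % H1-H2 layer (net3)
LW3 = [0.7 -0.5];           b3 = 0.05;             % H2-O layer (net3)

tansigFcn = @(n) 2 ./ (1 + exp(-2 * n)) - 1;       % tansig formula, equal to tanh(n)
layer = @(W, b, a) W * a + b * ones(1, size(a, 2)); % weighted sum plus bias per column

% net2: the fixed first layer turns x into the new input matrix h1
h1 = tansigFcn(layer(IW, b1, x));

% net3: the remaining layers, fed with h1 instead of x
h2 = tansigFcn(layer(LW2, b2, h1));
y3 = layer(LW3, b3, h2);

% net1 evaluated end to end, for comparison
yFull = layer(LW3, b3, tansigFcn(layer(LW2, b2, tansigFcn(layer(IW, b1, x)))));
```

Training net3 on [h1;t] therefore never touches IW or b1, because they only enter through the precomputed h1.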

If your data set is not large, your toughest problem may be choosing a suitable pair of values for the number of hidden nodes H1 and H2 to prevent overtraining an overfit net ( Number of training equations is not sufficiently larger than the number of unknown weights).

Hope this helps.

Thank you for formally accepting my answer.

Greg

5 Comments

jason on 11 Nov 2012

I think I know how to get the same result now, at least partially.

... % Set up and define x t as inputs and targets

net=feedforwardnet(3); % You can try more hidden neurons. I also have tried 5 and 8 neurons.

nnet = net; % Define another network for comparison

net.trainFcn='traingd'; % I'm pretty familiar with gradient descent algorithm, but not familiar with LM algorithm. I think the idea is the same

net.divideFcn = ''; % This is very important. We do not want the program to divide the data randomly every time when going through the "for" loop. Just train with all the data without validation here.

net.trainParam.epochs = 1; % You may want to set it bigger, however that will be problematic. Because this parameter is not how many epochs it will train, but the max epochs(higher bound) it's gonna train. Suppose you set it to 10, however if at epoch 6 the gradient magnitude is smaller than 10e-5, then it will stop and the network will have been trained only 6 epochs. So if you set it bigger, then the network may be trained by different number of epochs every time going through the "for" loop. The exact number of epochs will depend on the problem specific (the specifics of x and t), and weights initial condition and network size. So it will be more difficult to track.

rng(5) % You can try other seeds. I have tried seeds 0-6, 15-18, the results are identical all the time.

for k =1:2 % Again you may want to have more iterations here. But that may cause problem when training nnet later, because you will have to set "nnet.trainParam.epochs" as big. Luckily I got the same result for "k=1:5" and "nnet.trainParam.epochs=5" several times, but there is no way it works for all the cases, because nnet's training may be terminated before reaching epoch 5.

net = train(net,x,t);

end

nnet.trainFcn='traingd'; % same setup

nnet.divideFcn = '';

nnet.trainParam.epochs = 2;

rng(5)

nnet = train(nnet,x,t);

net.IW{1,1}==nnet.IW{1,1} % output 1's if the weights are the same

net.LW{2,1}==nnet.LW{2,1}

It is of course possible to get the same result with those factors included. For example, you can divide the data randomly but with the same seed every time, but then you also need to deal with validation, because "Maximum Number of Validation Increases" will also affect the result, and thus you have to set "trainParam.max_fail" properly. Also, if the "Maximum Number of Validation Increases" is 6, the program will return the network from 6 epochs earlier rather than the current network, which can also give you a headache. Another option is to set "net.trainParam.epochs" higher and record "tr.num_epochs" on every pass through the "for" loop. Then sum up all the recorded epochs, which gives the total number of epochs "net" has been trained, and set "nnet.trainParam.epochs" to that total. I did not try this, but I think chances are you will need to make some adjustments as well.

Greg Heath on 13 Nov 2012

>Thank you Greg. This is a good trick. But this cannot be used in the 2 cases below.

Easily Modified:

>1. If I do not want to fix the first layer weights completely, but only some of them. That is, I also want to fix some elements of IW{1,1} while changing other elements of IW{1,1} and elements of LW.

This can be accomplished by changing the weights of the first net and generating a new h1.

>2. If I want to do the opposite, that is, fix the layer weights LW while changing the input weights IW{1,1}.

In this case you can use pseudo-inversion to obtain h2 from b2+LW2*h2 = t
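The pseudo-inversion step can be sketched as follows, keeping the symbols b2, LW2, h2 and t from the comment above. The numeric values are invented for illustration. Because LW2 has fewer rows than columns, the system is underdetermined and pinv returns the minimum-norm h2. Nothing guarantees that those values lie inside (-1, 1), the output range of tansig, so the recovered h2 may not be reachable as targets for a tansig hidden layer; the sketch flags that case.

```matlab
LW2 = [0.7 -0.5 0.2];                 % fixed 1-by-H layer weights (illustrative)
b2  = 0.05;                           % fixed output bias (illustrative)
t   = sin(pi * linspace(-1, 1, 9));   % 1-by-N targets

% Solve b2 + LW2*h2 = t for the hidden outputs h2 (minimum-norm solution)
h2 = pinv(LW2) * (t - b2);            % H-by-N

res = max(abs(LW2 * h2 + b2 - t));    % reconstruction error, should be near zero
outOfRange = any(abs(h2(:)) >= 1);    % true if some h2 value cannot be a tansig output
```

The resulting h2 would then serve as targets when training the earlier layers with LW2 and b2 held fixed.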

jason on 14 Nov 2012

Thank you. I like your pseudo-inverse idea.

Answer 7

Greg Heath on 13 Nov 2012

0 votes

I have performed 40 experiments using MATLAB's simplefit_dataset. There were 10 random weight initializations of 1-4-1 nets for each of the following 4 scenarios:

1. NEWFIT (calls NEWFF) continuous training with the default net.trainParam.epochs = 1000

2. NEWFIT WHILE-LOOP training with net.trainParam.epochs = 1

3. FITNET (calls FEEDFORWARDNET) continuous.

4. FITNET WHILE-LOOP

The 4 MSE results for each of the 10 random weight initializations were in agreement.

This is because the first 9 initializations achieved the training goal of R2trna >= 0.99, where R2trna is the adjusted coefficient of determination (AKA degree-of-freedom adjusted R^2 ... see Wikipedia). The last initialization terminated before reaching the goal because the specified minimum gradient of MSE (1e-10) was reached.

However, when the weights of the continuous and interrupted training designs are compared, only 50% of the designs achieved the same weights.

I do not intend to pursue the reason why the other 50% did not beyond looking at the 1-3-1 case where many of the designs did not achieve the training goal.

Answer 8

Greg Heath on 13 Nov 2012

0 votes

These are the results using FITNET for the 1-3-1 design.

1. All 20 cases terminated via tr.stop = 'Minimum gradient reached.' before achieving the goal of R2trna >= 0.99.

2. Continuous training took 6.8 sec, interrupted training took 32.0 sec

3. The differences in tr.mu were either 1e-3 or 0.9999e-3.

4. The differences in R2trn and R2trna were less than 1e-8.

5. The differences in R2val and R2tst were less than 1e-5.

6. The differences in number of epochs were

dNepochs = -5 0 -5 -4 0 -3 -3 0 -6 -5

7. Nevertheless, in both cases runs 2-5 and 7-9 obtained the EXACT same set of weights. In run 1 there was a sign change in IW(2), b1(2) and LW(2), which caused no change in output because the hidden node activation has odd parity. Adjusting for these 3 sign changes (*), the differences between the continuous and interrupted training weight estimates were

dWB(: , [1 2 6 10 ] ) =

         1   [2-5,7-9]       6        10
   =======================================   
    0.0017   -0.0000         0   -0.0007
   *0.0001   -0.0000    0.0001   -0.0022
   -0.0074    0.0414    1.0558   -0.0015
    0.0014    0.0000         0    0.0003
   *0.0000    0.0000   -0.0000   -0.0011
   -0.0071    0.0387    0.7752   -0.0013
    0.0002    0.0000    0.2949   -0.0000
   *0.0000   -0.0000    0.0000   -0.0000
    0.0000   -0.0002   -0.5898    0.0001
   -0.0002   -0.0002   -0.2949    0.0001
