Optimization of matrices with random initialization

Hello, everyone
I want to optimize a function where the pi are the optimization variables, matrices of different sizes (each pi_bar has the same size as the corresponding pi).
I reshape all the pi into a single column vector so that I can use fminunc to solve the problem.
The problem is unconstrained, but yi is updated using the pi. The algorithm also involves some random initialization (using randn) at the beginning for some variables.
pi_bar and yi_bar are already known.
Case 1: I run the optimization and it returns different values each time, which is understandable since there is random initialization in the algorithm.
Case 2: I fix the random initialization, using rng for example. The solver then returns the error "maximum number of function evaluations has been exceeded", even if I set the limit very high (500000).
It seems the algorithm only finds a better point when there happens to be a better random initialization. Is there a better way to cope with random initialization in an optimization problem? And what could be the reason for Case 2?
Thanks a lot for any suggestion in advance!

 Accepted Answer

Make sure your objective function code does not contain any randomization steps. Your initial guess can be random, but the objective function itself needs to be deterministic. Aside from that, nothing can be diagnosed without seeing your code.
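As a sketch of what that separation might look like (the names myObjective, xp0, and the sizes here are illustrative, not from the question):

```matlab
% Do all randomization ONCE, before invoking the solver
rng(0);                          % fix the seed for reproducibility
xp0 = 0.1*randn(1,20);           % random but fixed initial state guess
x0  = randn(724,1);              % random but fixed initial decision vector

% The objective must be deterministic: same x in, same value out
fun = @(x) myObjective(x, xp0);  % xp0 is captured once, never re-drawn inside
x = fminunc(fun, x0, optimoptions('fminunc','Display','iter'));
```

With this arrangement, re-running the script reproduces the same result, and fminunc never sees a "moving target" objective.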

8 Comments

I used rng to initialize xp, thus it is random but deterministic
That's really not a good place for it, and definitely explains the first issue (case 1) that you mentioned in your post. All randomization should take place outside your objective function and prior to invoking the optimization.
LSTM_layer and FC_layer are just two functions that update y and xp
The meat of your objective is hidden from us, so there's not much we can say. You also haven't shown how you invoke the solver and with what initial guesses for x1...x8.
Basically, though, you have an objective function with arbitrarily chosen fixed parameters xp which could make the optimization very ill-posed. If your initial guess is similarly arbitrary, you could also be very far away from the solution. Both are reasons to think convergence could take a very long time and lots of iterations.
Thanks for the answer, I will move the initialization of xp to before the optimization process is invoked
This is the LSTM_layer
function [y,xp] = LSTM_layer(u,x0,p_Wfio,p_Ufio,p_bfio,p_Wc,p_Uc,p_bc)
% One LSTM step: x0 = [h0, xi0] holds the previous cell and hidden
% states; u is the input. Returns output y and updated state xp.
h0  = x0(:,1:10);    % previous cell state
xi0 = x0(:,11:20);   % previous hidden state
% forget/input/output gates from one concatenated affine map
a = sigmoid_(u*p_Wfio + xi0*p_Ufio + p_bfio);
f = a(:,1:10);       % forget gate
i = a(:,11:20);      % input gate (note: shadows the imaginary unit)
o = a(:,21:30);      % output gate
c = tanh(u*p_Wc + xi0*p_Uc + p_bc);   % candidate cell update
hp  = f.*h0 + i.*c;  % new cell state
xip = o.*tanh(hp);   % new hidden state
xp  = [hp,xip];
y   = xi0;           % output is the previous hidden state
end
and this is FC_layer
function y = FC_layer(y,p_weight,p_bias)
% Fully connected (dense) layer: affine map y -> y*W' + b
y = y*p_weight' + p_bias;
end
Both work the same way as the LSTM layer and fully connected layer in an RNN.
I use fminunc with initial guess x0, which contains the trained parameters from an LSTM network (already trained on a large dataset):
x = fminunc(fun,x0,options)
If I understand you correctly, the major problem here would be the initial guess of xp.
Well, at the very least I don't understand why it's random. If you need to "guess" xp, why isn't it being treated as an unknown parameter?
xp are the (hidden and cell) states of the LSTM network. Normally they are initialized to zero or randomly with zero mean when training an LSTM. They are supposed to store the information of the past data points (from the same sequence), and for a new sequence there is no "past information". That's why I follow the practice from LSTM training and initialize it randomly.
Well, it is confusing to me that you would conduct training iterations inside your objective function. The whole purpose of using fminunc here (or so I thought) is so that fminunc would do the iterative parameter estimation for you. It's as if you have an optimization inside an optimization.
Sorry, I did not explain it clearly. The loop inside the objective function is for formulating the second part of the objective function, which involves a sequence of 1000 data points.
That doesn't clarify why you are not optimizing xp. Surely the accuracy of the prediction depends jointly on xp and the other parameters. If you change x1...x8, your prediction can become worse if you don't change xp as well.
Also, you say that your initial guess of x1...x8 came from a previously trained LSTM. Why not use the xp from that network as well (regardless of whether xp is treated as an unknown or not)?
Thanks for the comment!
I think treating xp as an optimization variable is a nice idea to eliminate the randomness in the process, although it still returns the same error "exceeds max number of function evaluations". I think the initial guess of the optimization variables is still too far away from the optimal solution. If I choose fewer variables to optimize (e.g., only x1 and x2), the algorithm works fine.
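One way to treat xp as an unknown is to append it to the decision vector and unpack it inside the objective. A rough sketch, where xvec0 (the reshaped trained parameters x1...x8), xp0, and myObjective are illustrative names, not from the thread:

```matlab
% z stacks the reshaped trained parameters (xvec0) and the state
% guess (xp0) into a single decision vector for fminunc
z0 = [xvec0; xp0(:)];
nP = numel(xvec0);

% unpack inside the objective so both parts are optimized jointly
fun = @(z) myObjective(z(1:nP), reshape(z(nP+1:end), size(xp0)));
z = fminunc(fun, z0, options);
xp_opt = reshape(z(nP+1:end), size(xp0));   % optimized state estimate
```

The reshape on the way out recovers xp in its original matrix shape, so downstream code does not need to change.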


More Answers (1)

You would probably do well to use the Problem-Based Optimization Workflow. But you can just as easily change your current solution method to use a more efficient algorithm. The point is that lsqnonlin is the solver of choice for sum-of-squares problems: your objective function should return the vector of residuals, and lsqnonlin implicitly sums their squares and minimizes.
That said, I might be misunderstanding your problem. You said that your yi are functions of the pi, and I do not see that connection in your problem formulation. So I might have it wrong somehow.
In any case, see whether the problem-based formulation makes sense for you and whether it chooses a more efficient solver.
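For illustration, the lsqnonlin pattern looks roughly like this, where model, ydata, and p0 are placeholder names rather than anything from the question:

```matlab
% lsqnonlin expects the residual VECTOR, not its sum of squares;
% the solver forms and minimizes sum(res.^2) internally.
resfun = @(p) model(p) - ydata;   % one residual per data point
p = lsqnonlin(resfun, p0);
```

Because the solver sees the individual residuals, it can exploit the sum-of-squares structure (Gauss-Newton / Levenberg-Marquardt steps) instead of treating the objective as a black box, which is usually much more efficient than fminunc on such problems.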
Alan Weiss
MATLAB mathematical toolbox documentation

1 Comment

Hi Alan
thanks a lot for the answer!
Sorry, I did not state it clearly. The optimization variables in my algorithm are only x1...x8. I am actually using fminunc to solve my optimization. I have also tried problem-based optimization; it automatically chooses fminunc as the solver and returns the exact same error.
In my problem formulation, yi is actually a sequence and is calculated through the optimization variables and some additional variables (let's say xp).
If xp is randomly initialized, the algorithm only finds a better point when there happens to be a good random initialization.
I initialize the optimization variables with the previous best result (according to my algorithm, it is the best initialization available).
For xp, I used a good guess (at least in my opinion) to initialize it.
But it still throws the error "fminunc stopped because it exceeded the function evaluation limit"
I am new to optimization algorithms. Is it because there are no (sub)optimal points available near the initial point? Or is the algorithm too conservative to take large steps, so that it gets trapped?
I would also like to add that it is a relatively big optimization problem, with 8 different optimization variables (all matrices of different sizes) and 724 scalar optimization variables in total. Is fminunc suitable for problems with a large number of optimization variables?
I would appreciate any suggestions!
Jing


Release: R2021a
Asked: 19 Oct 2021
Commented: 21 Oct 2021
