Gradient Descent Implementation on a function, as opposed to an equation

I have built a function
function [RMS] = ErrorFunction(ui,vi,wi,cx,cy,cz)
which outputs a certain error based on the six initial conditions I input to the model. My model is iterative and the error depends on some intermediate parameters, so it is not possible to write down an explicit relationship between the error and the six inputs. My aim is to drive the error to zero using the Gradient/Steepest Descent method, and I'm hoping someone can guide me in applying it to a function, as opposed to a straightforward explicit expression like f(x,y) = 4x^2 - 4xy + 2y^2.

Accepted Answer

Bjorn Gustavsson on 7 Sep 2022
You could do something like this:
1. Rewrite the function to take an array as input parameter instead of a number of scalar parameters:
function [RMS] = ErrorFunction(pars)
where pars holds the six values [ui, vi, wi, cx, cy, cz].
2. Calculate a numerical gradient (central difference):
curr_pars = [ui, vi, wi, cx, cy, cz]; % current point
h = 0.01; % you have to decide what a suitably small step is, and whether you need different step sizes for the different input variables...
for i_v = numel(curr_pars):-1:1  % counting down preallocates gradErrorFunction on the first pass
    cp = curr_pars;
    cp(i_v) = cp(i_v) + h;       % perturb component i_v up...
    cm = curr_pars;
    cm(i_v) = cm(i_v) - h;       % ...and down
    gradErrorFunction(i_v) = (ErrorFunction(cp) - ErrorFunction(cm))/(2*h);
end
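You could then iterate on that gradient with the standard steepest-descent update. A minimal sketch (gamma, maxIter and tol here are illustrative values you would have to tune for your model):
gamma   = 0.1;    % learning rate - problem dependent
maxIter = 200;    % iteration budget
tol     = 1e-6;   % stop once the error is small enough
for k = 1:maxIter
    % recompute the central-difference gradient at the current point
    for i_v = numel(curr_pars):-1:1
        cp = curr_pars;  cp(i_v) = cp(i_v) + h;
        cm = curr_pars;  cm(i_v) = cm(i_v) - h;
        gradErrorFunction(i_v) = (ErrorFunction(cp) - ErrorFunction(cm))/(2*h);
    end
    curr_pars = curr_pars - gamma*gradErrorFunction;  % steepest-descent step
    if ErrorFunction(curr_pars) < tol
        break
    end
end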
HTH
  1 Comment
PASUNURU SAI VINEETH on 10 Sep 2022
Thank you very much, that was brilliant! This approach suits my purpose.
Apologies for the late response; I've been trying out different combinations of the step size h and learning rate gamma before getting back to you.
clc
clear all
gamma = 2;                  % learning rate
curr_pars = [1 1 1 1 1 1];  % current point (initial guess)
tic
for k = 1:100
    h = [1 1 1 1 1 1];      % per-variable finite-difference step
    for i_v = numel(curr_pars):-1:1
        cp = curr_pars;
        cp(i_v) = cp(i_v) + h(i_v);
        cm = curr_pars;
        cm(i_v) = cm(i_v) - h(i_v);
        gradErrorFunction(i_v) = (ErrorFunction(cp) - ErrorFunction(cm))/(2*h(i_v));
    end
    curr_pars = curr_pars - gamma*gradErrorFunction  % descent step (no semicolon: prints progress)
    [err] = ErrorFunction(curr_pars)                 % error at the new point
end
toc
I was wondering whether setting gamma = 2*h makes things simpler, since the two cancel in the line
curr_pars = curr_pars - gamma*gradErrorFunction
that is, gamma*gradErrorFunction(i_v) = 2*h(i_v) * (ErrorFunction(cp) - ErrorFunction(cm)) / (2*h(i_v)) reduces to the raw difference of the two error evaluations, so the updated parameter array depends on h only through where the function is sampled.
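In code, what I mean is something like this (step here is just my name for the combined quantity):
for i_v = numel(curr_pars):-1:1
    cp = curr_pars;  cp(i_v) = cp(i_v) + h(i_v);
    cm = curr_pars;  cm(i_v) = cm(i_v) - h(i_v);
    % gamma*grad with gamma = 2*h(i_v): the finite-difference denominator cancels
    step(i_v) = ErrorFunction(cp) - ErrorFunction(cm);
end
curr_pars = curr_pars - step;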
Do you think the current step size and learning rate are reasonable, or are they too high? I observed that towards the end of the 100 iterations the error stopped decreasing and started increasing, but if I reduce those two values the error doesn't decrease at all in the first place. I've played around with various combinations without finding a way to tune them.
Again, thanks a lot for taking the time to answer my query.


More Answers (1)

Torsten on 7 Sep 2022
Use
fun = @(x) ErrorFunction(x(1),x(2),x(3),x(4),x(5),x(6))
as the function handle you work on in the steepest descent.
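The handle can then be evaluated like any single-argument function, which is what the gradient loop in the accepted answer expects. A small usage sketch (x0 is just an illustrative starting point):
fun = @(x) ErrorFunction(x(1),x(2),x(3),x(4),x(5),x(6));
x0  = ones(1,6);  % illustrative starting point
err = fun(x0);    % one scalar error value for the current parameter vector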
