How can we use vectors in a Deep Learning custom training loop?

Hi everyone.
I am trying to train a CNN with my own optimizer through a custom training loop:
[loss,gradient] = dlfeval(@modelgradient,dlnet,XTrain,YTrain)
myFun = @(dlnet,gradient,loss) myOptimizer(dlnet,gradient,loss,...)
dlnet = dlupdate(myFun,dlnet,gradient,loss)
My optimizer needs w (the current parameter vector), g (its corresponding gradient vector), f (the corresponding loss value), and so on as inputs. It performs many computations with w, g, and f internally to produce w = w + p, where p is an optimal step vector that my optimizer has to compute and with which I update w.
I need a way to convert the parameters and gradients from dl format to vectors for those computations inside my optimizer, and then, to use the syntax above, to convert the vectors back to the dl formats required in the loop and in my optimizer. This back-and-forth conversion is necessary for my training loop. Can you help me find functions in the toolbox to do this (vector to table, since the gradients and dlnet's parameters are tables with dlarray cells, and vice versa), or suggest any other solution?

Answers (1)

Amanpreetsingh Arora on 12 Nov 2020
Weight parameters of the dlnetwork object can be accessed through the "Learnables" property of the object. Both "gradient" and this "Learnables" property are tables with the variables "Layer", "Parameter" and "Value". You can access each weight parameter and its corresponding gradient by indexing into the tables in a loop as follows.
for i = 1:size(dlnet.Learnables,1)
    w = dlnet.Learnables{i,"Value"}{1,1};
    layerName = dlnet.Learnables{i,"Layer"};
    paramName = dlnet.Learnables{i,"Parameter"};
    g = gradient{gradient.Layer==layerName & gradient.Parameter==paramName,"Value"}{1,1};
end
For more information on indexing into tables, refer to the following documentation.
"w" and "g" will be dlarray objects in the above code snippet. Converting them to double arrays might be unnecessary, as many operations/functions that support double arrays also support dlarray objects. Refer to the following documentation for a list of functions that support dlarray.
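For instance, the gradient norm an optimizer typically needs can, to my knowledge, be computed directly on a dlarray without any conversion. A minimal sketch (assuming "g" is a dlarray of gradients from the loop above):

```matlab
% Sketch: gradient norm computed directly on a dlarray "g".
% sum and sqrt support dlarray inputs, so no extractdata round
% trip is needed here.
norm_g = sqrt(sum(g.^2, 'all'));
```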
  3 Comments
Amanpreetsingh Arora on 12 Nov 2020 (edited 12 Nov 2020)
"dlupdate" will call "myOptimizer" with the weights and gradients of each parameter individually. So the inputs to "myOptimizer" (and even to "sgdFunction" in the example) are of type dlarray, and "myOptimizer" will be called several times in one iteration, once per parameter. Inside "myOptimizer", you won't need table indexing if it is called through "dlupdate". Refer to the following documentation for more information on "dlupdate".
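To illustrate, an SGD-style update via "dlupdate" might look like the following sketch (the learning rate value is an arbitrary assumption; the update function receives one parameter w and its gradient g at a time, both as dlarray):

```matlab
% Sketch: per-parameter update applied by dlupdate.
% dlupdate calls sgdFcn once for each entry of dlnet.Learnables,
% pairing it with the matching entry of the gradients table.
lr = 0.01;
sgdFcn = @(w, g) w - lr .* g;
dlnet = dlupdate(sgdFcn, dlnet, gradients);
```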
The recommended approach for working with dlarray, as mentioned in the answer, is to operate on it directly rather than convert it to a double array. However, if you do convert a dlarray to a double array using "extractdata", you can convert the results of the computation back to dlarray by passing them to the "dlarray" function.
For example, to convert a double array X
dlX=dlarray(X);
Refer to the following documentation to know more about "dlarray".
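Putting both directions together, the round trip looks like this sketch (assuming "dlW" is an unformatted dlarray parameter):

```matlab
% Sketch: round trip between dlarray and a plain numeric array.
W = extractdata(dlW);    % dlarray -> numeric (double or single) array
% ... computations on W using ordinary numeric code ...
dlW = dlarray(W);        % numeric array -> dlarray again
```

Note that if the original dlarray carried dimension labels, they would need to be re-specified when calling dlarray.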
MAHSA YOUSEFI on 12 Nov 2020
Having dlarray alone is not all I need. I need the gradient and parameter matrices for each layer. I have vectors such as g, w, or p that I want to convert into per-layer matrices.
For instance, in conv1 I have 20 filters of size 5x5, and so on up to the FC layers. I have an unrolled vector of these matrices across all layers, and now I want to go back from this vector to the per-layer matrices. To use dlnet I have to have these matrices for each layer. The reason I convert the gradient (a table) to a vector and then want to come back again is explained below.
This code is similar to the example on the "dlupdate" documentation page.
for epoch = 1:numEpochs
    % Shuffle data...
    for i = 1:maxit
        iteration = iteration + 1;
        % Read mini-batch
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = XTrain(:,:,:,idx);
        Y = zeros(numClasses, miniBatchSize, 'single');
        for c = 1:numClasses
            Y(c,YTrain(idx)==classes(c)) = 1;
        end
        dlX = dlarray(single(X),'SSCB');
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlX = gpuArray(dlX);
        end
        [gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,Y);
        updateFcn = @(dlnet,gradients) MyFunction(dlnet,gradients,learnRate);
        dlnet = dlupdate(updateFcn,dlnet,gradients);
        ...
    end
end
%***********************************************************
function [gradients,loss] = modelGradients(dlnet,dlX,Y)
    dlYPred = forward(dlnet,dlX);
    loss = crossentropy(dlYPred,Y);
    gradients = dlgradient(loss,dlnet.Learnables);
end

function dlnet = MyFunction(???????)
    ?????????
end
I do not have any idea how to write MyFunction because, as you can see in my optimizer, in step (1) I need the gradient and its norm. To verify the condition in step (3), I need the loss and gradient vector at the "candidate" parameters w_cand from step (2) (it is NOT the "updated" parameter). This is the framework of my optimizer in the vector setting.
function w_update = Optimizer(w, f, g, alpha, c, tol)
% w:      vector of parameters
% g:      vector of gradients at w
% f:      loss at w
% w_cand: vector of candidate parameters
% g_cand: vector of gradients at w_cand
% f_cand: loss at w_cand
%(1)
norm_g = norm(g);
p = Func_to_compute_p(g,norm_g);
w_cand = w + alpha*p;
%(2)
%(((((((((((((((((((((((((((((((((((((((((((((((((((
% compute loss f_cand and gradient g_cand evaluated at w_cand
%)))))))))))))))))))))))))))))))))))))))))))))))))))
%(3)
while alpha > tol
    if f_cand > f + c*alpha*(g_cand'*p)
        p = alpha*p;
        w_update = w + p;
        break
    else
        alpha = alpha/2;
        % (steps (1)-(2) would be repeated with the new alpha)
    end
end
w_update = w + p;
end
I know "dlnet" is initialized with the Xavier method. Let's say "w" is an unrolled vector of the parameter matrices at each layer. The loss and gradients in
[gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,Y);
correspond to "dlnet". So, to work with my optimizer, I can convert the loss and gradients to get f and g corresponding to w through a function such as "set2vector". That way I avoid warnings about operation support. But for step (2), I need "dlnet_cand" and therefore "gradients_cand" and "loss_cand". I think I have to write this code at step (2):
[gradients_cand,loss_cand] = dlfeval(@modelGradients,dlnet_cand,dlX,Y);
With this approach, I now have a vector p, but I cannot update dlnet_cand = dlnet + p because p is a vector. To follow the pattern of sgdFunction (where dlnet = dlnet - gradients.*lr), I need to convert the vector p back to per-layer matrices (i.e., a table).
For step (3), once more I have to convert loss_cand and gradients_cand into f_cand and the vector g_cand.
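The back-and-forth described above can be sketched with a pair of helper functions. This is a minimal, untested sketch; the function names are hypothetical, it assumes the "Learnables" and gradients tables share the same row order (which they do when gradients come from dlgradient(loss,dlnet.Learnables)), and it assumes the parameters are unformatted dlarrays:

```matlab
% Sketch: flatten all learnable parameters (or gradients) into one
% numeric column vector, remembering each parameter's size.
function [v, sizes] = learnablesToVector(learnables)
    n = size(learnables,1);
    parts = cell(n,1);
    sizes = cell(n,1);
    for i = 1:n
        w = extractdata(learnables.Value{i});  % dlarray -> numeric
        sizes{i} = size(w);
        parts{i} = w(:);                       % unroll to a column
    end
    v = vertcat(parts{:});
end

% Sketch: scatter a vector (e.g. w + alpha*p) back into the
% per-layer matrices of a Learnables table.
function learnables = vectorToLearnables(v, sizes, learnables)
    offset = 0;
    for i = 1:size(learnables,1)
        num = prod(sizes{i});
        w = reshape(v(offset+1:offset+num), sizes{i});
        learnables.Value{i} = dlarray(w);      % numeric -> dlarray
        offset = offset + num;
    end
end
```

With helpers like these, one could build w via [w,sizes] = learnablesToVector(dlnet.Learnables) and g via learnablesToVector(gradients), compute p in the vector setting, and then form the candidate network by assigning dlnet_cand.Learnables = vectorToLearnables(w + alpha*p, sizes, dlnet.Learnables) before calling dlfeval again at step (2).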

