How to pretrain a stochastic actor network for PPO training?

Question

Jan Dewez on 6 May 2021

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/823380-how-to-pretrain-a-stochastic-actor-network-for-ppo-training

Commented: Anh Tran on 17 May 2021

I want to create a stochastic actor network that, given an observation array of 28 normalized values, outputs an action array of 10 values between 0 and 1. I specified the lower and upper limits as follows to ensure that the actor's output stays between 0 and 1:

ActionInfo = rlNumericSpec([numActions 1],'LowerLimit',[0;0;0;0;0;0;0;0;0;0],'UpperLimit',[1;1;1;1;1;1;1;1;1;1]);

My stochastic network looks as follows:

I have created a normalized training data set (input dimension 28, target dimension 10). How do I use this data set to pretrain the above network?

Clarification: I want to train the network before starting the PPO agent training.

Answer 1

Anh Tran on 13 May 2021

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/823380-how-to-pretrain-a-stochastic-actor-network-for-ppo-training#answer_699308

Hi Jan,

You can pretrain a stochastic actor with Deep Learning Toolbox's trainNetwork, with some additional work. Emmanouil gave some good pointers initially, but I want to add these steps:

You need a custom loss layer, since the stochastic actor network outputs means and standard deviations while your targets are actions. You can try a maximum-likelihood loss, i.e. the negative log-likelihood of the target action. You can follow the instructions here to create a custom loss layer (you don't have to implement the backward pass, as automatic differentiation will take care of it).

% We want to maximize the objective log f(x), where f(x) is the probability density function of Normal(mu, sigma)

% Loss = -Objective = - log(f(x)) = 1/2*log(2*pi) + log(sigma) + 1/2*((x-mu)/sigma)^2;

Keep in mind that you must protect against log(0); adding eps to sigma is sufficient. Here x is your action target.
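
For reference, here is where the loss expression above comes from. This is a sketch that assumes each action component is treated as an independent Normal with mean \mu and standard deviation \sigma, which is how the actor's outputs are used in this thread; x is the target action for that component.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

-\log f(x) = \tfrac{1}{2}\log(2\pi) + \log\sigma
             + \tfrac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}
```

The first term is a constant and does not affect the gradients. The \log\sigma term is the one that needs protection when \sigma approaches 0.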

4 Comments

Jan Dewez on 13 May 2021

Hi Anh,

So by replacing the regression layer in my initial example with the custom regression layer myRegressionLayer, I should be able to train the network with my initial data set? Variable Y in the code below contains means and standard deviations, because that is what the actor network outputs. In other words, is the input dimension of the custom regression layer 20? I ask because when I try to train the network, I get an error (see below).

%% pretrain stochastic actor NN
classdef myRegressionLayer < nnet.layer.RegressionLayer
        
    properties
        % (Optional) Layer properties.
        % Layer properties go here.
    end
 
    methods
        function layer = myRegressionLayer(name)           
            % (Optional) Create a myRegressionLayer.
            layer.Name = name;
            layer.Description = 'maximum log likelihood loss';
            % Layer constructor function goes here.
        end
        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the training
            % targets T.
            %
            % Inputs:
            %         layer - Output layer
            %         Y     – Predictions made by network
            %         T     – Training targets
            %
            % Output:
            %         loss  - Loss between Y and T
            % Layer forward loss function goes here.
            numActions = length(Y)/2;
            mu = zeros(numActions);
            sigma = zeros(numActions);            
            for i = 1:numActions
                mu(i) = Y(i);
                sigma(i) = Y(i+numActions);                
            end      
            
            loss = 0.5*log(2*pi)+log(sigma+eps)+0.5*((T-mu)./(sigma+eps)).^2;
        end
    end
end

Error using trainNetwork (line 183)
Invalid training data. The output size (20) of the last layer does not
match the number of responses (10).
Error in pretraining (line 35)
net = trainNetwork(PtrainArray2,TtrainArray2,net_actor,options);

Below the stochastic network:

inPath = [featureInputLayer(numObs, 'Normalization','none','Name','myobs')  %numObs = 28
          fullyConnectedLayer(380,'Name','hidden1')
          reluLayer('Name','relu1')
          fullyConnectedLayer(195,'Name','hidden2')
          reluLayer('Name','relu2')
          fullyConnectedLayer(100,'Name','hidden3')
          reluLayer('Name','relu3')]; 
% path layers for mean value (10 by 1 input and output)
meanPath = [fullyConnectedLayer(numActions,'Name','means')                  %numActions = 10
            sigmoidLayer('Name','sigmoid')
            scalingLayer('Name','scale','Scale',ActionInfo.UpperLimit-ActionInfo.LowerLimit,'Bias',ActionInfo.LowerLimit)]; % maps the sigmoid output (0,1) onto [LowerLimit, UpperLimit]
            
% path layers for standard deviations (10 by 1 input and output)
% using softplus layer to make it non negative
sdevPath =  [fullyConnectedLayer(numActions,'Name','sdevs')
            softplusLayer('Name', 'splus')];
% concatenate two inputs (along dimension #1) to form a single (20 by 1) output layer
outLayer = [concatenationLayer(1,2,'Name','mean&sdev')
            myRegressionLayer('actions')];
% add layers to network object
net_actor = layerGraph(inPath);
net_actor = addLayers(net_actor,meanPath);
net_actor = addLayers(net_actor,sdevPath);
net_actor = addLayers(net_actor,outLayer);
% connect layers: the mean value path output MUST be connected to the FIRST input of the concatenationLayer
net_actor = connectLayers(net_actor,'relu3','means/in');             % connect output of inPath to meanPath input
net_actor = connectLayers(net_actor,'relu3','sdevs/in');             % connect output of inPath to sdevPath input
net_actor = connectLayers(net_actor,'scale','mean&sdev/in1');        % connect output of meanPath to conc layer input #1
net_actor = connectLayers(net_actor,'splus','mean&sdev/in2');  

Jan Dewez on 15 May 2021

I rewrote my custom regression class like this:

classdef myRegressionLayer < nnet.layer.RegressionLayer
        
    methods
        function layer = myRegressionLayer(name)
            % (Optional) Create a myRegressionLayer.
            layer.Name = name;
            layer.Description = 'maximum log likelihood loss';
            % Layer constructor function goes here.
        end
        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the training
            % targets T.
            %
            % Inputs:
            %         layer - Output layer
            %         Y     – Predictions made by network (20 x minibatchsize)
            %         T     – Training targets (20 x minibatchsize)
            %
            % Output:
            %         loss  - Loss between Y and T
            numActions = height(Y)/2;
            mu = Y(1:numActions,:);         %(10 x minibatchsize)
            sigma = Y(numActions+1:end,:);  %(10 x minibatchsize)
            for i = 1:numActions
                loss(i,:) = 0.5*log(2*pi) + log(sigma(i,:)+eps) + 0.5*((T(i,:)-mu(i,:))./(sigma(i,:)+eps)).^2;
            end
            disp('loss: ');
            disp(loss);
        end        
    end
end

When I for example set MiniBatchSize to 5, loss looks like this:

loss: 
  10×5 single dlarray
7065    0.7062    0.7346    0.7249    0.6832
0642    1.0203    1.0669    1.0500    1.0539
7998    1.0349    1.3149    1.2599    0.8729
5574    1.5650    1.5613    1.6017    1.5787
2772    1.1369    1.5798    1.4769    1.2660
7744    0.7541    0.7840    0.7776    0.7427
8501    0.8206    0.8311    0.8372    0.8288
7570    0.7704    0.7467    0.8035    0.7890
7789    0.7916    0.7898    0.7881    0.8122
7692    0.7411    0.7553    0.7528    0.7689

Followed by this error:

Error using trainNetwork (line 183)
Error using 'backwardLoss' in Layer myRegressionLayer. The function threw an error and
could not be executed.
Error in pretraining (line 42)
net = trainNetwork(PtrainArray2,TtrainArray2_ext,net_actor,options);
Caused by:
    Error using dlarray/dlgradient (line 51)
    Value to differentiate must be a traced dlarray scalar.

I am not sure how to fix this. What should 'loss' look like?

Anh Tran on 17 May 2021

As the error message says, the value to differentiate must be a scalar. Thus, you need to compute the mean of the loss over each batch. Also, I am not sure why you need a for-loop to compute the loss. We can vectorize the computation as follows (since sigma, T, and mu have the same size):

% vectorize loss computation
loss = 0.5*log(2*pi) + log(sigma + eps) + 0.5*((T-mu)./(sigma+eps)).^2;
% mean of the loss over each batch
batchSize = size(T,2);   % observations are along the second dimension (see the 10x5 loss above)
loss = sum(loss,'all');
loss = loss/batchSize;
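
To check the scalar loss outside of trainNetwork, the same computation can be written as a plain function and run on ordinary numeric arrays. This is only a sketch: the function name gaussianNLL is hypothetical, and it assumes that Y stacks the means on top of the standard deviations (as the concatenation layer in this thread does) and that the action targets are in the first numActions rows of T, so any padding rows added to make T match the 20 outputs are ignored.

```matlab
% gaussianNLL.m (hypothetical helper, not part of any toolbox)
% Y: (2*numActions) x batchSize, means stacked on top of standard deviations
% T: action targets in the first numActions rows; any extra rows are ignored
function loss = gaussianNLL(Y, T)
    numActions = size(Y, 1) / 2;
    batchSize  = size(Y, 2);

    mu    = Y(1:numActions, :);        % predicted means
    sigma = Y(numActions+1:end, :);    % predicted standard deviations
    x     = T(1:numActions, :);        % target actions

    % Element-wise negative log-likelihood; eps guards against log(0)
    % and division by zero.
    elementLoss = 0.5*log(2*pi) + log(sigma + eps) ...
                  + 0.5*((x - mu)./(sigma + eps)).^2;

    % Sum over actions and samples, then average over the batch so the
    % result is a scalar.
    loss = sum(elementLoss, 'all') / batchSize;
end
```

The body of forwardLoss(layer, Y, T) in the custom layer could follow the same steps. Calling gaussianNLL on a small hand-made batch is a quick way to confirm that the result is a scalar before starting training.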

Answer 2

Emmanouil Tzorakoleftherakis on 13 May 2021

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/823380-how-to-pretrain-a-stochastic-actor-network-for-ppo-training#answer_698948

Hello,

Since you already have a dataset, you will have to use Deep Learning Toolbox to get your initial policy. Take a look at the examples below to get an idea:

https://www.mathworks.com/help/reinforcement-learning/ug/imitate-mpc-controller-for-lane-keeping-assist.html

https://www.mathworks.com/help/reinforcement-learning/ug/imitate-nonlinear-mpc-controller-for-flying-robot.html

1 Comment

Jan Dewez on 13 May 2021

Hello Emmanouil,

Thanks for the response, but how do I train a stochastic actor with output dimension 20 when my training data has dimension 10? Do I need to convert my training set so that it contains means and standard deviations?
