
rlSACAgent

Soft actor-critic reinforcement learning agent

Description

The soft actor-critic (SAC) algorithm is a model-free, online, off-policy, actor-critic reinforcement learning method. The SAC algorithm computes an optimal policy that maximizes both the long-term expected reward and the entropy of the policy. The policy entropy is a measure of policy uncertainty given the state. A higher entropy value promotes more exploration. Maximizing both the reward and the entropy balances exploration and exploitation of the environment.
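As a point of reference, the entropy-regularized objective that SAC maximizes is commonly written as follows (this is the standard formulation from the SAC literature, shown here only for orientation):

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\left[\gamma^{t}\bigl(r(s_t,a_t) + \alpha\,\mathcal{H}\bigl(\pi(\cdot\mid s_t)\bigr)\bigr)\right]$$

where r is the reward, γ is the discount factor, H(π(·|s_t)) is the entropy of the policy at state s_t, and α is a temperature that weights the entropy term.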

For more information, see Soft Actor-Critic Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description

Create Agent from Observation and Action Specifications

agent = rlSACAgent(observationInfo,actionInfo) creates a SAC agent for an environment with the given observation specifications (observationInfo) and action specifications (actionInfo). The actor and critic representations in the agent use default deep neural networks built using the observation specification observationInfo and action specification actionInfo.

agent = rlSACAgent(observationInfo,actionInfo,initOptions) creates a SAC agent with deep neural network representations configured using the specified initialization options (initOptions).

Create Agent from Actor and Critic Representations

agent = rlSACAgent(actor,critics) creates a SAC agent with the specified actor and critic networks and default agent options.

Specify Agent Options

agent = rlSACAgent(___,agentOptions) sets the AgentOptions property for any of the previous syntaxes.

Input Arguments

Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the observation signals.

You can extract observationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

Action specification for a continuous action space, specified as an rlNumericSpec object defining properties such as dimensions, data type, and name of the action signals.

You can extract actionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlNumericSpec.
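If you construct the specifications manually, a minimal sketch looks like the following (the dimensions, limits, and names are illustrative and simply mirror the double-integrator examples below; they are not tied to a particular environment):

% Hypothetical specifications: a 2-element continuous observation and a
% scalar action bounded between -2 and 2.
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';

actInfo = rlNumericSpec([1 1],'LowerLimit',-2,'UpperLimit',2);
actInfo.Name = 'force';

% These specifications can then be passed to rlSACAgent, for example:
% agent = rlSACAgent(obsInfo,actInfo);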

Representation initialization options, specified as an rlAgentInitializationOptions object.

Actor network representation, specified as an rlStochasticActorRepresentation object.

For more information on creating actor representations, see Create Policy and Value Function Representations.

Critic network representations, specified as one of the following:

  • rlQValueRepresentation object — Create a SAC agent with a single Q-value function.

  • Two-element row vector of rlQValueRepresentation objects — Create a SAC agent with two critic value functions. The two critic networks must be unique rlQValueRepresentation objects with the same observation and action specifications. The representations can either have different structures or the same structure but with different initial parameters.

For a SAC agent, each critic must be a single-output rlQValueRepresentation object that takes both the action and observations as inputs.

For more information on creating critic representations, see Create Policy and Value Function Representations.

Properties

Agent options, specified as an rlSACAgentOptions object.

Experience buffer, specified as an ExperienceBuffer object. During training, the agent stores each of its experiences (S,A,R,S') in a buffer. Here:

  • S is the current observation of the environment.

  • A is the action taken by the agent.

  • R is the reward for taking action A.

  • S' is the next observation after taking action A.

For more information on how the agent samples experience from the buffer during training, see Soft Actor-Critic Agents.

Object Functions

train - Train reinforcement learning agents within a specified environment
sim - Simulate trained reinforcement learning agents within specified environment
getAction - Obtain action from agent or actor representation given environment observations
getActor - Get actor representation from reinforcement learning agent
setActor - Set actor representation of reinforcement learning agent
getCritic - Get critic representation from reinforcement learning agent
setCritic - Set critic representation of reinforcement learning agent
generatePolicyFunction - Create function that evaluates trained policy of reinforcement learning agent
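As a quick illustration of these functions, a minimal sketch (assuming an agent already created as in the examples below) is:

% Extract and reinsert the representations, then generate a standalone
% policy evaluation function from the agent.
actorRep   = getActor(agent);             % actor representation
criticReps = getCritic(agent);            % one or two critic representations
agent      = setActor(agent,actorRep);    % set a (possibly modified) actor
generatePolicyFunction(agent);            % create a policy evaluation function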

Examples

Create an environment and obtain its observation and action specifications. For this example, load the environment used in the example Train DDPG Agent to Control Double Integrator System. The observation from the environment is a vector containing the position and velocity of a mass. The action is a scalar representing a force applied to the mass, ranging continuously from -2 to 2 newtons.

env = rlPredefinedEnv("DoubleIntegrator-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

The agent creation function initializes the actor and critic networks randomly. You can ensure reproducibility by fixing the seed of the random number generator. To do so, uncomment the following line.

% rng(0)

Create a SAC agent from the environment observation and action specifications.

agent = rlSACAgent(obsInfo,actInfo);

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{rand(obsInfo(1).Dimension)})
ans = 1x1 cell array
    {[0.0546]}

You can now test and train the agent within the environment.
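For example, a short simulation run is a quick way to test the agent (the rlSimulationOptions settings below are illustrative, not part of this example):

% Simulate the untrained agent for a limited number of steps to verify
% that the agent-environment loop runs.
simOpts = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOpts);
totalReward = sum(experience.Reward.Data)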

Create an environment with a continuous action space and obtain its observation and action specifications. For this example, load the environment used in the example Train DDPG Agent to Control Double Integrator System. The observation from the environment is a vector containing the position and velocity of a mass. The action is a scalar representing a force applied to the mass, ranging continuously from -2 to 2 newtons.

env = rlPredefinedEnv("DoubleIntegrator-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create an agent initialization option object, specifying that each hidden fully connected layer in the network must have 128 neurons.

initOpts = rlAgentInitializationOptions('NumHiddenUnit',128);

The agent creation function initializes the actor and critic networks randomly. You can ensure reproducibility by fixing the seed of the random number generator. To do so, uncomment the following line.

% rng(0)

Create a SAC agent from the environment observation and action specifications using the initialization options.

agent = rlSACAgent(obsInfo,actInfo,initOpts);

Extract the deep neural network from the actor.

actorNet = getModel(getActor(agent));

Extract the deep neural networks from the two critics. Note that getModel(critics) only returns the first critic network.

critics = getCritic(agent);
criticNet1 = getModel(critics(1));
criticNet2 = getModel(critics(2));

Display the layers of the first critic network, and verify that each hidden fully connected layer has 128 neurons.

criticNet1.Layers
ans = 
  10x1 Layer array with layers:

     1   'concat'               Concatenation       Concatenation of 2 inputs along dimension 1
     2   'relu_body'            ReLU                ReLU
     3   'fc_body'              Fully Connected     128 fully connected layer
     4   'body_output'          ReLU                ReLU
     5   'input_1'              Feature Input       2 features
     6   'fc_1'                 Fully Connected     128 fully connected layer
     7   'input_2'              Feature Input       1 features
     8   'fc_2'                 Fully Connected     128 fully connected layer
     9   'output'               Fully Connected     1 fully connected layer
    10   'RepresentationLoss'   Regression Output   mean-squared-error

Plot the networks of the actor and of the second critic.

plot(actorNet)

plot(criticNet2)

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{rand(obsInfo(1).Dimension)})
ans = 1x1 cell array
    {[-0.9867]}

You can now test and train the agent within the environment.

Create an environment and obtain its observation and action specifications. For this example, load the environment used in the example Train DDPG Agent to Control Double Integrator System. The observation from the environment is a vector containing the position and velocity of a mass. The action is a scalar representing a force applied to the mass, ranging continuously from -2 to 2 newtons.

env = rlPredefinedEnv("DoubleIntegrator-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Obtain the number of observations and the number of actions.

numObs = obsInfo.Dimension(1);
numAct = numel(actInfo);

Create two Q-value critic representations. First, create a critic deep neural network structure.

statePath1 = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')
    ];
actionPath1 = [
    featureInputLayer(numAct,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')
    ];
commonPath1 = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu1')
    fullyConnectedLayer(1,'Name','CriticOutput')
    ];

criticNet = layerGraph(statePath1);
criticNet = addLayers(criticNet,actionPath1);
criticNet = addLayers(criticNet,commonPath1);
criticNet = connectLayers(criticNet,'CriticStateFC2','add/in1');
criticNet = connectLayers(criticNet,'CriticActionFC1','add/in2');

Create the critic representations. Use the same network structure for both critics. The SAC agent initializes the two networks using different default parameters.

criticOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,... 
                                        'GradientThreshold',1,'L2RegularizationFactor',2e-4);
critic1 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
critic2 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);

Create an actor deep neural network. Do not add a tanhLayer or scalingLayer in the mean output path. The SAC agent internally transforms the unbounded Gaussian distribution into a bounded distribution to compute the probability density function and entropy properly.

statePath = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(400, 'Name','commonFC1')
    reluLayer('Name','CommonRelu')];
meanPath = [
    fullyConnectedLayer(300,'Name','MeanFC1')
    reluLayer('Name','MeanRelu')
    fullyConnectedLayer(numAct,'Name','Mean')
    ];
stdPath = [
    fullyConnectedLayer(300,'Name','StdFC1')
    reluLayer('Name','StdRelu')
    fullyConnectedLayer(numAct,'Name','StdFC2')
    softplusLayer('Name','StandardDeviation')];

concatPath = concatenationLayer(1,2,'Name','GaussianParameters');

actorNetwork = layerGraph(statePath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'CommonRelu','MeanFC1/in');
actorNetwork = connectLayers(actorNetwork,'CommonRelu','StdFC1/in');
actorNetwork = connectLayers(actorNetwork,'Mean','GaussianParameters/in1');
actorNetwork = connectLayers(actorNetwork,'StandardDeviation','GaussianParameters/in2');

Create a stochastic actor representation using the deep neural network.

actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,...
                                       'GradientThreshold',1,'L2RegularizationFactor',1e-5);

actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,actorOptions,...
    'Observation',{'observation'});

Specify agent options.

agentOptions = rlSACAgentOptions;
agentOptions.SampleTime = env.Ts;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.MiniBatchSize = 32;

Create the SAC agent using the actor, critics, and agent options.

agent = rlSACAgent(actor,[critic1 critic2],agentOptions);

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{rand(obsInfo(1).Dimension)})
ans = 1x1 cell array
    {[0.1259]}

You can now test and train the agent within the environment.
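For example, you can train the agent by passing training options to train (the option values below are illustrative only and are not taken from this example):

% Train the agent, stopping when the average reward reaches a target value.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1000, ...
    'MaxStepsPerEpisode',500, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',-66);
trainingStats = train(agent,env,trainOpts);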

Introduced in R2019a