rlContinuousGaussianActor

Stochastic Gaussian actor with a continuous action space for reinforcement learning agents

Description

This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent with a continuous action space. A continuous Gaussian actor takes an environment observation as input and returns as output a random action sampled from a Gaussian probability distribution whose mean and standard deviation are computed from the observation, thereby implementing a parametrized stochastic policy. After you create an rlContinuousGaussianActor object, use it to create a suitable agent, such as an rlACAgent or rlPGAgent agent. For more information on creating representations, see Create Policies and Value Functions.

Creation

Description

actor = rlContinuousGaussianActor(net,observationInfo,actionInfo,ActionMeanOutputNames=netMeanActName,ActionStandardDeviationOutputNames=netStdvActName) creates a Gaussian stochastic actor with a continuous action space using the deep neural network net as function approximator. Here, net must have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. The two output layers calculate the mean and standard deviation of each component of the action. The actor uses these layers, according to the names specified in the strings netMeanActName and netStdvActName, to represent the Gaussian probability distribution from which the action is sampled. The function sets the ObservationInfo and ActionInfo properties of actor to the input arguments observationInfo and actionInfo, respectively.

Note

actor does not enforce constraints set by the action specification. Therefore, when using this actor, you must enforce action space constraints within the environment.

actor = rlContinuousGaussianActor(net,observationInfo,actionInfo,ActionMeanOutputNames=netMeanActName,ActionStandardDeviationOutputNames=netStdvActName,ObservationInputNames=netObsNames) specifies the names of the network input layers to be associated with the environment observation channels. The function assigns, in sequential order, each environment observation channel specified in observationInfo to the layer specified by the corresponding name in the string array netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo.

actor = rlContinuousGaussianActor(___,UseDevice=useDevice) specifies the device used to perform computational operations on the actor object, and sets the UseDevice property of actor to the useDevice input argument. You can use this syntax with any of the previous input-argument combinations.
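For instance, the following hypothetical call creates the actor and places its computations on a GPU. It assumes net, obsInfo, and actInfo are defined as in the example at the end of this page; the layer names "scale", "splus", and "netObsIn" are taken from that example network.

```matlab
% Sketch: same creation syntax as above, with computations placed on a GPU.
% Assumes net, obsInfo, and actInfo exist, with output layers named
% "scale" (mean) and "splus" (standard deviation), and input "netObsIn".
actor = rlContinuousGaussianActor(net,obsInfo,actInfo, ...
    ActionMeanOutputNames="scale", ...
    ActionStandardDeviationOutputNames="splus", ...
    ObservationInputNames="netObsIn", ...
    UseDevice="gpu");
```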

Input Arguments

Deep neural network used as the underlying approximator within the actor. The network must have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. The two output layers calculate the mean and standard deviation of each component of the action. The actor uses these layers, according to the names specified in the strings netMeanActName and netStdvActName, to represent the Gaussian probability distribution from which the action is sampled.

Note

Standard deviations must be nonnegative and mean values must fall within the range of the action. Therefore, the output layer that returns the standard deviations must be a softplus or ReLU layer, to enforce nonnegativity, and the output layer that returns the mean values must be a scaling layer, to scale the mean values to the output range.

You can specify the network as any supported Deep Learning Toolbox neural network object, such as a dlnetwork object.

Note

Among the different network representation options, dlnetwork is preferred, since it has built-in validation checks and supports automatic differentiation. If you pass another network object as an input argument, it is internally converted to a dlnetwork object. However, best practice is to convert other representations to dlnetwork explicitly before using them to create a critic or an actor for a reinforcement learning agent. You can do so using dlnet=dlnetwork(net), where net is any neural network object from the Deep Learning Toolbox™. The resulting dlnet is the dlnetwork object that you use for your critic or actor. This practice allows a greater level of insight and control for cases in which the conversion is not straightforward and might require additional specifications.

rlContinuousGaussianActor objects support recurrent deep neural networks.
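As a minimal sketch, a recurrent version of the input path used in the example at the end of this page replaces the feature input layer with a sequence input layer and adds an LSTM layer (the layer size of 16 is an assumption for illustration):

```matlab
% Sketch: recurrent input path for the actor network. The mean and
% standard deviation output paths can then be connected to 'infc'
% exactly as in the nonrecurrent example on this page.
inPath = [ sequenceInputLayer(prod(obsInfo.Dimension),Name="netObsIn")
           lstmLayer(16)
           fullyConnectedLayer(prod(actInfo.Dimension),Name="infc") ];
```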

The learnable parameters of the actor are the weights of the deep neural network. For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.

Name of the network output layer corresponding to the mean values of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the mean value of each element of the action channel. Therefore, this network output layer must be named as indicated in netMeanActName. Furthermore, it must be a scaling layer that scales the returned mean values to the desired action range.

Note

Of the information specified in actionInfo, the function uses only the data type and dimension of each channel, but not its (optional) name or description.

Example: 'myNetOut_Force_Mean_Values'

Name of the network output layer corresponding to the standard deviations of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the standard deviation of each element of the action channel. Therefore, this network output layer must be named as indicated in netStdvActName. Furthermore, it must be a softplus or ReLU layer, to enforce nonnegativity of the returned standard deviations.

Note

Of the information specified in actionInfo, the function uses only the data type and dimension of each channel, but not its (optional) name or description.

Example: 'myNetOut_Force_Standard_Deviations'

Network input layer names corresponding to the environment observation channels, specified as a string array or a cell array of character vectors. When you use the name-value argument ObservationInputNames=netObsNames, the function assigns, in sequential order, each environment observation channel specified in observationInfo to the network input layer specified by the corresponding name in the string array netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo.

Note

Of the information specified in observationInfo, the function uses only the data type and dimension of each channel, but not its (optional) name or description.

Example: {"NetInput1_airspeed","NetInput2_altitude"}

Properties

Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array of such objects. These objects define properties such as the dimensions, data types, and names of the observation signals.

rlContinuousGaussianActor sets the ObservationInfo property of actor to the input observationInfo.

You can extract ObservationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually.
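For instance, assuming a predefined environment is available (the environment name below is just an illustration):

```matlab
% Sketch: extract specification objects from a predefined environment.
env = rlPredefinedEnv("DoubleIntegrator-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
```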

Action specifications, specified as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name. Note that the function does not use the name of the action channel specified in actionInfo.

Note

Only one action channel is allowed.

rlContinuousGaussianActor sets the ActionInfo property of actor to the input actionInfo.

You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specifications manually.

Computation device used to perform operations such as gradient computation, parameter update, and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA® enabled NVIDIA® GPU. For more information on supported GPUs see GPU Support by Release (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations on a CPU.

To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.
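As a minimal sketch, enabling parallel training via the training options rather than the UseDevice argument might look like this (agent and env are assumed to exist):

```matlab
% Sketch: enable parallel training through rlTrainingOptions.
trainOpts = rlTrainingOptions(UseParallel=true);
% trainResults = train(agent,env,trainOpts);  % assumes agent and env exist
```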

Example: 'UseDevice',"gpu"

Object Functions

rlACAgent - Actor-critic reinforcement learning agent
rlPGAgent - Policy gradient reinforcement learning agent
rlPPOAgent - Proximal policy optimization reinforcement learning agent
rlSACAgent - Soft actor-critic reinforcement learning agent
getAction - Obtain action from agent or actor given environment observations

Examples

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous five-dimensional space, so that a single observation is a column vector containing 5 doubles.

obsInfo = rlNumericSpec([5 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous three-dimensional space, so that a single action is a column vector containing 3 doubles each between -10 and 10.

actInfo = rlNumericSpec([3 1], ...
    'LowerLimit',-10,'UpperLimit',10);

Create a deep neural network to be used as approximation model within the actor. For a continuous Gaussian actor, the network must take the observation signal as input and return both a mean value and a standard deviation value for each action. Therefore, it must have two output layers (one for the mean values, the other for the standard deviation values), each having as many elements as the dimension of the action space.

Note that standard deviations must be nonnegative and mean values must fall within the range of the action. Therefore the output layer that returns the standard deviations must be a softplus or ReLU layer, to enforce nonnegativity, while the output layer that returns the mean values must be a scaling layer, to scale the mean values to the output range.

% input path layers
inPath = [ featureInputLayer(prod(obsInfo.Dimension), ...
              'Normalization','none','Name','netObsIn')
           fullyConnectedLayer(prod(actInfo.Dimension), ...
              'Name','infc') ];

% path layers for mean value 
% using scalingLayer to scale range from (-1,1) to (-10,10)
meanPath = [ tanhLayer('Name','tanhMean');
             fullyConnectedLayer(prod(actInfo.Dimension));
             scalingLayer('Name','scale', ...
                'Scale',actInfo.UpperLimit) ];

% path layers for standard deviations
% using softplus layer to make them non negative
sdevPath = [ tanhLayer('Name','tanhStdv');
             fullyConnectedLayer(prod(actInfo.Dimension));
             softplusLayer('Name','splus') ];

% add layers to network object
net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);

% connect layers
net = connectLayers(net,'infc','tanhMean/in');
net = connectLayers(net,'infc','tanhStdv/in');

% plot network
plot(net)

Create the actor with rlContinuousGaussianActor, using the network, the observation and action specification objects, and the names of the relevant network input and output layers.

actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    'ActionMeanOutputNames','scale',...
    'ActionStandardDeviationOutputNames','splus',...
    'ObservationInputNames','netObsIn');

To check your actor, use getAction to return an action from a random observation vector, using the current network weights. Each of the three elements of the action vector is a random sample from the Gaussian distribution with mean and standard deviation calculated, as a function of the current observation, by the neural network.

act = getAction(actor,{rand(obsInfo.Dimension)}); 
act{1}
ans = 3x1 single column vector

  -12.0285
    1.7628
   10.8733

To return the Gaussian distribution of the action, given an observation, use evaluate.

dist = evaluate(actor,{rand(obsInfo.Dimension)});

Display the vector of mean values.

dist{1}
ans = 3x1 single column vector

   -5.6127
    3.9449
    9.6213

Display the vector of standard deviations.

dist{2}
ans = 3x1 single column vector

    0.8516
    0.8366
    0.7004

You can now use the actor to create a suitable agent (such as an rlACAgent, rlPGAgent, or rlPPOAgent agent).
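As a hedged sketch, pairing the actor with a simple value-function critic to create an actor-critic agent might look like this (the critic network below is a minimal assumption for illustration, not part of this example):

```matlab
% Sketch: minimal critic network mapping observations to a scalar value.
cnet = [ featureInputLayer(prod(obsInfo.Dimension))
         fullyConnectedLayer(16)
         reluLayer
         fullyConnectedLayer(1) ];
critic = rlValueFunction(dlnetwork(cnet),obsInfo);

% Create the actor-critic agent from the actor and critic.
agent = rlACAgent(actor,critic);
```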

Version History

Introduced in R2022a