evaluate

Evaluate function approximator object given observation (or observation-action) input data

Since R2022a

Description

outData = evaluate(fcnAppx,inData) evaluates the function approximator object (that is, the actor or critic) fcnAppx given the input value inData. It returns the output value outData.

[outData,nextState] = evaluate(fcnAppx,inData) also returns the updated state of fcnAppx when it contains a recurrent neural network.

___ = evaluate(___,UseForward=useForward) allows you to explicitly call a forward pass when computing gradients.

Examples

This example shows you how to evaluate a function approximator object (that is, an actor or a critic). For this example, the function approximator object is a discrete categorical actor and you evaluate it given some observation data, obtaining in return the action probability distribution and the updated network state.

Load the same environment used in Train PG Agent to Balance Cart-Pole System, and obtain the observation and action specifications.

env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env)
obsInfo = 
  rlNumericSpec with properties:

     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "CartPole States"
    Description: "x, dx, theta, dtheta"
      Dimension: [4 1]
       DataType: "double"

actInfo = getActionInfo(env)
actInfo = 
  rlFiniteSetSpec with properties:

       Elements: [-10 10]
           Name: "CartPole Action"
    Description: [0×0 string]
      Dimension: [1 1]
       DataType: "double"

To approximate the policy within the actor, use a recurrent deep neural network. Define the network as an array of layer objects. Get the dimensions of the observation space and the number of possible actions directly from the environment specification objects.

net = [
    sequenceInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(8)
    reluLayer
    lstmLayer(8,OutputMode="sequence")
    fullyConnectedLayer(numel(actInfo.Elements)) 
    ];

Convert the network to a dlnetwork object and display the number of weights.

net = dlnetwork(net);
summary(net)
   Initialized: true

   Number of learnables: 602

   Inputs:
      1   'sequenceinput'   Sequence input with 4 dimensions

Create a stochastic actor representation for the network.

actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo);

Use evaluate to return the probability of each of the two possible actions. Note that the type of the returned numbers is single, not double.

[prob,state] = evaluate(actor,{rand(obsInfo.Dimension)});
prob{1}
ans = 2×1 single column vector

    0.5155
    0.4845

Since a recurrent neural network is used for the actor, the second output argument, representing the updated state of the neural network, is not empty. In this case, it contains the updated (cell and hidden) states for the eight units of the lstm layer used in the network.

state{:}
ans = 8×1 single column vector

   -0.0621
   -0.0451
    0.0693
   -0.0025
   -0.0070
    0.0193
   -0.0026
   -0.0489

ans = 8×1 single column vector

   -0.1241
   -0.0922
    0.1251
   -0.0055
   -0.0129
    0.0364
   -0.0051
   -0.0916

You can use dot notation to extract and set the current state of the recurrent neural network in the actor.

actor.State
ans=2×1 cell array
    {8×1 dlarray}
    {8×1 dlarray}

actor.State = {
      dlarray(-0.1*rand(8,1))
      dlarray(0.1*rand(8,1)) 
      };

You can obtain action probabilities and updated states for a batch of observations. For example, use a batch of five independent observations.

obsBatch = reshape(1:20,4,1,5,1);
[prob,state] = evaluate(actor,{obsBatch})
prob = 1×1 cell array
    {2×5 single}

state=2×1 cell array
    {8×5 single}
    {8×5 single}

The output arguments contain action probabilities and updated states for each observation in the batch.

Note that the actor treats observation data along the batch length dimension independently, not sequentially.

prob{1}
ans = 2×5 single matrix

    0.5691    0.5978    0.5986    0.5960    0.5932
    0.4309    0.4022    0.4014    0.4040    0.4068

prob = evaluate(actor,{obsBatch(:,:,[5 4 3 1 2])});
prob{1}
ans = 2×5 single matrix

    0.5932    0.5960    0.5986    0.5691    0.5978
    0.4068    0.4040    0.4014    0.4309    0.4022

To evaluate the actor using sequential observations, use the sequence length (time) dimension. For example, obtain action probabilities for five independent sequences, each one made of nine sequential observations.

[prob,state] = evaluate(actor, ...
    {rand([obsInfo.Dimension 5 9])})
prob = 1×1 cell array
    {2×5×9 single}

state=2×1 cell array
    {8×5 single}
    {8×5 single}

The first output argument contains a vector of two probabilities (first dimension) for each element of the observation batch (second dimension) and for each time element of the sequence length (third dimension).

The second output argument contains the two final state vectors (cell and hidden) for each independent sequence in the batch (that is, the network maintains a separate state history for each sequence).

Display the probability of the second action, after the seventh sequential observation in the fourth independent sequence.

prob{1}(2,4,7)
ans = single
    0.4894

For more information on input and output format for recurrent neural networks, see the Algorithms section of lstmLayer.

Input Arguments

Function approximator object, specified as one of the following:

  • rlValueFunction object

  • rlQValueFunction object

  • rlVectorQValueFunction object

  • rlContinuousDeterministicActor object

  • rlDiscreteCategoricalActor object

  • rlContinuousGaussianActor object

  • rlContinuousDeterministicTransitionFunction object

  • rlContinuousGaussianTransitionFunction object

  • rlContinuousDeterministicRewardFunction object

  • rlContinuousGaussianRewardFunction object

  • rlIsDoneFunction object

Input data for the function approximator, specified as a cell array with as many elements as the number of input channels of fcnAppx. In the following list, the number of observation channels is indicated by N_O.

  • If fcnAppx is an rlQValueFunction, an rlContinuousDeterministicTransitionFunction, or an rlContinuousGaussianTransitionFunction object, then each of the first N_O elements of inData must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a final matrix representing the action.

  • If fcnAppx is a function approximator object representing an actor or critic (but not an rlQValueFunction object), inData must contain N_O elements, each one a matrix representing the current observation from the corresponding observation channel.

  • If fcnAppx is an rlContinuousDeterministicRewardFunction, an rlContinuousGaussianRewardFunction, or an rlIsDoneFunction object, then each of the first N_O elements of inData must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a matrix representing the action, and finally by N_O elements, each one being a matrix representing the next observation from the corresponding observation channel.

Each element of inData must be a matrix of dimension M_C-by-L_B-by-L_S, where:

  • M_C corresponds to the dimensions of the associated input channel.

  • L_B is the batch size. To specify a single observation, set L_B = 1. To specify a batch of (independent) inputs, specify L_B > 1. If inData has multiple elements, then L_B must be the same for all elements of inData.

  • L_S specifies the sequence length (the length of the sequence of inputs along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then L_S = 1. If inData has multiple elements, then L_S must be the same for all elements of inData.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

Example: {rand(8,3,64,1),rand(4,1,64,1),rand(2,1,64,1)}
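
For example, for a hypothetical single-output Q-value critic (an rlQValueFunction object, named qCritic here) with one 4-by-1 observation channel and one scalar action channel, you could assemble inData for a batch of 32 independent inputs as in this minimal sketch. The channel sizes and the critic are assumptions for illustration, not taken from the example above.

batchSize = 32;
obsBatch = rand(4,1,batchSize,1);   % observation channel: 4-by-1-by-LB-by-LS
actBatch = rand(1,1,batchSize,1);   % action channel:      1-by-1-by-LB-by-LS
inData   = {obsBatch,actBatch};     % observation channels first, then the action
% outData = evaluate(qCritic,inData);   % qCritic is the assumed rlQValueFunction critic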

Option to use a forward pass, specified as a logical value. When you specify UseForward=true, the function calculates its outputs using forward instead of predict. This allows layers such as batch normalization and dropout to change their behavior appropriately for training.

Example: true
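
For example, assuming the actor from the example above, this minimal sketch requests a forward pass explicitly, as you would when computing gradients:

prob = evaluate(actor,{rand(obsInfo.Dimension)},UseForward=true);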

Output Arguments

Output data from the evaluation of the function approximator object, returned as a cell array. The size and contents of outData depend on the type of object you use for fcnAppx, and are shown in the following list. Here, N_O is the number of observation channels.

  • rlContinuousDeterministicTransitionFunction - N_O matrices, each one representing the predicted observation from the corresponding observation channel.

  • rlContinuousGaussianTransitionFunction - N_O matrices representing the mean value of the predicted observation for the corresponding observation channel, followed by N_O matrices representing the standard deviation of the predicted observation for the corresponding observation channel.

  • rlContinuousGaussianActor - Two matrices representing the mean value and standard deviation of the action, respectively.

  • rlDiscreteCategoricalActor - A matrix with the probabilities of each action.

  • rlContinuousDeterministicActor - A matrix with the action.

  • rlVectorQValueFunction - A matrix with the values of each possible action.

  • rlQValueFunction - A matrix with the value of the action.

  • rlValueFunction - A matrix with the value of the current observation.

  • rlContinuousDeterministicRewardFunction - A matrix with the predicted reward as a function of current observation, action, and next observation following the action.

  • rlContinuousGaussianRewardFunction - Two matrices representing the mean value and standard deviation, respectively, of the predicted reward as a function of current observation, action, and next observation following the action.

  • rlIsDoneFunction - A vector with the probabilities of the predicted termination status. Termination probabilities range from 0 (no termination predicted) to 1 (termination predicted), and depend (in the most general case) on the values of observation, action, and next observation following the action.

Each element of outData is a matrix of dimensions D-by-L_B-by-L_S, where:

  • D is the vector of dimensions of the corresponding output channel of fcnAppx. Depending on the type of approximator function, this channel can carry a predicted observation (or its mean value or standard deviation), an action (or its mean value or standard deviation), the value (or values) of an observation (or observation-action couple), a predicted reward, or a predicted termination status.

  • L_B is the batch size (the length of a batch of independent inputs).

  • L_S is the sequence length (the length of the sequence of inputs along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then L_S = 1.
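
As a minimal check, assuming the recurrent discrete categorical actor from the example above, the actor output has two action probabilities (D = 2) for a batch of five sequences (L_B = 5), each nine steps long (L_S = 9):

outData = evaluate(actor,{rand([obsInfo.Dimension 5 9])});
size(outData{1})   % returns [2 5 9], that is, D-by-LB-by-LS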

Note

If fcnAppx is a critic, then evaluate behaves identically to getValue except that it returns results inside a single-cell array. If fcnAppx is an rlContinuousDeterministicActor actor, then evaluate behaves identically to getAction. If fcnAppx is a stochastic actor such as an rlDiscreteCategoricalActor or rlContinuousGaussianActor, then evaluate returns the action probability distribution, while getAction returns a sample action. Specifically, for an rlDiscreteCategoricalActor actor object, evaluate returns the probability of each possible action. For an rlContinuousGaussianActor actor object, evaluate returns the mean and standard deviation of the Gaussian distribution. For these kinds of actors, see also the note in getAction regarding the enforcement of constraints set by the action specification.
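
For example, assuming the discrete categorical actor from the example above, this minimal sketch contrasts the two functions: evaluate returns the action probability distribution, while getAction returns an action sampled from it.

obs  = {rand(obsInfo.Dimension)};
prob = evaluate(actor,obs);    % cell array containing a 2-by-1 vector of action probabilities
act  = getAction(actor,obs);   % cell array containing a sampled action (-10 or 10)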

Note

If fcnAppx is an rlContinuousDeterministicRewardFunction object, then evaluate behaves identically to predict except that it returns results inside a single-cell array. If fcnAppx is an rlContinuousDeterministicTransitionFunction object, then evaluate behaves identically to predict. If fcnAppx is an rlContinuousGaussianTransitionFunction object, then evaluate returns the mean value and standard deviation of the observation probability distribution, while predict returns an observation sampled from this distribution. Similarly, for an rlContinuousGaussianRewardFunction object, evaluate returns the mean value and standard deviation of the reward probability distribution, while predict returns a reward sampled from this distribution. Finally, if fcnAppx is an rlIsDoneFunction object, then evaluate returns the probabilities of the termination status being false or true, respectively, while predict returns a predicted termination status sampled with these probabilities.

Next state of the function approximator object, returned as a cell array. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators), then nextState is an empty cell array.

You can set the state of the approximator to state using dot notation. For example:

critic.State = state;

Tips

When the elements of the cell array in inData are dlarray objects, the elements of the cell array returned in outData are also dlarray objects. This allows evaluate to be used with automatic differentiation.

Specifically, you can write a custom loss function that directly uses evaluate and dlgradient within it, and then use dlfeval and dlaccelerate with your custom loss function. For an example, see Train Reinforcement Learning Policy Using Custom Training Loop and Custom Training Loop with Simulink Action Noise.
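
The following minimal sketch shows the pattern. It assumes the actor and obsInfo from the example above, and it uses an illustrative loss (not a training objective): the loss function calls evaluate on dlarray data, differentiates the result with dlgradient, and dlfeval executes it.

dlObs = dlarray(rand(obsInfo.Dimension));
[loss,gradObs] = dlfeval(@myLoss,actor,dlObs);

function [loss,gradObs] = myLoss(actor,dlObs)
    % dlarray inputs to evaluate produce dlarray outputs, so the computation
    % is traced for automatic differentiation.
    prob = evaluate(actor,{dlObs},UseForward=true);
    loss = -sum(log(prob{1}));           % illustrative scalar loss
    gradObs = dlgradient(loss,dlObs);    % gradient of the loss with respect to the input
end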

Version History

Introduced in R2022a