rlQValueRepresentation
(Not recommended) Q-Value function critic representation for reinforcement learning agents
rlQValueRepresentation is not recommended. Use rlQValueFunction or rlVectorQValueFunction instead. For more information, see rlQValueRepresentation is not recommended.
Description
This object implements a Q-value function approximator to be used as a critic within a reinforcement learning agent. A Q-value function maps an observation-action pair to a scalar value representing the expected cumulative long-term reward the agent accumulates when it starts from the given observation and executes the given action. Q-value function critics therefore need both observations and actions as inputs. After you create an rlQValueRepresentation critic, use it to create an agent that relies on a Q-value function critic, such as an rlQAgent, rlDQNAgent, rlSARSAAgent, rlDDPGAgent, or rlTD3Agent. For more information on creating representations, see Create Policies and Value Functions.
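As a brief illustration (assuming the standard discounted formulation, with discount factor $\gamma$ and reward $r_{t+1}$ at step $t$, neither of which is defined on this page), the Q-value of taking action $a$ from observation $s$ under policy $\pi$ can be written as

$$ Q^{\pi}(s,a) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_0 = s,\ a_0 = a,\ \pi \right] $$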
Creation
Syntax
Description
Scalar Output Q-Value Critic
critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName,'Action',actName) creates the Q-value function critic. net is the deep neural network used as an approximator, and must have both the observations and the action as inputs and a single scalar output. This syntax sets the ObservationInfo and ActionInfo properties of critic to the inputs observationInfo and actionInfo, respectively, which contain the observation and action specifications. obsName must contain the names of the input layers of net that are associated with the observation specifications. The action name actName must be the name of the input layer of net that is associated with the action specification.
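As a brief sketch (the observation and action dimensions, layer sizes, and layer names below are assumed for illustration only), such a single-output critic could be built as follows:

% Observation and action specifications (dimensions assumed)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% Network with separate observation and action input paths joined by addition
obsPath = [featureInputLayer(4,'Name','obsIn')
           fullyConnectedLayer(16,'Name','fcObs')];
actPath = [featureInputLayer(2,'Name','actIn')
           fullyConnectedLayer(16,'Name','fcAct')];
common  = [additionLayer(2,'Name','add')
           reluLayer('Name','relu')
           fullyConnectedLayer(1,'Name','qValue')];   % single scalar output

net = layerGraph(obsPath);
net = addLayers(net,actPath);
net = addLayers(net,common);
net = connectLayers(net,'fcObs','add/in1');
net = connectLayers(net,'fcAct','add/in2');

critic = rlQValueRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'obsIn'},'Action',{'actIn'});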
critic = rlQValueRepresentation(tab,observationInfo,actionInfo) creates the Q-value function based critic with discrete action and observation spaces from the Q-value table tab. tab is an rlTable object containing a table with as many rows as the possible observations and as many columns as the possible actions. This syntax sets the ObservationInfo and ActionInfo properties of critic to the inputs observationInfo and actionInfo, respectively, which must be rlFiniteSetSpec objects containing the specifications for the discrete observation and action spaces.
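For example (the observation and action values below are assumed for illustration), a table-based critic could be created like this:

obsInfo = rlFiniteSetSpec([1 2 3 4]);   % four possible observations
actInfo = rlFiniteSetSpec([-1 1]);      % two possible actions

qTable = rlTable(obsInfo,actInfo);      % 4-by-2 table of Q-values, initialized to zero
critic = rlQValueRepresentation(qTable,obsInfo,actInfo);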
critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo) creates a Q-value function based critic using a custom basis function as the underlying approximator. The first input argument is a two-element cell array whose first element is the handle basisFcn to a custom basis function and whose second element is the initial weight vector W0. Here, the basis function must have both the observations and the action as inputs, and W0 must be a column vector. This syntax sets the ObservationInfo and ActionInfo properties of critic to the inputs observationInfo and actionInfo, respectively.
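A minimal sketch, with an assumed basis of linear and squared terms (the observation and action dimensions are also assumed):

obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1]);

% Custom basis combining the observation and the action (an assumed choice of features)
basisFcn = @(obs,act) [obs; act; obs.^2; act.^2];   % 8-by-1 feature vector
W0 = rand(8,1);                                     % one initial weight per feature

critic = rlQValueRepresentation({basisFcn,W0},obsInfo,actInfo);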
Multi-Output Discrete Action Space Q-Value Critic
critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName) creates the multi-output Q-value function critic for a discrete action space. net is the deep neural network used as an approximator, and must have only the observations as input and a single output layer with as many elements as the number of possible discrete actions. This syntax sets the ObservationInfo and ActionInfo properties of critic to the inputs observationInfo and actionInfo, respectively, which contain the observation and action specifications. Here, actionInfo must be an rlFiniteSetSpec object containing the specifications for the discrete action space. The observation names obsName must be the names of the input layers of net.
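As a brief sketch (the observation dimension, action values, layer sizes, and layer names are assumed for illustration), a multi-output critic could look like this:

obsInfo = rlNumericSpec([4 1]);          % dimensions assumed
actInfo = rlFiniteSetSpec([-1 0 1]);     % three possible discrete actions

net = [featureInputLayer(4,'Name','obsIn')
       fullyConnectedLayer(24,'Name','fc1')
       reluLayer('Name','relu1')
       fullyConnectedLayer(numel(actInfo.Elements),'Name','qValues')];  % one output per action

critic = rlQValueRepresentation(net,obsInfo,actInfo,'Observation',{'obsIn'});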
critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo) creates the multi-output Q-value function critic for a discrete action space using a custom basis function as the underlying approximator. The first input argument is a two-element cell array whose first element is the handle basisFcn to a custom basis function and whose second element is the initial weight matrix W0. Here, the basis function must have only the observations as inputs, and W0 must have as many columns as the number of possible actions. This syntax sets the ObservationInfo and ActionInfo properties of critic to the inputs observationInfo and actionInfo, respectively.
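A minimal sketch (the observation dimension, action values, and choice of basis features are assumed):

obsInfo = rlNumericSpec([2 1]);          % dimensions assumed
actInfo = rlFiniteSetSpec([10 20 30]);   % three possible discrete actions

% Basis function of the observations only; W0 has one column per possible action
basisFcn = @(obs) [obs; obs.^2; 1];      % 5-by-1 feature vector
W0 = rand(5,3);                          % 5 features, 3 actions

critic = rlQValueRepresentation({basisFcn,W0},obsInfo,actInfo);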
Options
critic = rlQValueRepresentation(___,options) creates the Q-value function based critic using the additional option set options, which is an rlRepresentationOptions object. This syntax sets the Options property of critic to the options input argument. You can use this syntax with any of the previous input-argument combinations.
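For instance (the option values and the discrete observation and action sets below are assumed), appending an rlRepresentationOptions object to the table-based syntax:

obsInfo = rlFiniteSetSpec([1 2 3 4]);
actInfo = rlFiniteSetSpec([-1 1]);
repOpts = rlRepresentationOptions('LearnRate',1e-2,'GradientThreshold',1);

critic = rlQValueRepresentation(rlTable(obsInfo,actInfo),obsInfo,actInfo,repOpts);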
Input Arguments
Properties
Object Functions
rlDDPGAgent | Deep deterministic policy gradient (DDPG) reinforcement learning agent |
rlTD3Agent | Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent |
rlDQNAgent | Deep Q-network (DQN) reinforcement learning agent |
rlQAgent | Q-learning reinforcement learning agent |
rlSARSAAgent | SARSA reinforcement learning agent |
rlSACAgent | Soft actor-critic (SAC) reinforcement learning agent |
getValue | Obtain estimated value from a critic given environment observations and actions |
getMaxQValue | Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations |
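As a brief sketch of querying a critic (the network, dimensions, and action values are assumed), getValue returns the vector of Q-values of a multi-output critic, and getMaxQValue returns the largest of them together with the index of the corresponding action:

obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);
net = [featureInputLayer(4,'Name','obsIn')
       fullyConnectedLayer(3,'Name','qValues')];
critic = rlQValueRepresentation(net,obsInfo,actInfo,'Observation',{'obsIn'});

qVals = getValue(critic,{rand(4,1)});               % one Q-value per discrete action
[maxQ,actIdx] = getMaxQValue(critic,{rand(4,1)});   % largest Q-value and its action index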