rlEpsilonGreedyPolicy
Policy object to generate discrete epsilon-greedy actions for custom training loops
Since R2022a
Description
This object implements an epsilon-greedy policy. Given an input observation, the policy
returns the action that maximizes a discrete action-space Q-value function with probability
1-Epsilon, and a random action otherwise. You can create an
rlEpsilonGreedyPolicy object from an rlQValueFunction or
rlVectorQValueFunction
object, or extract it from an rlQAgent, rlDQNAgent, or rlSARSAAgent. You can
then train the policy object using a custom training loop or deploy it for your application.
If UseEpsilonGreedyAction is set to 0, the policy is
deterministic and therefore does not explore. This object is not compatible with
generatePolicyBlock
and generatePolicyFunction. For more information on policies and value functions,
see Create Policies and Value Functions.
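As an illustration, the sketch below extracts the epsilon-greedy exploration policy from a default DQN agent and then switches exploration off. The observation and action specifications are hypothetical placeholders; adapt them to your environment.

% Hypothetical observation and action specifications
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Default discrete-action DQN agent built from the specifications
agent = rlDQNAgent(obsInfo, actInfo);

% Extract the epsilon-greedy exploration policy from the agent
policy = getExplorationPolicy(agent);

% Setting UseEpsilonGreedyAction to 0 makes the policy deterministic
policy.UseEpsilonGreedyAction = false;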
Creation
Description
policy = rlEpsilonGreedyPolicy(qValueFunction) creates the epsilon-greedy
policy object policy from the discrete
action-space Q-value function qValueFunction. It also sets the
QValueFunction property of policy to the
input argument qValueFunction.
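For example, a policy can be built from a vector Q-value function as sketched below. The network architecture and specification dimensions are illustrative assumptions, not a prescribed design.

% Hypothetical observation and action specifications
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Simple network mapping observations to one Q-value per discrete action
net = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
net = dlnetwork(net);

% Vector Q-value function and the corresponding epsilon-greedy policy
qValueFcn = rlVectorQValueFunction(net, obsInfo, actInfo);
policy = rlEpsilonGreedyPolicy(qValueFcn);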
Properties
Object Functions
getAction | Obtain action from agent, actor, or policy object given environment observations
getLearnableParameters | Obtain learnable parameter values from agent, function approximator, or policy object
reset | Reset environment, agent, experience buffer, or policy object
setLearnableParameters | Set learnable parameter values of agent, function approximator, or policy object
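As a rough sketch of how these functions fit into a custom training loop, the fragment below assumes an environment object env (hypothetical, with specifications matching the policy) that supports reset and step, and a policy created as shown above; the learning update itself is omitted.

% Assumed: env matches obsInfo/actInfo; policy created as shown above
obs = reset(env);
policy = reset(policy);   % clear any internal policy state
for stepCt = 1:200
    % Epsilon-greedy action for the current observation
    act = getAction(policy, {obs});
    % Advance the (assumed) environment by one step
    [obs, rwd, isDone] = step(env, act{1});
    % ... store the experience and update the Q-value function here,
    % for example via getLearnableParameters / setLearnableParameters ...
    if isDone
        obs = reset(env);
    end
end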
Examples
Version History
Introduced in R2022a
See Also
Functions
getGreedyPolicy | getExplorationPolicy | generatePolicyBlock | generatePolicyFunction | getAction | getLearnableParameters | setLearnableParameters
Objects
rlMaxQPolicy | rlDeterministicActorPolicy | rlAdditiveNoisePolicy | rlStochasticActorPolicy | rlHybridStochasticActorPolicy | rlQValueFunction | rlVectorQValueFunction | rlSARSAAgent | rlQAgent | rlDQNAgent