Reinforcement learning agent
Reinforcement Learning Toolbox
Use the RL Agent block to simulate and train a reinforcement learning agent
in Simulink®. You associate the block with an agent stored in the MATLAB® workspace or a data dictionary as an agent object.
You connect the block so that it receives an observation and a computed reward. For instance,
in a pendulum model, the observation input port of the RL Agent block
receives a signal derived from the instantaneous angle and angular velocity of the
pendulum. The reward port receives a reward calculated from the same two
values and the applied action. You configure the observations and reward computations that are
appropriate to your system.
The block uses the agent to generate an action based on the observation and reward you
provide. Connect the action output port to the appropriate input for your
system. For instance, in the pendulum model, the signal at the
action port is a torque applied to the pendulum system. For more
information about this model, see Train DQN Agent to Swing Up and Balance Pendulum.
To train a reinforcement learning agent in Simulink, you generate an environment from the Simulink model. You then create and configure the agent for training against that
environment. For more information, see Create Simulink Environments for Reinforcement Learning. When you call
train using the environment, train simulates the model and updates the agent associated
with the block.
observation— Environment observations
This port receives observation signals from the environment. Observation signals
represent measurements or other instantaneous system data. If you have multiple
observations, you can use a Mux block to combine them into a vector
signal. To use a nonvirtual bus signal, use
reward— Reward from environment
This port receives the reward signal, which you compute based on the observation data. The reward signal is used during agent training to maximize the expectation of the long-term reward.
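As a rough illustration of a reward computed from observation data, the following Python sketch shows one way a pendulum reward could combine the angle, angular velocity, and applied torque. It is a conceptual sketch only, not Simulink or toolbox code. The function name `pendulum_reward`, the quadratic form, and the weights are assumptions chosen for illustration; in a model you would build the equivalent logic from blocks feeding the reward port.

```python
import math


def pendulum_reward(theta, theta_dot, torque):
    """Hypothetical quadratic-penalty reward (weights are illustrative).

    theta     -- pendulum angle in radians, 0 meaning upright (assumed convention)
    theta_dot -- angular velocity in rad/s
    torque    -- action applied at this time step
    """
    # Wrap the angle to [-pi, pi) so that full rotations are not penalized.
    wrapped = (theta + math.pi) % (2 * math.pi) - math.pi
    # Penalize distance from upright, fast motion, and large control effort.
    return -(wrapped ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)
```

With this shaping, the reward is highest (zero) when the pendulum is upright, at rest, and no torque is applied, and becomes more negative as the state or the control effort moves away from that point.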
isdone— Flag to terminate episode simulation
Use this signal to specify conditions under which to terminate a training episode. You must configure logic appropriate to your system to determine the conditions for episode termination. One application is to terminate an episode that is clearly going well or going poorly. For instance, you can terminate an episode if the agent reaches its goal or goes irrecoverably far from its goal.
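The termination logic described above might look like the following Python sketch. It is a conceptual illustration, not toolbox code: the function name `is_done`, the scalar `position` and `goal` quantities, and both thresholds are hypothetical and would depend on your system.

```python
def is_done(position, goal, goal_tolerance=0.05, max_distance=10.0):
    """Return True when the episode should terminate (illustrative thresholds)."""
    distance = abs(position - goal)
    # Going well: the agent is close enough to the goal.
    reached_goal = distance <= goal_tolerance
    # Going poorly: the agent is too far from the goal to recover.
    diverged = distance >= max_distance
    return reached_goal or diverged
```

In a Simulink model, the same conditions would typically be implemented with comparison and logical blocks whose Boolean output drives the isdone port.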
action— Agent action
Action computed by the agent based on the observation and reward inputs. Connect
this port to the inputs of your system. To use a nonvirtual bus signal, use
cumulative_reward— Total reward
Cumulative sum of the reward signal during simulation. Observe or log this signal to track how the cumulative reward evolves over time.
To enable this port, select the Provide cumulative reward signal parameter.
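To make the running sum concrete, this short Python sketch computes the cumulative reward after each step from a sequence of per-step rewards. It is only a conceptual illustration of the signal's value over time; the function name `cumulative_reward` and the sample rewards are assumptions, and the block computes this signal internally.

```python
from itertools import accumulate


def cumulative_reward(rewards):
    """Return the running total of the reward after each time step."""
    return list(accumulate(rewards))


# Example with made-up per-step rewards.
totals = cumulative_reward([1.0, -0.5, 2.0])
```

Here `totals` holds one value per step, and the last entry is the total reward for the episode.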
Provide cumulative reward signal— Add cumulative reward output port
Enable the cumulative_reward block output by selecting this parameter.