rlCustomEvaluator
Custom object for evaluating reinforcement learning agents during training
Since R2023b
Description
Create an rlCustomEvaluator object to specify a custom function and evaluation frequency that you want to use to evaluate agents during training. To train the agents, pass this object to train.
For more information on training agents, see Train Reinforcement Learning Agents.
Creation
Syntax
Description
evaluator = rlCustomEvaluator(evalFcn) returns the custom evaluator object evaluator. The evalFcn argument is a handle to your custom MATLAB® evaluation function.
evaluator = rlCustomEvaluator(evalFcn,EvaluationFrequency=evalPeriod) also specifies the number of training episodes after which train calls the evaluation function.
Properties
EvaluationFcn — Custom evaluation function
function handle
Custom evaluation function, specified as a function handle. The train function calls evalFcn after every evalPeriod training episodes.
Your evaluation function must have three inputs and three outputs, as illustrated by the following signature.
[statistic, scores, data] = myEvalFcn(agent, environment, trainingInfo)
Given an agent, its environment, and training episode information, the custom evaluation function runs a number of evaluation episodes and returns a corresponding summarizing statistic, a vector of episode scores, and any additional data that might be needed for logging.
The required input arguments (passed to evalFcn from train) are:
- agent — Agent to evaluate, specified as a reinforcement learning agent object. For multiagent environments, this is a cell array of agent objects.
- environment — Environment within which the agents are evaluated, specified as a reinforcement learning environment object.
- trainingInfo — A structure containing the following fields.
  - EpisodeIndex — Current episode index, specified as a positive integer.
  - EpisodeInfo — A structure containing the fields CumulativeReward, StepsTaken, and InitialObservation, which contain, respectively, the cumulative reward, the number of steps taken, and the initial observations of the current training episode.
The output arguments (passed from evalFcn to train) are:
- statistic — A statistic computed from a group of consecutive evaluation episodes. Common statistics are the mean, median, maximum, and minimum. At the end of the training, this value is returned by train as the element of the EvaluationStatistics vector corresponding to the last training episode.
- scores — A vector of episode scores from each evaluation episode. You can use a logger object to store this argument during training.
- data — Any additional data from evaluation that you might find useful, for example for logging purposes. You can use a logger object to store this argument during training.
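As a small illustration of the statistics mentioned above, the following sketch computes the mean, median, maximum, and minimum from a vector of episode scores. The score values are made up for illustration; in an evaluation function they would come from the evaluation episodes.

```matlab
% Made-up scores from five evaluation episodes (illustrative values only).
scores = [12; 7; 15; 9; 11];

% Candidate summarizing statistics. An evaluation function would return
% one such scalar as its first output.
meanScore   = mean(scores);
medianScore = median(scores);
maxScore    = max(scores);
minScore    = min(scores);
```

Which statistic to return is a design choice: the mean and median summarize typical performance, while the minimum is more conservative.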
To use additional input arguments beyond the required three, define your additional arguments in the MATLAB workspace, then specify evalFcn as an anonymous function that in turn calls your custom function with the additional arguments defined in the workspace. The same technique is shown for step functions in the example Create Custom Environment Using Step and Reset Functions.
Example: evalFcn=@myEvalFcn
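The anonymous-function approach for passing additional input arguments can be sketched as follows. The names numEvalEpisodes and myEvalFcnWithExtra are hypothetical, and the four-input function is a stub that returns placeholder values so that the sketch runs without an agent or environment; a real evaluation function would run evaluation episodes instead.

```matlab
% Extra argument defined in the workspace (hypothetical name and value).
numEvalEpisodes = 5;

% Hypothetical four-input evaluation function, written as a stub that
% returns placeholder outputs (statistic, scores, data). In practice this
% would be a function file that runs evaluation episodes.
myEvalFcnWithExtra = @(agent, env, trainingInfo, n) ...
    deal(0, zeros(n, 1), {});

% Capture the extra argument in an anonymous function that exposes only
% the three inputs supplied by train.
evalFcn = @(agent, env, trainingInfo) ...
    myEvalFcnWithExtra(agent, env, trainingInfo, numEvalEpisodes);

% Call the wrapper with placeholder inputs to show the call shape.
[statistic, scores, data] = evalFcn([], [], struct("EpisodeIndex", 1));
```

The resulting three-input handle evalFcn could then be used wherever a handle to a custom evaluation function is expected.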
EvaluationFrequency — Evaluation period
100 (default) | positive integer
Evaluation period, specified as a positive integer. It is the number of training episodes after which train calls the evaluation function, which then runs its evaluation episodes. For example, if EvaluationFrequency is 100 and the evaluation function runs three episodes, then three evaluation episodes are run, consecutively, after 100 training episodes. The default is 100.
Example: EvaluationFrequency=200
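To make the arithmetic concrete, the sketch below lists the training episodes after which evaluation would be triggered, assuming evaluation runs after every multiple of the evaluation period. The variable names, the 500-episode training length, and the three episodes per evaluation are illustrative assumptions, not toolbox properties.

```matlab
% Illustrative values (assumptions, not toolbox properties).
evalPeriod = 100;            % value of EvaluationFrequency
maxTrainingEpisodes = 500;   % assumed length of the training run
episodesPerEvaluation = 3;   % assumed number of episodes per evaluation

% Training episodes after which evaluation is assumed to be triggered.
evaluationPoints = evalPeriod:evalPeriod:maxTrainingEpisodes;

% Total number of evaluation episodes over the whole training run.
totalEvaluationEpisodes = numel(evaluationPoints) * episodesPerEvaluation;
```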
Object Functions
Examples
Create Custom Evaluator Object
Create an rlCustomEvaluator object to evaluate an agent during training using a custom evaluation function. Use the function myEvaluationFcn, defined at the end of this example.
myEvaluator = rlCustomEvaluator(@myEvaluationFcn)
myEvaluator = 
  rlCustomEvaluator with properties:

          EvaluationFcn: @myEvaluationFcn
    EvaluationFrequency: 100
Configure the evaluator to run the evaluation function every 200 training episodes.
myEvaluator.EvaluationFrequency = 200;
To evaluate an agent during training using these evaluation options, pass myEvaluator to train, as in the following code example.
results = train(agent, env, rlTrainingOptions(), Evaluator=myEvaluator);
For more information, see train.
Custom Evaluation Function
The evaluation function is called by train every evaluator.EvaluationFrequency training episodes. Within the evaluation function, if the current training episode index is 1000 or less, run just one evaluation episode; otherwise, run 10 consecutive evaluation episodes. Configure the agent to use a greedy policy (no exploration) during evaluation, and return the eighth largest episode reward as the final statistic. Because eight of the ten episodes score at least this value, it is consistent with achieving a desired reward 80% of the time.
function [statistic, scores, data] = ...
        myEvaluationFcn(agent, env, trainingEpisodeInfo)

    % Do not use an exploration policy for evaluation.
    agent.UseExplorationPolicy = false;

    % Set the number of consecutive evaluation episodes to run.
    if trainingEpisodeInfo.EpisodeIndex <= 1000
        numEpisodes = 1;
    else
        numEpisodes = 10;
    end

    % Initialize the rewards and data arrays.
    episodeRewards = zeros(numEpisodes, 1);
    data = cell(numEpisodes, 1);

    % Run numEpisodes consecutive evaluation episodes.
    for evaluationEpisode = 1:numEpisodes

        % Use a fixed random seed for reproducibility.
        rng(evaluationEpisode*10)

        % Run one evaluation episode. The output is a structure
        % containing various agent simulation information,
        % as described in runEpisode.
        episodeResults = runEpisode(env, agent, ...
            MaxSteps=500, ...
            CleanupPostSim=false);

        if isa(episodeResults,"rl.env.Future")
            % For parallel simulation, fetch data from workers.
            [~,out] = fetchNext(episodeResults);

            % Collect the episode cumulative reward.
            episodeRewards(evaluationEpisode) = ...
                out.AgentData.EpisodeInfo.CumulativeReward;

            % Collect the whole data structure.
            data{evaluationEpisode} = out;
        else
            % Collect the episode cumulative reward.
            episodeRewards(evaluationEpisode) = ...
                episodeResults.AgentData.EpisodeInfo.CumulativeReward;

            % Collect the whole data structure.
            data{evaluationEpisode} = episodeResults;
        end
    end

    % Return the eighth largest episode reward if 10 episodes
    % are run, otherwise return just the greatest (and only) reward.
    % Sort in descending order so that index 8 is the eighth largest.
    statistic = sort(episodeRewards, "descend");
    if length(statistic) == 10
        statistic = statistic(8);
    else
        % Make sure to always return a scalar in any case.
        statistic = statistic(1);
    end

    % Return the rewards vector.
    scores = episodeRewards;
end
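The choice of statistic can be checked on a small made-up example. The sketch below, which uses illustrative reward values rather than real simulation results, takes the eighth largest of ten rewards and computes the fraction of episodes that score at least that value. This is the sense in which the statistic reflects a reward level reached 80% of the time.

```matlab
% Ten made-up episode rewards (illustrative values only).
episodeRewards = [42; 17; 88; 63; 25; 91; 54; 70; 36; 79];

% Sort from largest to smallest and pick the eighth entry.
sortedRewards = sort(episodeRewards, "descend");
eighthLargest = sortedRewards(8);

% Fraction of episodes whose reward is at least the eighth largest.
fractionAtLeast = mean(episodeRewards >= eighthLargest);
```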
Version History
Introduced in R2023b