Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689) (RL Toolbox)

I want to create a multi-discrete actor output: delta1 should output 1 or 0, and delta2 the same (a note on enumerating the joint actions follows the code below).
But I get the following errors:
Error using rl.env.AbstractEnv/simWithPolicy (line 70)
An error occurred while simulating "quarter_car" with the agent "agent".
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689)
Unable to evaluate representation.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689)
The logical indices contain a true value outside of the array bounds.
I don't understand whether the error is caused by my code or by the Simulink model, or how to fix it. Here is my code:
% create observation info: numObs unbounded continuous channels
observationInfo = rlNumericSpec([numObs 1], ...
    'LowerLimit',-inf*ones(numObs,1),'UpperLimit',inf*ones(numObs,1));
observationInfo.Name = 'observation';

% create action info: a finite set of joint [delta1; delta2] actions
actionInfo = rlFiniteSetSpec({[0;0],[1;1]});
actionInfo.Name = 'actor';

% define the environment around the Simulink model and agent block
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
rng(0)

% actor network mapping the observation to numAct outputs
actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(200,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(150,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numAct,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh')];
actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo, ...
    'Observation',{'observation'},actorOpts);

% PPO agent options
agentOpts = rlPPOAgentOptions(...
    'ExperienceHorizon',600,...
    'ClipFactor',0.02,...
    'EntropyLossWeight',0.01,...
    'MiniBatchSize',128,...
    'NumEpoch',3,...
    'AdvantageEstimateMethod','gae',...
    'GAEFactor',0.95,...
    'SampleTime',h,...
    'DiscountFactor',0.997);

% critic comes from the attached files
agent = rlPPOAgent(actor,critic,agentOpts);
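
Side note on the action spec above: rlFiniteSetSpec({[0;0],[1;1]}) only lets the agent pick between those two joint actions, so delta1 and delta2 always switch together. If the two outputs are meant to vary independently, a sketch of the spec would enumerate all four combinations (hypothetical; the actor's final layer would then need one output per element of the set):

% hypothetical: all four joint [delta1; delta2] combinations
actionInfo = rlFiniteSetSpec({[0;0],[0;1],[1;0],[1;1]});
actionInfo.Name = 'actor';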

Answers (1)

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 1 Dec 2020
Hello,
Based on the attached files, it seems like you are creating a PPO agent but you are creating a Q network for the critic. If you look at this page, the PPO implementation in Reinforcement Learning Toolbox requires a V critic (a state-value function that takes only the observation as input). If you change your critic network to be equivalent to, e.g., this example, the errors go away.
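A minimal sketch of such a V critic, reusing the observation spec from the question (layer sizes and names here are illustrative, not taken from the attached files):

% state-value (V) critic: observation in, scalar V(s) out, no action input
criticNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(200,'Name','CriticFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(1,'Name','CriticOut')];
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlValueRepresentation(criticNetwork,observationInfo, ...
    'Observation',{'observation'},criticOpts);

The key difference from a Q network is that there is no action input path and the output is a single scalar, which is what rlPPOAgent expects from its critic.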
Hope that helps