Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689) (RL Toolbox)

I want to create multi-discrete actor outputs: delta1 should output 1 or 0, and delta2 the same.
But I get the following error:
Error using rl.env.AbstractEnv/simWithPolicy (line 70)
An error occurred while simulating "quarter_car" with the agent "agent".
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689)
Unable to evaluate representation.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 689)
The logical indices contain a true value outside of the array bounds.
I don't understand whether the error is caused by my code or by the Simulink model, or how to fix it.
% create observation info
observationInfo = rlNumericSpec([numObs 1],'LowerLimit',-inf*ones(numObs,1),'UpperLimit',inf*ones(numObs,1));
observationInfo.Name = 'observation';
% create action info
actionInfo = rlFiniteSetSpec({[0;0],[1;1]});
actionInfo.Name = 'actor';
% define environment
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
rng(0)
% actor network
actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(200,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(150,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numAct,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh')];
actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},actorOpts);
agentOpts = rlPPOAgentOptions(...
    'ExperienceHorizon',600,...
    'ClipFactor',0.02,...
    'EntropyLossWeight',0.01,...
    'MiniBatchSize',128,...
    'NumEpoch',3,...
    'AdvantageEstimateMethod','gae',...
    'GAEFactor',0.95,...
    'SampleTime',h,...
    'DiscountFactor',0.997);
agent = rlPPOAgent(actor,critic,agentOpts);
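For context, the error is raised when I simulate the agent; my call is roughly like this (the 'MaxSteps' value is just illustrative):

% Simulate the agent in the Simulink environment; this is where
% simWithPolicy is invoked and the error above appears.
simOpts = rlSimulationOptions('MaxSteps',600);  % illustrative value
experience = sim(env,agent,simOpts);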

Answers (1)

Hello,
Based on the attached files, it seems like you are creating a PPO agent but using a Q network for the critic. The PPO implementation in Reinforcement Learning Toolbox requires a V (state-value) critic; see the rlPPOAgent documentation. If you change your critic network to be equivalent to, e.g., this example, the errors go away.
Hope that helps
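For instance, here is a minimal V-critic sketch reusing your observationInfo and the same layer sizes as your actor (the layer names are arbitrary):

% State-value (V) critic: takes only the observation, outputs a scalar V(s).
criticNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(200,'Name','CriticFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(150,'Name','CriticFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','CriticOutput')];  % single V(s) output, no action input
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlValueRepresentation(criticNetwork,observationInfo,...
    'Observation',{'observation'},criticOpts);

The key difference from a Q critic is that the network has no action input path and a single scalar output.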

4 Comments

My observation is numObs x 1 = 5x1,
reward = 1x1,
isdone = 1x1.
I'm not sure how to check the loggedSignals size because I don't have that input in my Simulink model.
Is there a wrong dimension in any of my inputs?
You don't need to worry about loggedSignals here. I cannot see anything obvious; if you share a reproduction model, I can take a look.
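In the meantime, one quick way to check for dimension mismatches between the model signals and your specs is to validate the environment before training:

% Checks that the observation/action signals produced by the Simulink model
% match the dimensions and types declared in observationInfo/actionInfo.
validateEnvironment(env)

If a signal size disagrees with its spec, this reports it directly instead of failing mid-simulation.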
