Randomized position of obstacles in Grid World

11 views (last 30 days)
GCats
GCats on 9 Feb 2022
Edited: StevenKlein on 8 Aug 2022 at 4:15
Hello everyone!
I'm working on training a Q-learning agent using a standard 5x5 gridworld environment. I would like to implement in my environment obstacles such that they change at every episode in the training without ever coinciding with the target state of course. Anyone got any intel?
Here is my code:
GW = createGridWorld(5,5);
GW.CurrentState = '[1,1]';
GW.TerminalStates = '[3,3]';
GW.ObstacleStates = ["[3,2]";"[2,2]";"[2,3]";"[2,4]"; "[3,4]"];
updateStateTranstionForObstacles(GW);
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
% GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
env = rlMDPEnv(GW)
env.ResetFcn = @() 1;
rng(0)
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
qRepresentation.Options.LearnRate = 1;
agentOpts = rlQAgentOptions;
agentOpts.EpsilonGreedyExploration.Epsilon = .04;
qAgent = rlQAgent(qRepresentation,agentOpts);
%training
trainOpts = rlTrainingOptions;
trainOpts.MaxStepsPerEpisode = 50;
trainOpts.MaxEpisodes= 200;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 11;
trainOpts.ScoreAveragingWindowLength = 30;
doTraining = true;
if doTraining
% Train the agent.
trainingStats = train(qAgent,env,trainOpts);
else
% Load the pretrained agent for the example.
load('basicGWQAgent.mat','qAgent')
end
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
sim(qAgent,env)

Answers (1)

StevenKlein
StevenKlein on 8 Aug 2022 at 4:14
Edited: StevenKlein on 8 Aug 2022 at 4:15
Same question here!
_____

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!