
Q-table issues in the example "Q-learning in the basic grid world"

I trained a Q-learning agent in the MATLAB predefined environment "BasicGridWorld", and I have a question about how the Q-table is updated. When I set the number of episodes to 1 and the maximum steps per episode to 1, I expect the updated Q-value to equal alpha * R, following the Q-learning update Q(s,a) = Q(s,a) + alpha * (R + gamma * max Q(s',a') - Q(s,a)): the table is initialized to zero, so both Q(s,a) and the next state's maximum Q-value are zero, leaving only alpha * R, where alpha is the learning rate and R is the immediate reward. However, the code generates a Q-value different from my expectation. The code is attached as follows:
rng(0)
% create the predefined grid world environment
env = rlPredefinedEnv("BasicGridWorld");
% tabular Q-value critic; rlTable initializes all entries to zero
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
% plain SGD update: learning rate 0.1, no regularization, no momentum
critic.Options.LearnRate = 0.1;
critic.Options.L2RegularizationFactor = 0;
critic.Options.Optimizer = "sgdm";
critic.Options.OptimizerParameters.Momentum = 0;
% Q-learning agent with epsilon-greedy exploration and discount factor 0.5
opts = rlQAgentOptions;
opts.EpsilonGreedyExploration.Epsilon = 0.8;
opts.EpsilonGreedyExploration.EpsilonMin = 0.01;
opts.EpsilonGreedyExploration.EpsilonDecay = 0.01;
opts.DiscountFactor = 0.5;
agent = rlQAgent(critic,opts);
% train for a single episode consisting of a single step
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1,...
    'MaxStepsPerEpisode',1,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',30,...
    'Verbose',true,...
    'Plots','none');
trainOpts.ScoreAveragingWindowLength = 50;
trainingStats = train(agent,env,trainOpts);
% extract the trained Q-table from the agent's critic
trained_critic = getCritic(agent);
trained_table = getLearnableParameters(trained_critic);
trained_qtable = trained_table{1};
% check the updated Q-value (only one state-action pair changes after one step)
[r,c] = find(trained_qtable ~= 0);
Q_value = trained_qtable(r,c)
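For reference, this is the hand calculation I am comparing against. It is only a minimal sketch of the single-step update under my assumptions (Q-table starting at zero, alpha = 0.1, gamma = 0.5); R = 10 is just an example reward, the actual value comes from the environment step:
% hand-computed Q-learning update for one step
alpha = 0.1;       % learning rate (critic.Options.LearnRate)
gamma = 0.5;       % discount factor (opts.DiscountFactor)
Q_sa = 0;          % initial Q(s,a), zero because rlTable starts at zero
maxQ_next = 0;     % max over a' of Q(s',a'), also zero before any updates
R = 10;            % example immediate reward (placeholder value)
expected_Q = Q_sa + alpha*(R + gamma*maxQ_next - Q_sa)   % reduces to alpha*R
With one episode of one step, I therefore expect the single nonzero table entry to equal alpha * R, but the value produced by train is different.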
Can anyone help point out my error?
Thank you very much.

Answers (0)

Release

R2020b
