Deep Q Network implementation

Zonghao zou on 15 Oct 2020
Edited: Zonghao zou on 16 Oct 2020
Hello all,
I have an implementation question. I am new to reinforcement learning. Previously, I played around with Q-learning; I wrote the code myself and it worked fine. Now, as I increase the state space and action space, I want to start exploring Deep Q Networks.
The problem of concern is rather simple. My state is a vector of 1s and 0s, and the number of actions equals the length of that vector. For example, for a state vector of length 5, my initial state is [1;1;1;1;1] and the action choices are [1 2 3 4 5]. Each action corresponds to an index in the state vector, and taking an action flips the corresponding bit. Let's say I take action 3: the state becomes [1;1;0;1;1]. If I take action 3 again, I go back to [1;1;1;1;1].
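To make the dynamics concrete, here is a minimal sketch (the helper flipBit is my own name, just for illustration):
nBits = 5;
s = ones(nBits, 1);                                 % initial state [1;1;1;1;1]
flipBit = @(s, a) [s(1:a-1); 1 - s(a); s(a+1:end)]; % action a toggles bit a
s = flipBit(s, 3)   % -> [1;1;0;1;1]
s = flipBit(s, 3)   % -> back to [1;1;1;1;1]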
Each action gives me a reward. The reward value for each action is fixed and determined by a physical model. I built the DQN environment via this link, then created the rlDQNAgent via the first method from this page: agent = rlDQNAgent(observationInfo,actionInfo). Then I trained the agent using information from this page. I set the maximum number of episodes to 40 and steps per episode to 50, with some other parameters. I wait until Episode Q0 converges and stabilizes, then I stop the training process. A picture is attached for reference.
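For reference, here is roughly how I set this up (a sketch only; myStepFcn and myResetFcn are placeholder names for the functions wrapping my physical model, which I have not shown):
nBits   = 5;
obsInfo = rlNumericSpec([nBits 1], 'LowerLimit', 0, 'UpperLimit', 1);
actInfo = rlFiniteSetSpec(1:nBits);   % action k flips bit k

% Custom environment from step/reset functions; myStepFcn must return
% [nextObs, reward, isDone, loggedSignals], and myResetFcn must return
% [initialObs, loggedSignals]
env = rlFunctionEnv(obsInfo, actInfo, 'myStepFcn', 'myResetFcn');

% Default DQN agent built directly from the specifications
agent = rlDQNAgent(obsInfo, actInfo);

% Training options matching the numbers above
trainOpts = rlTrainingOptions('MaxEpisodes', 40, 'MaxStepsPerEpisode', 50);
trainingStats = train(agent, env, trainOpts);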
This process works fairly well for a state vector of length 2, but when I increase the length, I don't get results that match my expectation. Since I wrote the physical model, I know what the optimal policy should be, and the agent is not finding it.
I am wondering what might be wrong. Do I need to add more to the model? Is there anything I could do, first, to speed up convergence and, second, to make sure it converges to the correct policy?
Any help or suggestions will be greatly appreciated!!
Sorry for the lengthy question :)
  2 Comments
Md Muzakkir Quamar on 15 Oct 2020
Can you please help me with Q-learning?
Any example code or any easy example that can help me understand would be great. I have to use it in my course project, but I have no prior knowledge of Q-learning.
Zonghao zou on 16 Oct 2020
Edited: Zonghao zou on 16 Oct 2020
I attached my sample code below. You have to change the step function so that it provides the next_state and the reward from performing that action.
% --- Q-learning setup (values below are illustrative; tune for your problem) ---
Nstates  = 32;            % number of discrete states
Nactions = 5;             % number of actions
Nrounds  = 10000;         % total training steps
GE       = 0.8;           % fraction of steps that use epsilon-greedy exploration
epsilongreedy = 0.1;      % probability of taking a random action
alpha    = 0.1;           % learning rate
gamma    = 0.9;           % discount factor
q_table  = zeros(Nstates, Nactions);
state    = 1;             % initial state index

for i = 1:Nrounds
    if i < Nrounds*GE
        % Epsilon-greedy phase: explore with probability epsilongreedy
        if rand < epsilongreedy
            action = randi(Nactions);             % take a random action
        else
            [~, action] = max(q_table(state,:));  % otherwise take the action with the maximum Q-value
        end
    else
        % After the exploration phase, always act greedily
        [~, action] = max(q_table(state,:));
    end
    % step() is your environment model: change it for your purpose so it
    % returns the next state and the reward for taking that action in that state
    [next_state, reward] = step(state, action);
    if i < Nrounds*GE
        % Standard Q-learning update; stop updating once exploration ends
        old_value = q_table(state, action);
        next_max  = max(q_table(next_state,:));
        q_table(state, action) = (1-alpha)*old_value + alpha*(reward + gamma*next_max);
    end
    state = next_state;
end
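As a toy illustration of the step function's interface only (the transition rule and reward here are made up), something like this could go in its own file step.m:
function [next_state, reward] = step(state, action)
    % Made-up dynamics: advance the state by the action index, wrapping around
    Nstates = 32;                            % must match the Q-table size
    next_state = mod(state + action - 1, Nstates) + 1;
    % Made-up objective: +1 reward for landing in the last state
    reward = double(next_state == Nstates);
end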


Answers (0)
