Deep Q Network implementation

Zonghao zou on 15 Oct 2020
Edited: Zonghao zou on 16 Oct 2020
Hello all,
I have an implementation question. I am new to reinforcement learning. Previously, I played around with Q-learning; I wrote the code myself and it worked fine. Now, as I increase the state space and action space, I want to start exploring Deep Q Networks.
The problem of concern is rather simple. My state is a vector of 1s and 0s, and the number of actions equals the length of that vector. For example, for a state vector of length 5, my initial state is [1;1;1;1;1] and the action choices are [1 2 3 4 5]. Each action corresponds to an index in the state vector, and taking an action flips the corresponding bit. Let's say I take action 3: the state becomes [1;1;0;1;1]. If I take action 3 again, I go back to [1;1;1;1;1].
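To make the dynamics concrete, here is a minimal sketch (the helper flipBit is my own name, just for illustration):
nBits = 5;
s = ones(nBits, 1);                                 % initial state [1;1;1;1;1]
flipBit = @(s, a) [s(1:a-1); 1 - s(a); s(a+1:end)]; % action a toggles bit a
s = flipBit(s, 3)   % -> [1;1;0;1;1]
s = flipBit(s, 3)   % -> back to [1;1;1;1;1]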
Each action gives me a reward. The reward value for each action is fixed and determined by a physical model. I built the DQN environment via this link, then created the rlDQNAgent via the first method from this page: agent = rlDQNAgent(observationInfo,actionInfo). Then I trained the agent using information from this page. I set the maximum number of episodes to 40 and steps per episode to 50, with some other parameters. I wait until Episode Q0 converges and stabilizes, then I stop the training process. A picture is attached for reference.
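For reference, here is roughly how I set this up (a sketch only; myStepFcn and myResetFcn are placeholder names for the functions wrapping my physical model, which I have not shown):
nBits   = 5;
obsInfo = rlNumericSpec([nBits 1], 'LowerLimit', 0, 'UpperLimit', 1);
actInfo = rlFiniteSetSpec(1:nBits);   % action k flips bit k

% Custom environment from step/reset functions; myStepFcn must return
% [nextObs, reward, isDone, loggedSignals], and myResetFcn must return
% [initialObs, loggedSignals]
env = rlFunctionEnv(obsInfo, actInfo, 'myStepFcn', 'myResetFcn');

% Default DQN agent built directly from the specifications
agent = rlDQNAgent(obsInfo, actInfo);

% Training options matching the numbers above
trainOpts = rlTrainingOptions('MaxEpisodes', 40, 'MaxStepsPerEpisode', 50);
trainingStats = train(agent, env, trainOpts);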
This process works fairly well for a state vector of length 2, but when I increase the length, I don't get results that match my expectation. Since I wrote the physical model, I know what the optimal policy should be, and the agent is not finding it.
I am wondering what might be wrong. Do I need to add more to the model? Is there anything I could do, first, to speed up convergence and, second, to make sure it converges to the correct policy?
Any help or suggestions will be greatly appreciated!!
Sorry for the lengthy question :)
  2 Comments
Md Muzakkir Quamar on 15 Oct 2020
Can you please help me with Q-learning?
Any example code or any easy example that can help me understand would be great. I have to use it in my course project, but I have no prior knowledge of Q-learning.
Zonghao zou on 16 Oct 2020
Edited: Zonghao zou on 16 Oct 2020
I attached my sample code below. You have to change the step function so that it provides the next_state and the reward from performing that action.
% --- Q-learning setup (values below are illustrative; tune for your problem) ---
Nstates  = 32;            % number of discrete states
Nactions = 5;             % number of actions
Nrounds  = 10000;         % total training steps
GE       = 0.8;           % fraction of steps that use epsilon-greedy exploration
epsilongreedy = 0.1;      % probability of taking a random action
alpha    = 0.1;           % learning rate
gamma    = 0.9;           % discount factor
q_table  = zeros(Nstates, Nactions);
state    = 1;             % initial state index

for i = 1:Nrounds
    if i < Nrounds*GE
        % Epsilon-greedy phase: explore with probability epsilongreedy
        if rand < epsilongreedy
            action = randi(Nactions);             % take a random action
        else
            [~, action] = max(q_table(state,:));  % otherwise take the action with the maximum Q-value
        end
    else
        % After the exploration phase, always act greedily
        [~, action] = max(q_table(state,:));
    end
    % step() is your environment model: change it for your purpose so it
    % returns the next state and the reward for taking that action in that state
    [next_state, reward] = step(state, action);
    if i < Nrounds*GE
        % Standard Q-learning update; stop updating once exploration ends
        old_value = q_table(state, action);
        next_max  = max(q_table(next_state,:));
        q_table(state, action) = (1-alpha)*old_value + alpha*(reward + gamma*next_max);
    end
    state = next_state;
end
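As a toy illustration of the step function's interface only (the transition rule and reward here are made up), something like this could go in its own file step.m:
function [next_state, reward] = step(state, action)
    % Made-up dynamics: advance the state by the action index, wrapping around
    Nstates = 32;                            % must match the Q-table size
    next_state = mod(state + action - 1, Nstates) + 1;
    % Made-up objective: +1 reward for landing in the last state
    reward = double(next_state == Nstates);
end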


Answers (0)
