How to modify actions in experiences during a reinforcement learning training

Hi experts
I am working on a reinforcement learning project. The formulated problem has a huge discrete action set, so instead of using deep Q-learning with discrete actions I turned to DDPG with a continuous action space. What I want to do is, each time I get an action from the actor network, discretize it to the closest VALID discrete action, and then store in the experience not the original continuous action but this closest discrete action. DDPG training in MATLAB seems to store the original action generated by the actor network plus noise by default. Is there any way to MODIFY the stored action in the experience before it is pushed into the memory buffer? Thanks!
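For reference, a minimal sketch (not part of the original post) of the kind of discretization described above, assuming the valid actions are listed in a hypothetical matrix validActions with one action per row:

function discreteAction = snapToValidAction(contAction, validActions)
% Minimal sketch: map a continuous DDPG action to the closest valid
% discrete action. validActions is a hypothetical N-by-actionDim matrix
% listing every allowed action; contAction is the actor output plus noise.
d = vecnorm(validActions - contAction(:).', 2, 2);  % distance to each valid action
[~, idx] = min(d);                                  % index of the nearest one
discreteAction = validActions(idx, :)';             % closest valid action, as a column
end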

Answers (1)

Emmanouil Tzorakoleftherakis
If you are working in Simulink, you can use the "Last Action" port in the RL Agent block to indicate the action that was actually applied to the environment.
If your environment is in MATLAB, you can either move it to Simulink with a MATLAB Function block and follow the above, or write your own custom training loop.
  7 Comments
Ran on 9 Aug 2022
@Emmanouil Tzorakoleftherakis That makes a lot of sense. One more question that confuses me: when calculating the observations (which I assume are the next states), the reward, and isdone, we need the current state information. But in the examples provided in MATLAB, I don't see any module that stores the current states of the system. Can I use the observation input of the RL Agent block, or should I create some variables in the Environment module to store the current states? Thanks!
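One common pattern (a sketch, not from the original thread) is to keep the environment block stateless: the previous observation is fed back through a Delay block and arrives as an input, so no extra state variables are needed inside the block. The dynamics, reward, and termination below are placeholders:

function [NextObs, Reward, IsDone] = envStep(Obs, Action)
% Obs is the previous observation fed back through a Delay block;
% Action is the (discretized) action applied at this step.
% All update rules here are placeholders, assuming compatible sizes.
NextObs = Obs + 0.1*Action;     % placeholder state transition
Reward  = -norm(NextObs);       % placeholder reward
IsDone  = norm(NextObs) > 10;   % placeholder termination condition
end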
Ran on 11 Aug 2022
I have created a Simulink draft as shown below.
I created a MATLAB Function block to discretize the action that is actually applied to the environment. The environment is another block on the right, with output ports NextObs, reward, and isdone. The "delay" block in the top-right corner lets the environment derive the next observation from the previous one. Could you please check whether this draft makes sense?
Particularly, two questions confuse me:
1) Since RL needs to derive the next states from the current states, how are the current states stored in the environment block?
2) I tried to reset the initial state by doing this
function in = localResetFcn(in,N_UAV)
% Initial state: all fully charged with E_Cap, all start from ground, hr is
%
state = [2*ones(1,N_UAV),zeros(1,N_UAV),4]'; %/E_Cap*2 because of input normalization
blk = sprintf('Env_UAVChg/Environment/NextObs');
in = setBlockParameter(in,blk,'InitialCondition',num2str(state));
end
but I got an error: "Outport block does not have a parameter named 'InitialCondition'". Could you please advise how to reset the states for each episode? Thanks!
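A minimal sketch of one possible fix, assuming the current state is actually held by the Delay block rather than by the NextObs Outport (Outport blocks have no 'InitialCondition' parameter, while Delay blocks do). The block path 'Env_UAVChg/Delay' is a guess at the diagram's layout, and mat2str is used so the state vector is passed as a single string:

function in = localResetFcn(in, N_UAV)
% Sketch only: reset each episode by setting the initial condition of the
% Delay block that holds the state. 'Env_UAVChg/Delay' is an assumed path.
state = [2*ones(1, N_UAV), zeros(1, N_UAV), 4]';  % /E_Cap*2 because of input normalization
blk = 'Env_UAVChg/Delay';
in = setBlockParameter(in, blk, 'InitialCondition', mat2str(state));
end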


Release

R2021b
