Mix of static and dynamic actions for a Reinforcement Learning episode

Hi,
I need to optimize two different types of actions for RL:
  1. Static: it is the same for every step and does not change, but it still needs to be optimized to find the best distance in combination with the other action. This action is decided at the beginning of the episode and stays the same through all the steps of the episode until the reset at the end. For example: distance - a numeric float, e.g. [ 43.3 ]
  2. Changing: this is the normal type of action, and it changes at each time step. For example: speed - a numeric float per step, e.g. [ 1.2 3.3 ... 3 ]
I was thinking of using a time counter inside the step function to capture the distance from the first action and then reuse only that value for the rest of the steps.
if time == 0
    this.firstDistance = action(1);
end
time_traveled = this.firstDistance / action(2);   % action(2) being the speed
This is a simple example. I may have hundreds of static and changing actions to optimize in the future.
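For reference, here is a fuller sketch of what I have in mind, following the custom MATLAB environment template (the property names, observation layout, reward, and termination condition are just placeholders):
function [nextObs, reward, isDone, loggedSignals] = step(this, action)
    % Latch the static action (distance) on the first step of the episode
    if this.StepCount == 0
        this.firstDistance = action(1);
    end
    speed = action(2);                               % dynamic action, changes every step
    time_traveled = this.firstDistance / speed;      % distance / speed
    this.StepCount = this.StepCount + 1;
    % Placeholder observation, reward, and termination logic
    nextObs = [this.firstDistance; speed; time_traveled];
    reward = -time_traveled;
    isDone = (this.StepCount >= this.MaxSteps);
    loggedSignals = [];
end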
Do you agree with this approach? Is there a better way of accomplishing this?

Answers (1)

Emmanouil Tzorakoleftherakis
Hello,
I am not sure the approach you mention would work: even if you constrain the constant action, the agent will still generate some action value at every step that you just won't be using, so it will eventually associate actions with the wrong observations.
It seems to me that you may want to use two different agents that operate at different sample times, but training multiple agents simultaneously is only supported in Simulink right now.
In a MATLAB environment you could train these two agents sequentially:
1) First train the normal agent, using some other mechanism to generate ground-truth "fixed" actions.
2) Then train another agent for the fixed-action part, using the trained "normal" agent.
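Very roughly, the sequential workflow could look something like this (the environment classes and the numbers are hypothetical placeholders, and rlDDPGAgent is just one possible choice of continuous-action agent):
obsInfo      = rlNumericSpec([3 1]);                                 % e.g. [distance; speed; time]
speedActInfo = rlNumericSpec([1 1],'LowerLimit',0.1,'UpperLimit',5);
distActInfo  = rlNumericSpec([1 1],'LowerLimit',10,'UpperLimit',100);
trainOpts    = rlTrainingOptions('MaxEpisodes',500,'MaxStepsPerEpisode',100);
% 1) Train the per-step ("normal") agent with the fixed action supplied by
%    some other mechanism, here just a hand-picked constant distance
speedEnv   = MySpeedEnv('FixedDistance',43.3);     % hypothetical environment class
speedAgent = rlDDPGAgent(obsInfo,speedActInfo);    % default agent built from the specs
train(speedAgent,speedEnv,trainOpts);
% 2) Train a second agent that outputs the fixed action once per episode; its
%    environment rolls out a whole episode using the trained speed agent internally
distEnv   = MyDistanceEnv(speedAgent);             % hypothetical wrapper environment
distAgent = rlDDPGAgent(rlNumericSpec([1 1]),distActInfo);
train(distAgent,distEnv,trainOpts);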
  3 Comments
Emmanouil Tzorakoleftherakis on 23 Feb 2021 (edited)
You can create this correlation/dependency by simply providing the fixed action as an input/observation to the normal agent. Not sure about the specifics, so this is up to you.
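For example, the observation vector of the normal agent can simply carry the fixed action as one of its entries (the layout below is only illustrative):
obsInfo = rlNumericSpec([3 1]);     % e.g. [fixedDistance; speed; timeTraveled]
% inside the environment's step function, expose the latched value to the agent
nextObs = [this.firstDistance; speed; time_traveled];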
We don't have an example that shows how to convert a MATLAB environment into Simulink, but the idea is that you put the 'step' function in a MATLAB Fcn block, the reward function in another MATLAB Fcn block and you use the RL Agent block to connect these together. Maybe following a simple example in Simulink would help.
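As a rough sketch, the step MATLAB Fcn block could contain something like the following (the signal names are placeholders, and the fixed action would be held constant over the episode by the surrounding Simulink logic):
function [nextObs, isDone] = stepLogic(speedAction, fixedDistance)
    % Per-step dynamics; the reward would be computed in a separate MATLAB Fcn block
    time_traveled = fixedDistance / speedAction;
    nextObs = [fixedDistance; speedAction; time_traveled];
    isDone = (time_traveled > 100);    % placeholder termination condition
end
The RL Agent block then closes the loop, and rlSimulinkEnv(mdl, agentBlockPath, obsInfo, actInfo) gives you the environment object for training.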
In Simulink the two agents are trained simultaneously at the time-step level (so you are not training them sequentially).
John Doe on 25 Mar 2021
I'm attempting the Simulink method now that I've gotten some practice with this.
"two different agents that operate at different sample times" - how would I do this in Simulink so that the actions stay fixed for one agent? I have now run the watertank example successfully and need to set up my own.


Release: R2020b
