Reinforcement learning DDPG agent semi-active control issue

Dear MATLAB community,
I have implemented a reinforcement learning agent (DDPG) for controlling a semi-active suspension system in Simulink for my master's thesis. The Simulink model is a half-car model with two tires connected to a chassis body, and the agent should control the variable dampers of the front and rear axles. But in every training session, even with a huge number of episodes, the DDPG agent only learns a suboptimal control strategy. Mostly the result is the lowest possible damping for the rear axle and the maximum for the front axle, with only tiny control adjustments (example in the picture).
Description of the Model:
  • 13 continuous observations
  • 2 continuous actions
  • Reward function: negative quadratic chassis and pitch acceleration
  • Reset function loads a pseudorandom road profile each episode
  • Damping coefficient from 900 to 4300 Ns/m
  • Each episode lasts 10 seconds
I have tried all of these changes and the results are mostly the same:
  • 'NumHiddenUnit' of 25 and 256
  • Actor learn rate of 1e-3 and 1e-4
  • With and without parallel computing
  • 300, 1500 and 2000 episodes
My questions:
  • Why does my agent only make such small control steps?
  • Is it possible that my DDPG agent doesn't explore enough?
Sorry for my bad English, and thank you all for your help.
%% Agent creation
% Action space: two continuous damper commands, limited to the admissible damping range
actInfo = rlNumericSpec([2 1], ...
    'LowerLimit', hfmParam.dA.value(1), ...
    'UpperLimit', hfmParam.dA.value(2));
% Observation space: 13 continuous signals (last one limited to [0, 40])
obsInfo = rlNumericSpec([13 1], ...
    'LowerLimit', [-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf 0]', ...
    'UpperLimit', [inf inf inf inf inf inf inf inf inf inf inf inf 40]');
%% Environment
env = rlSimulinkEnv(mdl, agentBlock, obsInfo, actInfo);
env.ResetFcn = @(in)localResetFcn(in);   % loads a pseudorandom road profile each episode
% Agent options
agentOpts = rlDDPGAgentOptions('SampleTime', tS);
initOpts = rlAgentInitializationOptions('NumHiddenUnit', obsInfo.Dimension(1)*2-1);
% Agent with default actor/critic networks
agent = rlDDPGAgent(obsInfo, actInfo, initOpts, agentOpts);
% Adjust the learn rates of the default critic and actor representations
critic = getCritic(agent);
critic.Options.LearnRate = 1e-3;
agent = setCritic(agent, critic);
actor = getActor(agent);
actor.Options.LearnRate = 1e-4;
agent = setActor(agent, actor);
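For reference, a simplified sketch of such a reset function (the variable name roadProfileIdx and the number of stored profiles are placeholders, not the actual implementation):
% Simplified sketch of a reset function that picks a pseudorandom road profile
% each episode (placeholder names, not the actual implementation).
function in = localResetFcn(in)
    profileIdx = randi(10);                               % choose one of 10 stored road profiles
    in = setVariable(in, 'roadProfileIdx', profileIdx);   % pass the selection to the Simulink model
end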

Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
This is very open-ended, so there could be a lot of ways to improve your setup. My guess is that the issue is closely related to the second question you raise above: if the agent does not explore enough, all the other parameters you played with won't make much difference.
First, it is important to understand how exploration works for DDPG. What happens is that we add noise sampled from a noise model to the deterministic policy output (step 1 here). If the parameters of the noise model are not tuned well, the added noise will be very small compared to your action range and the agent will not explore (which I suspect is what happens here, given that you do not tune the noise options in your code above).
Please take a look at this note in the doc. At a minimum, you should make sure that the variance of the noise model is between 1% and 10% of your action range. Then you can play with the variance decay rate. That should help you make some progress.
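For example, with an action range of 900 to 4300 Ns/m, tuning the exploration noise could look roughly like this (the 5% variance and the decay rate are illustrative starting points you would need to tune, not recommended values):
% Sketch of exploration-noise tuning for DDPG (values are illustrative starting points).
actRange = 4300 - 900;                               % action span in Ns/m
agentOpts = rlDDPGAgentOptions('SampleTime', tS);
agentOpts.NoiseOptions.Variance = 0.05 * actRange;   % roughly 5% of the action range
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;     % decay slowly so early exploration remains
agent = rlDDPGAgent(obsInfo, actInfo, initOpts, agentOpts);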
  5 Comments
Emmanouil Tzorakoleftherakis
Setting the appropriate noise parameters is a necessary step for a correct problem formulation; it does not guarantee successful learning. If the agent actions during training make sense, i.e., if the agent is exploring values that make sense, the next thing to look at is your reward signal.
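For example, a quadratic reward of the kind described in the question can be written with explicit weights so that the chassis and pitch terms can be balanced against each other (the weights and signal names below are placeholders):
% Sketch of a weighted quadratic reward (weights and signal names are placeholders).
wChassis = 1;                                            % weight on chassis (heave) acceleration
wPitch   = 1;                                            % weight on pitch acceleration
reward   = -(wChassis*chassisAcc^2 + wPitch*pitchAcc^2);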
Maha Mosalam on 1 Dec 2021
Hello,
If I have a very small action range, maybe between -0.001 and 0.001, how can I choose the exploration noise? The action actually does not change its value during the steps. Any help with that?

Release: R2020b