I am working on path planning and obstacle avoidance using deep reinforcement learning but training is not converging.

Question

Faraz Ahmad on 24 Mar 2022

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/1679199-i-am-working-on-path-planning-and-obstacle-avoidance-using-deep-reinforcement-learning-but-training

Edited: Matteo D'Ambrosio on 28 May 2023

check.png

The following code creates the RL agent:

% Critic (Q-value function): learning rate 1e-3, L2 regularization, gradient threshold 1
criticOpts = rlRepresentationOptions("LearnRate",1e-3,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},criticOpts);
% Actor (deterministic policy): lower learning rate of 1e-4
actorOptions = rlRepresentationOptions("LearnRate",1e-4,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},actorOptions);
% DDPG agent options
agentOpts = rlDDPGAgentOptions(...
    "SampleTime",sampleTime,...
    "TargetSmoothFactor",1e-3,...
    "DiscountFactor",0.995, ...
    "MiniBatchSize",128, ...
    "ExperienceBufferLength",1e6);
% Exploration noise on the actions, with a slowly decaying variance
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);

Training options are:

maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions(...
    "MaxEpisodes",maxEpisodes, ...
    "MaxStepsPerEpisode",maxSteps, ...
    "ScoreAveragingWindowLength",50, ...
    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10000, ...
    "Verbose", true, ...
    "Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);

Training does not converge, as shown in the attached figure (check.png).

Answer 1

Matteo D'Ambrosio on 28 May 2023

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/1679199-i-am-working-on-path-planning-and-obstacle-avoidance-using-deep-reinforcement-learning-but-training#answer_1246184

Edited: Matteo D'Ambrosio on 28 May 2023

I'm not too familiar with DDPG, as I use other agents, but looking at your episode reward figure, a few things come to mind:

Try decreasing the sparsity of your episode rewards. Some episodes have a reward of 0 and others a reward of about 10k, which can cause problems with the gradients. Consider multiplying the rewards by a scale factor so that your high-reward episodes reach a reward of ~10, but play around with it.
Decrease the learning rate, which usually helps when you start a new RL project, at least until you find a value that works. Try something like 1e-4, 1e-5, or 1e-6; I wouldn't go lower.
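
As a rough sketch of both suggestions (not tested against your environment): rewardScale and the example returns below are hypothetical values, picked only so that a ~10k episode maps to ~10. The lower learning rates reuse the rlRepresentationOptions call from the question, one step down from the 1e-3 (critic) and 1e-4 (actor) used there.

```matlab
% Hypothetical scale factor: maps an episode return of ~10,000 to ~10.
% Applying it to every step reward scales the episode return by the same factor.
rewardScale = 1e-3;

% Made-up episode returns resembling the 0 / 10k pattern described above.
rawReturns = [0 10000 0 9500];
scaledReturns = rewardScale * rawReturns;   % roughly [0 10 0 9.5]

% Lower learning rates, same option names as in the question.
% Requires Reinforcement Learning Toolbox.
criticOpts = rlRepresentationOptions("LearnRate",1e-4, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);
actorOptions = rlRepresentationOptions("LearnRate",1e-5, ...
    "L2RegularizationFactor",1e-4,"GradientThreshold",1);
```

If the rewards are scaled this way and training stops on "AverageReward", the "StopTrainingValue" of 10000 would need the same factor applied (about 10 here), otherwise the stop criterion can never be reached.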

Hope this helps.
