DDPG does not converge


simulink.PNG

Hello

I am using a DDPG agent that generates four continuous actions: two non-positive and two non-negative. The two positive actions must sum to the positive part of a reference value, and the two negative actions must sum to the negative part of the reference value. However, the agent does not learn to track the reference. I have tried different reward functions and hyperparameters, but after a while it always chooses the extreme values of the defined action ranges ([-1 -1 1 1]).

I would appreciate any suggestions.
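To make the tracking requirement concrete, here is a minimal sketch of the constraint as two error terms. The reference value, the action vector, and the names `refPos`, `refNeg`, `errPos`, and `errNeg` are hypothetical and chosen only for illustration. The post does not give the actual reward function, so this is not it.

```matlab
% Hypothetical numbers, for illustration only
ref = -0.75;               % assumed reference value
refPos = max(ref, 0);      % positive part of the reference
refNeg = min(ref, 0);      % negative part of the reference

% Action layout assumed to follow actInfo: [neg1; neg2; pos1; pos2]
action = [-0.5; -0.25; 0; 0];

errNeg = sum(action(1:2)) - refNeg;   % tracking error of the negative pair
errPos = sum(action(3:4)) - refPos;   % tracking error of the positive pair
```

Both errors are zero for this action, which is what "tracking the reference" would mean under this reading of the constraint.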

mdl = 'MEMG_RL';   % model name, taken from the agent block path below
open_system(mdl)

obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';
numObservations = obsInfo.Dimension(1);

actInfo = rlNumericSpec([4 1],...
    LowerLimit=[-1 -1 0 0]',...
    UpperLimit=[0 0 1 1]');
numActions = actInfo.Dimension(1);

% Build the environment interface object
agentblk = 'MEMG_RL/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);

Ts = 2e-2;
Tf = 60;

statepath = [featureInputLayer(numObservations , Name = 'stateinp')
    fullyConnectedLayer(96,Name = 'stateFC1')
    reluLayer
    fullyConnectedLayer(74,Name = 'stateFC2')
    reluLayer
    fullyConnectedLayer(36,Name = 'stateFC3')];

actionpath = [featureInputLayer(numActions, Name = 'actinp')
    fullyConnectedLayer(72,Name = 'actFC1')
    reluLayer
    fullyConnectedLayer(36,Name = 'actFC2')];

commonpath = [additionLayer(2,Name = 'add')
    fullyConnectedLayer(96,Name = 'FC1')
    reluLayer
    fullyConnectedLayer(72,Name = 'FC2')
    reluLayer
    fullyConnectedLayer(24,Name = 'FC3')
    reluLayer
    fullyConnectedLayer(1,Name = 'output')];

critic_network = layerGraph();
critic_network = addLayers(critic_network,actionpath);
critic_network = addLayers(critic_network,statepath);
critic_network = addLayers(critic_network,commonpath);
critic_network = connectLayers(critic_network,'actFC2','add/in1');
critic_network = connectLayers(critic_network,'stateFC3','add/in2');
plot(critic_network)

critic = dlnetwork(critic_network);
criticOptions = rlOptimizerOptions('LearnRate',3e-04,'GradientThreshold',1);
critic = rlQValueFunction(critic,obsInfo,actInfo,...
    'ObservationInputNames','stateinp','ActionInputNames','actinp');

%% actor
actorNetwork = [featureInputLayer(numObservations,Name = 'observation')
    fullyConnectedLayer(72,Name = 'actorFC1')
    reluLayer
    fullyConnectedLayer(48,Name='actorFc2')
    reluLayer
    fullyConnectedLayer(36,Name='actorFc3')
    reluLayer
    fullyConnectedLayer(numActions,Name='output')
    tanhLayer
    scalingLayer(Name = 'actorscaling',scale = max(actInfo.UpperLimit))];

actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',3e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);

%% agent
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'ActorOptimizerOptions',actorOptions,...
    'CriticOptimizerOptions',criticOptions,...
    'ExperienceBufferLength',1e6,...
    'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.1; %.07/sqrt(Ts) ;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-6;

maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'ScoreAveragingWindowLength',20, ...
    'Verbose',false, ...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeCount',...
    'StopTrainingValue',5000);

agent = rlDDPGAgent(actor,critic,agentOptions);
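One detail in the actor above may be worth checking. This is a guess from reading the listing, not a confirmed diagnosis. The final scaling layer uses a single scale of `max(actInfo.UpperLimit)`, which is 1, and no bias. The `tanhLayer` output therefore spans [-1, 1] for all four actions, while the action specification allows only [-1, 0] for the first two and [0, 1] for the last two. The sketch below computes a per-action scale and bias that map the tanh range onto each action's own interval. The names `actScale` and `actBias` are made up for illustration.

```matlab
% Sketch only: per-action scale and bias so that a tanh output in [-1, 1]
% maps onto each action's own [LowerLimit, UpperLimit] interval.
lowerLimit = [-1 -1 0 0]';   % same limits as actInfo in the post
upperLimit = [ 0  0 1 1]';

actScale = (upperLimit - lowerLimit)/2;   % half-width of each range
actBias  = (upperLimit + lowerLimit)/2;   % midpoint of each range

% Check the mapping at the two extremes of tanh
atMin = actScale.*(-1) + actBias;   % should reproduce lowerLimit
atMax = actScale.*( 1) + actBias;   % should reproduce upperLimit
disp([atMin atMax])

% With Reinforcement Learning Toolbox, the last actor layer could then be
% written as (assumption: vector Scale/Bias matching the 4 actions):
% scalingLayer(Name='actorscaling', Scale=actScale, Bias=actBias)
```

With this mapping, a network output near zero corresponds to the middle of each action range rather than to one of its bounds. Whether that alone resolves the saturation would still depend on the reward and the exploration noise.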

2 Comments

Esan freedom on 20 May 2024

Captura de pantalla 2024-05-20 131632.png

@Emmanouil Tzorakoleftherakis

As the reward plot shows, the agent learns for a while and then loses what it has learned.

I would appreciate any help as soon as possible.


Answers (0)
