RL agent does not learn properly
Hello everyone,
I am trying to learn the Reinforcement Learning Toolbox and want to control the speed of a DC motor with an RL agent, replacing a PI controller. I based my setup on the water tank example, but I am running into problems during training.
First, the agent tends to drive itself to either the minimum (0 rpm) or the maximum (6000 rpm) and then stays there, even though it had already achieved good rewards in earlier episodes.
My reward function uses the error between the target and measured speed, expressed as a percentage. When I add a penalty so that the agent does not stay at 0 rpm, it still sits at 0 rpm and does not explore the rest of the range. I am also having trouble eliminating the remaining steady-state error.
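The actual reward is built from blocks inside the Simulink model, so it does not appear in the code below; the following is only a simplified sketch of the idea (the function name rewardSketch and the 100 rpm penalty threshold are illustrative, not my exact values):

% Simplified sketch of the reward logic (the real version is built
% from blocks in the Simulink model); speeds are in rpm
function r = rewardSketch(omega_ref, omega_meas)
    pct_error = abs(omega_ref - omega_meas)/omega_ref*100; % speed error in percent
    r = -pct_error;             % smaller error -> larger reward
    if omega_meas < 100         % extra penalty for parking near 0 rpm
        r = r - 10;
    end
end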
My code and some screenshots are below.

close all

% DC motor parameters
R  = 7.03;         % armature resistance [Ohm]
L  = 1.04e-3;      % armature inductance [H]
J  = 44.2e-7;      % rotor inertia [kg*m^2]
a  = 2.45e-6;      % friction coefficient
Kn = 250*2*pi/60;  % speed constant: 250 rpm/V in rad/(s*V)
Km = 38.2e-3;      % torque constant [N*m/A]
% Action: motor terminal voltage, limited to 0..24 V
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',24);
actInfo.Name = 'spannung'; % German for "voltage"
obsInfo = rlNumericSpec([3 1], ...
    'LowerLimit',[-inf -inf -inf]', ...
    'UpperLimit',[ inf  inf  inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured rpm';

env = rlSimulinkEnv("DCMotorRL2","DCMotorRL2/RL Agent", ...
    obsInfo,actInfo);
env.ResetFcn = @(in)localResetFcn(in);
Ts = 0.1;  % agent sample time [s]
Tf = 20;   % simulation time [s]
rng(0)     % fix the random seed for reproducibility
% Critic network: observation path and action path merged by an addition layer
statePath = [
    featureInputLayer(obsInfo.Dimension(1),Name="netObsIn")
    fullyConnectedLayer(50)
    reluLayer
    fullyConnectedLayer(25,Name="CriticStateFC2")];
actionPath = [
    featureInputLayer(actInfo.Dimension(1),Name="netActIn")
    fullyConnectedLayer(25,Name="CriticActionFC1")];
commonPath = [
    additionLayer(2,Name="add")
    reluLayer
    fullyConnectedLayer(1,Name="CriticOutput")];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork, ...
    "CriticStateFC2","add/in1");
criticNetwork = connectLayers(criticNetwork, ...
    "CriticActionFC1","add/in2");
criticNetwork = dlnetwork(criticNetwork);
figure
plot(criticNetwork)
critic = rlQValueFunction(criticNetwork,obsInfo,actInfo, ...
    ObservationInputNames="netObsIn", ...
    ActionInputNames="netActIn");
% Actor network: deterministic policy mapping observations to a voltage
actorNetwork = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(9) %3
    tanhLayer
    fullyConnectedLayer(actInfo.Dimension(1))];
actorNetwork = dlnetwork(actorNetwork);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
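% Create the DDPG agent and set its hyperparameters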
agent = rlDDPGAgent(actor,critic);
agent.AgentOptions.SampleTime = Ts;
agent.AgentOptions.TargetSmoothFactor = 1e-3;
agent.AgentOptions.DiscountFactor = 1.0;
agent.AgentOptions.MiniBatchSize = 64;
agent.AgentOptions.ExperienceBufferLength = 1e6;
agent.AgentOptions.NoiseOptions.Variance = 0.8; %0.3
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-5; %-5
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-03;
agent.AgentOptions.CriticOptimizerOptions.GradientThreshold = 1;
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-04;
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=4000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="training-progress", ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=800, ...
    SaveAgentCriteria="EpisodeCount", ...
    SaveAgentValue=600);
doTraining = true;
if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
end
function in = localResetFcn(in)
    % Randomize the reference speed (rpm) at the start of each episode
    blk = 'DCMotorRL2/omega_ref';
    h = randi([2000,4000]);
    in = setBlockParameter(in,blk,'Value',num2str(h));

    % Optionally randomize the initial motor speed (given in 1/min):
    % h = randi([2000,4000])*(2*pi)/60;
    % blk = 'DCMotorRL2/DCMotor/Integrator1';
    % in = setBlockParameter(in,blk,'InitialCondition',num2str(h));
end