How can I tune a PID controller using reinforcement learning?

Hello, I am trying to extend the Tune PI Controller using Reinforcement Learning example into a Tune PID Controller using Reinforcement Learning, but it does not work. What did I do wrong?
I modified the Simulink scheme watertankLQG.slx into a simpler scheme called System_PID.slx, and I wrote a script called ziegler_Nichols.m that computes the PID parameters (Kp, Ki, Kd, N) with the closed-loop Ziegler-Nichols technique:
%% Closed-loop Ziegler-Nichols example
clear, close all, clc
Ts=0.1;
Tf=30;
s = tf('s');
G = (5*(0.5-s))/(s+2)^3;          % non-minimum-phase plant (RHP zero at s = 0.5)
a = [1 6 12 8]                    % denominator coefficients: (s+2)^3
b = [-5 2.5]                      % numerator coefficients: 5*(0.5-s)
[A,B,C,D] = tf2ss(b,a)            % state-space realization
figure(1)
margin(G)
figure(2)
rlocus(G)
figure (3)
nyquistplot(G), grid on
%% Closed-loop test with proportional gain only
Kp = 1.9692308;
sysCL = feedback(Kp*G, 1)
figure(4)
margin(sysCL)
figure(5)
rlocus(sysCL)
figure(6)
step(sysCL);
% Observation: the Nyquist plot passes through the critical point
% when Kp = Ku
figure(7)
nyquist(Kp*G), grid on
KU = Kp;
%% Compute the ultimate period TU
[y,t] = step(sysCL,1:5e-3:10);
ii = find(abs(diff(y))<3e-5);
figure(8)
plot(t,y,'linewidth',2), hold on, grid on
plot(t(ii),y(ii),'or');
TU = min(diff(t(ii)))*2;
%% PID tuning (Ziegler-Nichols rules)
Kp = 0.6*KU;                      % classic Ziegler-Nichols PID rules
Ti = TU/2;
Td = TU/8;
Ki = Kp/Ti;
Kd = Kp*Td;
N = 10;                           % derivative filter coefficient
C_PID = Kp + (Ki/s) + ((Kd*s)/(1 + s*Td/N));   % PID with filtered derivative
[y1,t1] = step(sysCL,0:1e-4:30);
sysCL2 = feedback(C_PID*G,1);
[y2,t2] = step(sysCL2,0:1e-4:30);
figure(9)
subplot(211)
plot(t1,y1,'linewidth',2);
title(['K_U = ' num2str(KU) ', T_U = ' num2str(TU)])
grid on
subplot(212)
plot(t2,y2,'linewidth',2);
title(['K_P = ' num2str(Kp) ', T_I = ' num2str(Ti) ', T_D = ' num2str(Td)])
grid on
figure(10)
margin(sysCL2)
figure(11)
bode(sysCL2)
figure(12)
rlocus(sysCL2)
figure(13)
nyquistplot(sysCL2)
Kp_Z_N=Kp
Ki_Z_N=Ki
Kd_Z_N=Kd
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp_Z_N))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki_Z_N))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd_Z_N))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
I copied and pasted the algorithm from the web page https://it.mathworks.com/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html and made some changes.
I increased the observation dimension to obsInfo = rlNumericSpec([3 1]) and added the lines that read the Kd parameter from the trained actor, define N = 10, and write these values into the PID block of System_PID:
s = tf('s');
G= (5*(0.5-s))/(s+2)^3
a= [1 6 12 8]
b = [-5 2.5]
[A,B,C,D] = tf2ss(b,a)
mdl = 'rl_PID_Tune';
open_system(mdl)
Ts = 0.1;
Tf = 10;
[env,obsInfo,actInfo] = localCreatePIDEnv(mdl);
numObservations = obsInfo.Dimension(1);
numActions = prod(actInfo.Dimension);
rng(0)
initialGain = single([1e-3 2]);
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain, 'Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'Action'},actorOptions);
criticNetwork = localCreateCriticNetwork(numObservations,numActions);
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic2 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic = [critic1 critic2];
agentOpts = rlTD3AgentOptions(...
'SampleTime',Ts,...
'MiniBatchSize',128, ...
'ExperienceBufferLength',1e6);
agentOpts.ExplorationModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agent = rlTD3Agent(actor,critic,agentOpts);
maxepisodes = 100;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-355);
% Train the agent.
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experiences = sim(env,agent,simOpts);
actor = getActor(agent);
parameters = getLearnableParameters(actor);
Ki = abs(parameters{1}(1))
Kp = abs(parameters{1}(2))
Kd = abs(parameters{1}(3))
N = 10;
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
%% Local functions
function [env,obsInfo,actInfo] = localCreatePIDEnv(mdl)
% Define the observation specification obsInfo and action specification actInfo.
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error and error';
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'PID output';
% Build the environment interface object.
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
% Set a custom reset function that randomizes the reference value for the model.
env.ResetFcn = @(in)localResetFcn(in,mdl);
end
function in = localResetFcn(in,mdl)
% randomize reference signal
blk = sprintf([mdl '/Desired \nValue']);
hRef = 10 + 4*(rand-0.5);
in = setBlockParameter(in,blk,'Value',num2str(hRef));
% randomize initial height
hInit = 0;
blk = [mdl '/block system/System'];
in = setBlockParameter(in,blk,'InitialCondition',num2str(hInit));
end
function criticNetwork = localCreateCriticNetwork(numObservations,numActions)
statePath = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedLayer(32,'Name','fc1')];
actionPath = [
featureInputLayer(numActions,'Normalization','none','Name','action')
fullyConnectedLayer(32,'Name','fc2')];
commonPath = [
concatenationLayer(1,2,'Name','concat')
reluLayer('Name','reluBody1')
fullyConnectedLayer(32,'Name','fcBody')
reluLayer('Name','reluBody2')
fullyConnectedLayer(1,'Name','qvalue')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','concat/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
end
I modified the scheme of 'rlwatertankPIDTune' and called the script 'rl_PID.m'.
But it does not work; these are the error messages that the Command Window returns:
Error using dlnetwork/initialize (line 481)
Invalid network.
Error in dlnetwork (line 218)
net = initialize(net, dlX{:});
Error in deep.internal.sdk.dag2dlnetwork (line 48)
dlnet = dlnetwork(lg);
Error in rl.util.createInternalModelFactory (line 15)
Model = deep.internal.sdk.dag2dlnetwork(Model);
Error in rlDeterministicActorRepresentation (line 86)
Model = rl.util.createInternalModelFactory(Model, Options, ObservationNames, ActionNames, InputSize, OutputSize);
Error in rl_PID (line 24)
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
Caused by:
Layer 'Action': Error using 'predict' in layer fullyConnectedPILayer. The function threw an error and could not be executed.
Error using dlarray/fullyconnect>iValidateWeights (line 221)
The number of weights (2) for each output feature must match the number of elements (3) in each observation of the input data.
Error in dlarray/fullyconnect (line 101)
wdata = iValidateWeights(W, xsize, batchDims);
Error in fullyConnectedPILayer/predict (line 21)
Z = fullyconnect(X, abs(obj.Weights), 0, 'DataFormat','CB');
What does it mean?
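From the last two "Caused by" lines, my guess is that the fullyConnectedPILayer is still built from the two-element initialGain = single([1e-3 2]) while the observation vector now has three elements. If that is the case, a minimal sketch of a matching initialization could look like this (the third value 1e-3 is only a placeholder I chose, and it assumes the third observation signal in my Simulink model is the error derivative so that the extra weight acts as Kd):
initialGain = single([1e-3 2 1e-3]);   % [Ki Kp Kd] initial guesses; third value is a placeholder
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain,'Action')];   % one weight per observation element
The order would then match how the gains are read back with parameters{1}(1), parameters{1}(2) and parameters{1}(3). Is this the right direction?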
  4 Comments
S Saha on 25 Aug 2023
What is the basis for selecting the values of the initial gains for reinforcement-learning-based PID tuning?
Dario Di Francesco on 25 Aug 2023
Sorry guys, but you answered a little bit late :D. I solved all of these problems and did this work for my bachelor thesis two years ago! If you are interested in this work, and in other work on control and machine-learning techniques for control, contact me at difrancescodario95@gmail.com or see my LinkedIn profile https://www.linkedin.com/in/dario-di-francesco-89390a142/, and I will pass you this work as soon as possible, after I translate it into English. As soon as possible I will also start my website with all my projects. See you later!


Answers (0)
