How do I Adjust Reinforcement Learning Parameters for a Water Tank System with Second-Order Dynamics Using TD3 Agents?

I am trying to adjust the reinforcement learning (RL) parameters for the reward function created with Generate Reward Function from a Model Verification block, for a water tank system represented by the second-order transfer function G(s) = 1/(24.4s^2 + 12.2s + 1), using a Twin-Delayed Deep Deterministic Policy Gradient (TD3) agent. I have already adjusted the weights, the reward function method, and the learning rate, but I am still facing issues, so I would like a structured approach to fine-tuning the model and the RL parameters.
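For reference, the agent options I am tuning look roughly like the sketch below; the values are placeholders rather than my exact settings, and the option names assume the Reinforcement Learning Toolbox TD3 agent options.
% Rough sketch of the TD3 hyperparameters being tuned (placeholder values)
agentOpts = rlTD3AgentOptions( ...
    'SampleTime', 1.0, ...              % sample time of the water tank model (assumed)
    'DiscountFactor', 0.99, ...
    'MiniBatchSize', 128, ...
    'ExperienceBufferLength', 1e6, ...
    'TargetSmoothFactor', 5e-3, ...
    'TargetUpdateFrequency', 2, ...
    'PolicyUpdateFrequency', 2);

% Actor/critic learning rates (the values I keep adjusting)
agentOpts.ActorOptimizerOptions  = rlOptimizerOptions('LearnRate', 1e-3);
agentOpts.CriticOptimizerOptions = rlOptimizerOptions('LearnRate', 1e-3);

% Exploration noise on the control action
agentOpts.ExplorationModel.StandardDeviation          = 0.1;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;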
  4 Comments
Sam Chak on 14 Oct 2024
It is unclear what you are instructing the TD3 control agent to do. Although the reinforcement learning controller functions similarly to "Aladdin's magic lamp," you still need to state your wishes clearly and explicitly.
The following is an optimal PID controller that drives the plant to settle exactly at 20 seconds without overshoot. It is likely that no other controllers can perform better than this one (perhaps only on par with it) because the performance objectives are clearly defined.
Did you instruct the TD3 agents to search for the optimal PID gain values based on structured observations of error, integral error, and derivative error, or did you allow them to explore freely and produce random control actions?
s = tf('s');
% Plant
Gp = 1/(24.4*s^2 + 12.2*s + 1)
Gp =

            1
  ---------------------
  24.4 s^2 + 12.2 s + 1

Continuous-time transfer function.
stepinfo(Gp)
ans = struct with fields:
         RiseTime: 22.4665
    TransientTime: 40.7858
     SettlingTime: 40.7858
      SettlingMin: 0.9011
      SettlingMax: 0.9999
        Overshoot: 0
       Undershoot: 0
             Peak: 0.9999
         PeakTime: 90.3192
% PID controller
kp = 1.5293491321269;
ki = 0.145848289518598;
kd = 0.937225744461437;
Tf = 1.71410992083058;
Gc = pid(kp, ki, kd, Tf)
Gc =

             1              s
  Kp + Ki * --- + Kd * --------
             s          Tf*s+1

  with Kp = 1.53, Ki = 0.146, Kd = 0.937, Tf = 1.71

Continuous-time PIDF controller in parallel form.
% Closed-loop system
Gcl = feedback(Gc*Gp, 1)
Gcl =

            2.076 s^2 + 1.038 s + 0.08509
  ----------------------------------------------------
  24.4 s^4 + 26.43 s^3 + 10.19 s^2 + 1.621 s + 0.08509

Continuous-time transfer function.
S = stepinfo(Gcl)
S = struct with fields:
         RiseTime: 11.5131
    TransientTime: 20.0006
     SettlingTime: 20.0006
      SettlingMin: 0.9019
      SettlingMax: 0.9999
        Overshoot: 0
       Undershoot: 0
             Peak: 0.9999
         PeakTime: 40.8897
% Plot results
step(Gp ), hold on
step(Gcl), grid on
ylim([0, 1.2])
legend('Plant response', 'Closed-loop system', 'location', 'east')
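If you do give the agent structured observations of error, integral error, and derivative error, the environment interface might be set up roughly as sketched below; the model and block names are assumptions about your Simulink setup, not known values.
% Observation: [error; integral of error; derivative of error]
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'error, integral error, derivative error';

% Action: a single bounded control signal (limits are placeholders)
actInfo = rlNumericSpec([1 1], 'LowerLimit', 0, 'UpperLimit', 10);
actInfo.Name = 'control input';

% Hypothetical model and RL Agent block names
mdl      = 'watertank_rl';          % assumed Simulink model name
agentBlk = [mdl '/RL Agent'];       % assumed path to the RL Agent block
env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);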


Answers (1)

Oghenekome on 17 Oct 2024
Edited: Oghenekome on 17 Oct 2024
I set it up to allow them to explore freely and produce random control actions. Yesterday it found a control solution for the second-order function, but I then changed my blocks to use limited integrators. Now I am trying to implement and test the trained agent on a physical system.
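For the deployment step, the rough plan is to extract the greedy policy from the trained agent and sanity-check it offline first; the sketch below assumes the trained agent is stored in a variable called agent and uses a three-element observation.
% Generate evaluatePolicy.m and agentData.mat from the trained agent
generatePolicyFunction(agent);

% Compare the exported policy with the agent on a sample observation
obs = [0.5; 0; 0];                  % example observation (placeholder values)
uAgent  = getAction(agent, {obs});  % cell array holding the agent's action
uPolicy = evaluatePolicy(obs);      % action from the generated policy function
% uAgent{1} and uPolicy should match before moving evaluatePolicy to the hardware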

Release

R2024b
