- Critic instability: Q-values that grow without bound (here, Episode Q0 climbs from roughly 0 to 70) indicate critic divergence. Reduce the learning rates (actor: 1e-4, critic: 1e-3) and use a small target smoothing factor, τ ≈ 0.005.
- Unscaled rewards/observations: Very large negative rewards (on the order of -1e5 per episode here) destabilize training. Normalize inputs/outputs or scale rewards; see "Normalize Data in RL Agents".
- Loss of exploration: Keep non-zero Gaussian exploration noise and avoid decaying it too early.
- Monitor the critic loss and Q-values: Use the Episode Manager to pinpoint when the instability starts.
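As a concrete starting point, the settings above might look like the following with `rlTD3AgentOptions` from Reinforcement Learning Toolbox. This is a sketch, not a definitive configuration: property names vary by release, and `Ts` stands in for your model's sample time.

```matlab
% Sketch: TD3 options with conservative learning rates, slow target
% updates, and persistent Gaussian exploration noise.
opt = rlTD3AgentOptions( ...
    "SampleTime", Ts, ...                 % Ts: your controller sample time
    "TargetSmoothFactor", 5e-3, ...       % tau ~ 0.005 for slow target updates
    "DiscountFactor", 0.99);

% Lower learning rates with gradient clipping to tame critic divergence.
opt.ActorOptimizerOptions  = rlOptimizerOptions("LearnRate", 1e-4, "GradientThreshold", 1);
opt.CriticOptimizerOptions = rlOptimizerOptions("LearnRate", 1e-3, "GradientThreshold", 1);

% Keep exploration noise non-zero: no decay, and a floor so the agent
% never becomes fully greedy during training.
opt.ExplorationModel.StandardDeviation          = 0.1;
opt.ExplorationModel.StandardDeviationDecayRate = 0;
opt.ExplorationModel.StandardDeviationMin       = 0.05;
```

The noise magnitudes here (0.1 with a 0.05 floor) assume actions scaled to roughly [-1, 1]; scale them to your action range.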
“TD3 performs well, then abruptly saturates at bad actions (Simulink RL) — why?”
Hello everyone,
I’m training a controller with TD3 because I want a deterministic policy. It performs well at first, but then it suddenly gets stuck producing very poor actions, with no warning. This seems odd: a greedy policy shouldn’t lock onto a fixed trajectory when that trajectory earns a very poor reward. I’ve tried different exploration settings and input normalization, but the problem keeps returning.
Has anyone seen this on the MathWorks platform? What could cause TD3 to collapse like this, and how can I prevent it?
Episode: 1/200 | Episode reward: -150144.31 | Episode steps: 250 | Average reward: -150144.31 | Step Count: 250 | Episode Q0: -0.06
Episode: 2/200 | Episode reward: -142781.27 | Episode steps: 250 | Average reward: -146462.79 | Step Count: 500 | Episode Q0: -0.06
Episode: 3/200 | Episode reward: -146085.77 | Episode steps: 250 | Average reward: -146337.12 | Step Count: 750 | Episode Q0: -0.07
Episode: 4/200 | Episode reward: -127403.65 | Episode steps: 250 | Average reward: -141603.75 | Step Count: 1000 | Episode Q0: -0.14
Episode: 5/200 | Episode reward: -90967.66 | Episode steps: 250 | Average reward: -131476.53 | Step Count: 1250 | Episode Q0: -0.24
Episode: 6/200 | Episode reward: -58398.87 | Episode steps: 250 | Average reward: -119296.92 | Step Count: 1500 | Episode Q0: -0.28
Episode: 7/200 | Episode reward: -35903.63 | Episode steps: 250 | Average reward: -107383.59 | Step Count: 1750 | Episode Q0: -0.28
Episode: 8/200 | Episode reward: -10701.52 | Episode steps: 250 | Average reward: -95298.34 | Step Count: 2000 | Episode Q0: -0.06
Episode: 9/200 | Episode reward: -9437.55 | Episode steps: 250 | Average reward: -85758.25 | Step Count: 2250 | Episode Q0: 0.89
Episode: 10/200 | Episode reward: -17715.87 | Episode steps: 250 | Average reward: -78954.01 | Step Count: 2500 | Episode Q0: 2.43
Episode: 11/200 | Episode reward: -34624.05 | Episode steps: 250 | Average reward: -67401.98 | Step Count: 2750 | Episode Q0: 4.24
Episode: 12/200 | Episode reward: -40353.72 | Episode steps: 250 | Average reward: -57159.23 | Step Count: 3000 | Episode Q0: 6.50
Episode: 13/200 | Episode reward: -42417.75 | Episode steps: 250 | Average reward: -46792.43 | Step Count: 3250 | Episode Q0: 7.25
Episode: 14/200 | Episode reward: -43329.38 | Episode steps: 250 | Average reward: -38385.00 | Step Count: 3500 | Episode Q0: 8.65
Episode: 15/200 | Episode reward: -21137.36 | Episode steps: 250 | Average reward: -31401.97 | Step Count: 3750 | Episode Q0: 10.46
Episode: 16/200 | Episode reward: -20629.98 | Episode steps: 250 | Average reward: -27625.08 | Step Count: 4000 | Episode Q0: 12.07
Episode: 17/200 | Episode reward: -190383.39 | Episode steps: 250 | Average reward: -43073.06 | Step Count: 4250 | Episode Q0: 15.93
Episode: 18/200 | Episode reward: -188099.18 | Episode steps: 250 | Average reward: -60812.82 | Step Count: 4500 | Episode Q0: 16.85
Episode: 19/200 | Episode reward: -188479.67 | Episode steps: 250 | Average reward: -78717.04 | Step Count: 4750 | Episode Q0: 16.95
Episode: 20/200 | Episode reward: -189525.03 | Episode steps: 250 | Average reward: -95897.95 | Step Count: 5000 | Episode Q0: 18.68
Episode: 21/200 | Episode reward: -189286.51 | Episode steps: 250 | Average reward: -111364.20 | Step Count: 5250 | Episode Q0: 18.17
Episode: 22/200 | Episode reward: -190229.29 | Episode steps: 250 | Average reward: -126351.75 | Step Count: 5500 | Episode Q0: 19.29
Episode: 23/200 | Episode reward: -188722.45 | Episode steps: 250 | Average reward: -140982.22 | Step Count: 5750 | Episode Q0: 20.18
Episode: 24/200 | Episode reward: -189155.54 | Episode steps: 250 | Average reward: -155564.84 | Step Count: 6000 | Episode Q0: 21.27
Episode: 25/200 | Episode reward: -187477.81 | Episode steps: 250 | Average reward: -172198.88 | Step Count: 6250 | Episode Q0: 21.23
Episode: 26/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188844.49 | Step Count: 6500 | Episode Q0: 23.44
Episode: 27/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188514.75 | Step Count: 6750 | Episode Q0: 25.41
Episode: 28/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188413.43 | Step Count: 7000 | Episode Q0: 33.53
Episode: 29/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188274.06 | Step Count: 7250 | Episode Q0: 43.19
Episode: 30/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188030.16 | Step Count: 7500 | Episode Q0: 48.98
Episode: 31/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -187810.11 | Step Count: 7750 | Episode Q0: 69.61
Answers (1)
Satyam
on 17 Oct 2025 at 6:54
This TD3 “collapse” usually happens when the critic diverges or the agent loses exploration. Your log shows the classic signature: Episode Q0 climbs steadily from about 0 to 70 while the episode reward saturates near -187086, i.e., the critic’s value estimates keep growing even as the policy gets worse. The troubleshooting steps listed at the top of this post address exactly this: lower the actor and critic learning rates, scale the rewards and observations, keep non-zero Gaussian exploration noise, and watch the critic loss and Q-values in the Episode Manager.
I hope this fixes your issue.
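One way to see where the critic starts diverging is to plot the critic’s initial value estimate (Episode Q0) against the actual episode reward from the statistics returned by `train`. This sketch assumes `agent`, `env`, and `trainOpts` are already defined in your setup:

```matlab
% Sketch: a Q0 that keeps growing while episode rewards collapse to a
% constant is a sign of critic divergence, not a genuine policy improvement.
stats = train(agent, env, trainOpts);   % returns per-episode training statistics

figure
yyaxis left
plot(stats.EpisodeReward)
ylabel("Episode reward")
yyaxis right
plot(stats.EpisodeQ0)
ylabel("Episode Q_0")
xlabel("Episode")
```

In a healthy run the two curves should track each other; in your log they diverge from about episode 17 onward, which is where I would start looking.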