Episode Q0 increases exponentially
Can anyone explain why Episode Q0 in reinforcement learning grows exponentially after the reward has converged to a suboptimal policy?
Answers (1)
Emmanouil Tzorakoleftherakis
on 16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
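As a rough illustration of the normalization idea, here is a minimal sketch in plain Python. It is not Reinforcement Learning Toolbox code. The helper names, the signal bounds, and the reward scale are all hypothetical and would need to come from your own environment. The assumption is that each observation and action has known lower and upper limits, so it can be mapped linearly to [-1, 1], and that rewards can be divided by a constant of roughly their typical magnitude.

```python
# Minimal sketch: scale observations, rewards, and actions to O(1) ranges.
# All names and numeric bounds below are hypothetical placeholders.

def scale_to_unit(x, low, high):
    """Linearly map x from [low, high] to [-1, 1]."""
    return 2.0 * (x - low) / (high - low) - 1.0


def normalize_observation(obs, lows, highs):
    """Scale each observation channel using its own assumed bounds."""
    return [scale_to_unit(o, lo, hi) for o, lo, hi in zip(obs, lows, highs)]


def scale_reward(reward, reward_scale):
    """Divide the raw reward by a constant so returns stay moderate in size."""
    return reward / reward_scale


def denormalize_action(action, low, high):
    """Map an agent output in [-1, 1] back to the actuator range [low, high]."""
    return low + (action + 1.0) * 0.5 * (high - low)


# Example with made-up signals: a position in [0, 10] and a force in [-400, 400].
obs = normalize_observation([5.0, -200.0], lows=[0.0, -400.0], highs=[10.0, 400.0])
reward = scale_reward(-350.0, reward_scale=1000.0)
action = denormalize_action(0.5, low=-2.0, high=2.0)

print(obs)     # [0.0, -0.5]
print(reward)  # -0.35
print(action)  # 1.0
```

The same arithmetic can be applied inside an environment's step function, so that the agent only ever sees scaled signals. Whether this is enough to stop Q0 from growing depends on the problem, so treat it as one thing to try rather than a guaranteed fix.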
Hope this helps