Unclear RL reward scheme
I'm looking at the reward scheme in this example (https://nl.mathworks.com/help/reinforcement-learning/ug/train-ddpg-agent-for-adaptive-cruise-control.html). I don't quite understand the role of the previous control input term in the reward scheme. Does the agent receive a negative reward proportional to the control signal from the previous time step? I'm not sure what that means.
Anyone able to clarify? Thank you!