I am training a TD3 RL agent for a pick-and-place robot. The reward function is reward = exp(-E/d), where E is the total energy consumed once the trajectory is complete and d is the distance of the object from the end-effector. Training went smoothly with a DQN agent, but it fails when DDPG or TD3 is used. What could be the reason for this? I used the following code for agent creation; a minimal sketch of the reward computation is included after it.
obsInfo = rlNumericSpec([34 1]);          % 34-element continuous observation vector
actInfo = rlNumericSpec([14 1], ...       % 14 continuous actions, each bounded to [-1, 1]
    LowerLimit=-1, ...
    UpperLimit=1);
env = rlFunctionEnv(obsInfo,actInfo,"KondoStepFunction","KondoResetFunction");   % custom step/reset functions
agent = rlTD3Agent(obsInfo,actInfo);      % TD3 agent with default actor/critic networks and options
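
For context, here is a minimal sketch of how the reward described above could be computed inside the step function; the names computeReward, totalEnergy, and distToObject are just placeholders for the actual variables in my environment.

function reward = computeReward(totalEnergy, distToObject)
% Sketch of reward = exp(-E/d), as described above.
% totalEnergy (E): energy consumed over the completed trajectory.
% distToObject (d): distance from the end-effector to the object.
    reward = exp(-totalEnergy / distToObject);
    % Note: as d -> 0 the exponent tends to -Inf, so the reward
    % approaches 0 as the end-effector gets close to the object.
end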