Deep Q Learning to Control an arm model

Leif Eric Goebel on 30 Nov 2019
Answered: Vimal Rathod on 13 Dec 2019
Hello,
I am new to Deep Q-learning (and to deep neural networks in general). My task is to create an (optimal) controller for a mathematical arm model that I implemented. I based my attempt on the "Cart Pole Environment" example that ships with the MATLAB Reinforcement Learning Toolbox. Hence I have a step function and a reset function, and I use them to create my environment; that part is working fine. My actions are a cell array (correctly initialized) that encodes the biceps and triceps controls as all combinations of two arrays.
In short, my system (or rather my step function) gets the inputs:
α (the angle of the arm, measured from fully stretched to fully bent),
dα (the angular velocity of the arm while in motion), and
DiffState (the difference between the current angle and the desired angle).
My reset function sets the initial position of the arm randomly, with an initial velocity of 0 and a random (but then fixed) desired position.
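A minimal sketch of what such a reset function could look like for rlFunctionEnv (the angle bounds and the LoggedSignals fields below are assumptions for illustration, not the actual implementation):

function [InitialObservation, LoggedSignals] = resetFunction()
% Sketch of a reset for the arm environment; bounds are assumed.
alphaMin = 0;       % fully stretched (assumed lower limit)
alphaMax = pi;      % fully bent (assumed upper limit)

alpha0   = alphaMin + (alphaMax - alphaMin)*rand;  % random initial angle
dalpha0  = 0;                                      % arm starts at rest
alphaRef = alphaMin + (alphaMax - alphaMin)*rand;  % random, then fixed, target

% Observation: [alpha; dalpha; DiffState]
InitialObservation = [alpha0; dalpha0; alphaRef - alpha0];

% Pass the target (and anything else the step function needs) along
LoggedSignals.State  = InitialObservation;
LoggedSignals.Target = alphaRef;
end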
I set my networks up as follows:
ObservationInfo = rlNumericSpec([3 1]);
ObservationInfo.Name = 'Arm States';
ObservationInfo.Description = 'alpha, dalpha, DiffState';

% Build all combinations of the two control arrays (biceps and triceps)
a = P.Param.a;
b = P.Param.b;
[A,B] = meshgrid(a,b);
actions = reshape(cat(2,A',B'),[],2);
ActionInfo = rlFiniteSetSpec(num2cell(actions,2));
ActionInfo.Name = 'actions';

env = rlFunctionEnv(ObservationInfo,ActionInfo,'simulateStep','resetFunction');
rng(0);
InitialObs = reset(env);

hiddenLayerSize = 128;

% Observation (state) input path of the critic
statePath = [
    imageInputLayer([3 1 1],'Normalization','none','Name','state')
    tanhLayer('Name','CriticRelu1')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticStateFC1')
    tanhLayer('Name','CriticRelu2')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticStateFC2')];

% Action input path of the critic
actionPath = [
    imageInputLayer([1 2 1],'Normalization','none','Name','action')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticActionFC1')
    tanhLayer('Name','tanh1')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticActionFC2')
    reluLayer('Name','ActionRelu1')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticActionFC3')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticActionFC4')];

% Common path: add the two paths and map to a scalar Q-value
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','output')];

criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC4','add/in2');
This is very much copied from the given example. However, my agent takes an extremely long time to train, and in the end the resulting controller does not work at all: it simply does not learn anything (such as "minimizing" the distance between the current angle and the desired angle).
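For reference, a minimal sketch of how the critic and a DQN agent could be created and trained from this point, assuming the rlRepresentation API available in R2019b (the option values are illustrative placeholders, not tuned settings):

criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlRepresentation(criticNetwork,ObservationInfo,ActionInfo, ...
    'Observation',{'state'},'Action',{'action'},criticOpts);

agentOpts = rlDQNAgentOptions( ...
    'UseDoubleDQN',true, ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',1e5, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',64);
agent = rlDQNAgent(critic,agentOpts);

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1000, ...
    'MaxStepsPerEpisode',500, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',-10);   % placeholder stopping threshold
trainingStats = train(agent,env,trainOpts);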
My reward function is basically the following:
Reward = -(P.Param.finish(1) - P.State(1))^2 - 0.1*(P.Param.finish(2) - P.State(2))^2 - sum(sum(Action));
Reward = Reward *~IsDone - tooFar* 1000 + stateCorrect*100;
The variable tooFar is set to 1 if the angle leaves the allowed range in either direction (too far negative or too far positive), and stateCorrect is 1 if the angle α is within 1 degree of the desired angle.
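For reference, a sketch of how tooFar and stateCorrect could be computed inside the step function (the joint limits and the use of radians are assumptions):

% Sketch only: assumed joint limits, radians throughout.
alpha    = P.State(1);           % current angle
alphaRef = P.Param.finish(1);    % desired angle
alphaMin = 0;  alphaMax = pi;    % assumed allowed range

tooFar       = (alpha < alphaMin) || (alpha > alphaMax);   % left the allowed range
stateCorrect = abs(alpha - alphaRef) < deg2rad(1);         % within 1 degree of the target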
In the end I have the following questions:
1) Is there a better way to set up my network to get the result I want?
2) Is the reward function "functional"? Or should I use something more like "get +1 if the distance is smaller than in the previous step"?
Thank you very much in advance; I hope the question is detailed enough to be answered. If anything about my question is unclear, let me know.

Answers (1)

Vimal Rathod on 13 Dec 2019
Your network seems fine, and I am hoping you have set your hyperparameters properly. Coming to the rewards: when you start training a deep Q-learning network, it is initially suggested to use a "discrete reward" function to push the agent towards the desired outputs. You could do that by giving a small reward and a higher penalty whenever the arm is not at the desired position or angle. In later stages of training, you can change to a continuous reward. Generally, continuous rewards are preferred for fine-tuning network performance.
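A sketch of what such a staged reward could look like (the thresholds and magnitudes are illustrative assumptions, not recommended values):

% Stage 1: discrete (sparse) reward to push the agent towards the target
angleErr = abs(P.Param.finish(1) - P.State(1));
if angleErr < deg2rad(1)
    Reward = 100;        % inside the 1-degree target band
elseif tooFar
    Reward = -1000;      % left the allowed range
else
    Reward = -1;         % small per-step penalty
end

% Stage 2 (later in training): switch back to a continuous, shaped reward
% Reward = -angleErr^2 - 0.1*(P.Param.finish(2) - P.State(2))^2 - sum(sum(Action));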

Release: R2019b