Beta distribution in PPO
I want to constrain the actions of my PPO agent, and I am wondering whether I can use a Beta distribution for the policy so that the sampled actions stay within a bounded range.
Here is the script for the networks I am using:
----------
commonPath = [ 
    featureInputLayer(prod(obsInfo.Dimension),Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer
    fullyConnectedLayer(1,Name="comPathOut") 
    ];
% Define mean value path
meanPath = [
    fullyConnectedLayer(64,Name="meanPathIn")
    tanhLayer
    fullyConnectedLayer(64,Name="fc_2")
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    leakyReluLayer(0.1,Name="meanPathOut")
    ];
% Define standard deviation path
sdevPath = [
    fullyConnectedLayer(64,"Name","stdPathIn")
    tanhLayer
    fullyConnectedLayer(64)
    tanhLayer
    fullyConnectedLayer(prod(actInfo.Dimension))
    softmaxLayer(Name="stdPathOut")
    ];
% Add layers to layerGraph object
actorNet = layerGraph(commonPath);
actorNet = addLayers(actorNet,meanPath);
actorNet = addLayers(actorNet,sdevPath);
% Connect paths
actorNet = connectLayers(actorNet,"comPathOut","meanPathIn/in");
actorNet = connectLayers(actorNet,"comPathOut","stdPathIn/in");
actorNetwork = dlnetwork(actorNet);
1 Comment
Kautuk Raj on 15 Feb 2024
To use a Beta distribution for the action outputs in PPO, I think you would need to modify the network so that it outputs the two shape parameters of the distribution (alpha and beta) instead of a mean and a standard deviation. Both parameters must be positive, so the final layer of each path is typically an activation that guarantees positivity, such as softplus. A Beta sample lies in [0, 1], so it then has to be rescaled to the action limits.
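A minimal sketch of that idea, under several assumptions: `numObs`, `numAct`, `actLow`, and `actHigh` are placeholder values (not taken from the question), and the layer names `alphaPathIn`, `alphaPathOut`, `betaPathIn`, and `betaPathOut` are made up for illustration. It relies on `softplusLayer`, which as far as I know ships with Reinforcement Learning Toolbox. Note that the built-in Gaussian actor would treat two such outputs as a mean and a standard deviation, so the sampling and log-probability steps at the end would presumably have to live in custom training code.

```matlab
% Sketch only: all sizes and limits below are placeholder values.
numObs  = 4;    % stands in for prod(obsInfo.Dimension)
numAct  = 1;    % stands in for prod(actInfo.Dimension)
actLow  = -2;   % hypothetical lower action limit
actHigh = 2;    % hypothetical upper action limit

% Shared trunk followed by two heads, mirroring the layout in the question.
commonPath = [
    featureInputLayer(numObs,Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer(Name="comPathOut")
    ];
% Each head ends in softplus so that its output is strictly positive.
alphaPath = [
    fullyConnectedLayer(64,Name="alphaPathIn")
    tanhLayer
    fullyConnectedLayer(numAct)
    softplusLayer(Name="alphaPathOut")
    ];
betaPath = [
    fullyConnectedLayer(64,Name="betaPathIn")
    tanhLayer
    fullyConnectedLayer(numAct)
    softplusLayer(Name="betaPathOut")
    ];

actorNet = layerGraph(commonPath);
actorNet = addLayers(actorNet,alphaPath);
actorNet = addLayers(actorNet,betaPath);
actorNet = connectLayers(actorNet,"comPathOut","alphaPathIn/in");
actorNet = connectLayers(actorNet,"comPathOut","betaPathIn/in");
actorNetwork = dlnetwork(actorNet);

% Evaluate both heads for one random observation.
obs = dlarray(rand(numObs,1),"CB");
[aRaw,bRaw] = predict(actorNetwork,obs,Outputs=["alphaPathOut","betaPathOut"]);

% Adding 1 keeps both shape parameters above 1, so the density is unimodal.
a = 1 + double(extractdata(aRaw));
b = 1 + double(extractdata(bRaw));

% Deterministic stand-in for a sample: the mean of Beta(a,b), which lies in (0,1).
% For stochastic actions, u would instead be drawn from Beta(a,b), for example
% with betarnd if Statistics and Machine Learning Toolbox is available.
u = a ./ (a + b);

% Rescale from (0,1) to the action limits.
action = actLow + (actHigh - actLow) .* u;

% Log-density of the rescaled action (change of variables from u to action).
logProb = (a - 1).*log(u) + (b - 1).*log(1 - u) - betaln(a,b) - log(actHigh - actLow);
```

Because the Beta density has support only on [0, 1], the rescaled action can never leave `[actLow, actHigh]`, which is the main appeal over clipping a Gaussian sample.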
Answers (0)
