Generate Policy Block for Deployment

This example uses:

This example shows how to generate a Policy block ready for deployment from an agent object. You generate the policy from the agent in Train TD3 Agent for PMSM Control, then simulate it to validate its performance. The policy is simulated to validate performance. If Embedded Coder® is installed, a software-in-the-loop (SIL) simulation is run to validate the generated code of the policy.

In general, the workflow for the deployment of a reinforcement learning policy via a Simulink® model is:

Train the agent (see Train TD3 Agent for PMSM Control).
Generate a Policy block from the trained agent.
Replace the RL Agent block with the Policy block.
Configure the model for code generation.
Simulate the policy and verify policy performance.
Generate code for the policy, simulate the generated code, and verify policy performance.
Deploy to hardware for testing.

In this example, you do steps 2 through 6.

Load the motor parameters along with the trained TD3 agent.

sim_data;

### The Lq is observed to be lower than Ld. ### 
### Using the lower of these two for the Ld (internal variable)  ### 
### and higher of these two for the Lq (internal variable) for computations. ###
### The Lq is observed to be lower than Ld. ### 
### Using the lower of these two for the Ld (internal variable)  ### 
### and higher of these two for the Lq (internal variable) for computations. ###
             model: 'Maxon-645106'
                sn: '2295588'
                 p: 7
                Rs: 0.2930
                Ld: 8.7678e-05
                Lq: 7.7724e-05
                Ke: 5.7835
                 J: 8.3500e-05
                 B: 7.0095e-05
           I_rated: 7.2600
          QEPSlits: 4096
            N_base: 3476
             N_max: 4300
            FluxPM: 0.0046
           T_rated: 0.3471
    PositionOffset: 0.1650

               model: 'BoostXL-DRV8305'
                  sn: 'INV_XXXX'
                V_dc: 24
              I_trip: 10
              Rds_on: 0.0020
              Rshunt: 0.0070
       CtSensAOffset: 2295
       CtSensBOffset: 2286
       CtSensCOffset: 2295
             ADCGain: 1
         EnableLogic: 1
        invertingAmp: 1
          ISenseVref: 3.3000
    ISenseVoltPerAmp: 0.0700
           ISenseMax: 21.4286
             R_board: 0.0043
     CtSensOffsetMax: 2500
     CtSensOffsetMin: 1500

                 model: 'LAUNCHXL-F28379D'
                    sn: '123456'
         CPU_frequency: 200000000
         PWM_frequency: 5000
    PWM_Counter_Period: 20000
              ADC_Vref: 3
          ADC_MaxCount: 4095
         SCI_baud_rate: 12000000

    V_base: 13.8564
    I_base: 21.4286
    N_base: 3476
    T_base: 1.0249
    P_base: 445.3845

load("rlPMSMAgent.mat","agent");

Generate Policy Block

Open the Simulink model used for training the TD3 agent.

mdl_rl = "mcb_pmsm_foc_sim_RL";
open_system(mdl_rl);

Open the subsystem containing the RL Agent block.

agentblk = mdl_rl + ...
    "/Current Control/Control_System" + ... 
    "/Closed Loop Control/Reinforcement Learning/RL Agent";
open_system(get_param(agentblk,"Parent"));

To create a deployable model, you replace the RL Agent block with a Policy block.

Set the agent's UseExplorationPolicy property to false so the generated policy takes the greedy action at each time step. Generate the policy block using generatePolicyBlock and specify the name of the MAT-file containing the policy data for the block.

% To ensure that the generated policy is greedy, 
% set UseExplorationPolicy to false
agent.UseExplorationPolicy = false;

% Specify the MAT-file name for the policy data
fname = "PMSMPolicyBlockData.mat";

% Delete the file if it already exists
if isfile(fname)
    delete(fname);
end

% Generate the block and the policy data
generatePolicyBlock(agent,MATFileName=fname)

Alternatively, you can generate the policy block can be generated by clicking Generate greedy policy block from the block mask. Use open_system(agentblk) to open the RL Agent block mask, or simply double-click the block.

Simulate the Policy

For this example, the Policy block has already replaced the RL Agent block inside of the pmsm_current_control model. This model has been configured for code generation, and the Policy block loads the trained policy from PMSMPolicyBlockData.mat.

mdl_current_ctrl = "pmsm_current_control";
open_system(mdl_current_ctrl);

Obtain the path of the policy block in the Simulink model.

policyblk = mdl_current_ctrl + ...
    "/Current Control/Control_System" + ...
    "/Closed Loop Control/Reinforcement Learning/Policy";

Use open_system(policyblk) to open the Policy block mask, or simply double-click the block.

The model mcb_pmsm_foc_sim_policy references pmsm_current_control using a model reference block. Simulate the top-level model and plot the responses for the inner and outer control loops.

Setup the Simulation Data Inspector (SDI).

Simulink.sdi.clear;
Simulink.sdi.setSubPlotLayout(3,1);

Open the model and get the path to the Current Control block.

mdl_policy = "mcb_pmsm_foc_sim_policy";
open_system(mdl_policy);
current_ctrl_blk = mdl_policy + "/Current Control";

To temporarily change the model and easily run multiple simulations with such changes, Simulink.SimulationInput (Simulink) object.

in = Simulink.SimulationInput(mdl_policy);

Simulate the model with the current controller, run the simulation in normal mode

in = setBlockParameter(in,current_ctrl_blk, ...
    "SimulationMode","Normal");
out_sim = sim(in);

Get results from the latest SDI run.

runSim = Simulink.sdi.Run.getLatest;

Extract the outer control loop signals.

speedSim    = getSignalsByName(runSim,"Speed_fb" );
speedRefSim = getSignalsByName(runSim,"Speed_Ref");

Plot both signals.

plotOnSubPlot(speedSim   ,1,1,true);
plotOnSubPlot(speedRefSim,1,1,true);

Extract the inner control loop signals.

idSim    = getSignalsByName(runSim,"id" );
iqSim    = getSignalsByName(runSim,"iq" );
idRefSim = getSignalsByName(runSim,"id_ref");
iqRefSim = getSignalsByName(runSim,"iq_ref");

Plot the extracted signals.

plotOnSubPlot(idSim   ,2,1,true);
plotOnSubPlot(idRefSim,2,1,true);
plotOnSubPlot(iqSim   ,3,1,true);
plotOnSubPlot(iqRefSim,3,1,true);

Open the Simulation Data Inspector.

Simulink.sdi.view;

Validate Generated Code for Policy

If Embedded Coder is installed, the current controller model reference can be run in SIL mode. Running the current controller in SIL mode generates code for the current controller model, including the policy block.

The Simulink model is configured only on Windows®, so display a message and stop execution when attempting to run on a different operating system.

if ~ispc
    disp("The model ""pmsm_current_control"" is configured " + ...
         "for SIL simulations only on Windows system.")
    return
end

Simulate the model with the current controller.

Enable SIL mode.

in = setBlockParameter(in,current_ctrl_blk, ...
    "SimulationMode","Software-in-the-loop");

Simulate the model. Use evalc to capture the text output from code generation, for possible later inspection.

txt_out = evalc("out_sil = sim(in)");

Get results from the latest SDI run.

runSIL = Simulink.sdi.Run.getLatest;

Extract the speed response when run in SIL mode.

speedSIL = getSignalsByName(runSIL,"Speed_fb");

Compare the SIL response to the simulated response. The SIL response should be close to the response simulated in normal mode.

speedSim.AbsTol = 1e-3;
cr = Simulink.sdi.compareSignals(speedSim.ID,speedSIL.ID);

Display the largest signal difference.

cr.MaxDifference

ans = 5.0485e-05

Open the Simulation Data Inspector.

Simulink.sdi.view;

Once you are satisfied with the performance of the policy in simulation, you can use an appropriate target/hardware support package to deploy the policy to hardware.

Related Examples

Train Reinforcement Learning Agents

Generate Policy Block for Deployment

Generate Policy Block

Simulate the Policy

Validate Generated Code for Policy

See Also

Functions

Objects

Blocks

Related Examples

More About