Using MATLAB environment to train a PPO agent in Python

Hello,
The following article https://medium.com/analytics-vidhya/solving-openai-gym-environments-with-matlab-rl-toolbox-fb9d9e06b593 explains how to use an OpenAI Gym environment to train an agent in MATLAB. I want to do the reverse: use a from-scratch PPO implementation in Python to train my agent on a MATLAB (R2022a) predefined environment. I have not yet come across any references for that.
Since I could not find any references, I was wondering whether it is possible. (I am currently working on it, but I am posing the question so that I am sure I am putting effort in the right direction.)
Any leads on the references will be really helpful.

Answers (1)

Simar on 25 Jan 2024
Edited: Simar on 25 Jan 2024
Hi Ankita,
I understand that you are seeking assistance in training an agent in a MATLAB environment using a Proximal Policy Optimization (PPO) algorithm implemented from scratch in Python.
The Medium article you shared discusses how to use MATLAB tools with OpenAI Gym environments. You, however, intend the opposite: using a custom Python program to train an agent in a MATLAB environment. Here are some steps and considerations to make this work:
  1. Interfacing Between Python and MATLAB: You can call Python functions from MATLAB using the “py” interface, which lets you execute Python scripts and access Python variables and objects within MATLAB. Conversely, you can call MATLAB from Python using the “matlab.engine” module, which is the natural direction for your setup.
  2. Environment API: Ensure that the MATLAB environment adheres to a similar API as OpenAI Gym environments. This typically includes methods like “reset” for initializing the environment, “step” for advancing the simulation one step given an action, and properties like “observation_space” and “action_space” that define the possible states and actions.
  3. Data Conversion: When interfacing between MATLAB and Python, data types need to be converted appropriately. MATLAB automatically converts some Python data types to MATLAB types and vice versa, but more complex conversions (e.g., between MATLAB arrays and NumPy arrays) must be handled manually.
  4. Synchronization: Ensure that the state of the MATLAB environment is correctly synchronized with the Python-based PPO implementation. Each step in the environment should correspond to an action decided by the PPO algorithm, and the resulting state, reward, and done flag should be passed back to the PPO for processing.
  5. Performance Considerations: Be aware that there may be overhead associated with crossing the language boundary between Python and MATLAB. This could potentially slow down the training process.
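The steps above can be sketched in Python using the MATLAB Engine API. This is a minimal, hedged sketch, not a tested implementation: it assumes a Reinforcement Learning Toolbox predefined environment keyword such as 'CartPole-Discrete' (as accepted by rlPredefinedEnv), and that reset/step on the environment object behave as documented for your release; verify the exact return signatures in your MATLAB version.

```python
# Sketch: a Gym-style wrapper around a MATLAB predefined environment,
# driven from Python via the MATLAB Engine API (matlab.engine).
import numpy as np


def to_numpy(m):
    """Convert a MATLAB array (or nested list) to a flat NumPy float array."""
    return np.asarray(m, dtype=np.float64).ravel()


class MatlabEnv:
    """Minimal reset/step interface over a MATLAB RL Toolbox environment."""

    def __init__(self, env_keyword="CartPole-Discrete"):
        # Deferred import so this module still loads on machines
        # without MATLAB and the Engine API installed.
        import matlab
        import matlab.engine
        self._matlab = matlab  # kept for matlab.double(...) conversions
        self.eng = matlab.engine.start_matlab()
        # Assumed: rlPredefinedEnv returns an environment object handle
        # that MATLAB's reset/step functions can dispatch on.
        self.env = self.eng.rlPredefinedEnv(env_keyword)

    def reset(self):
        obs = self.eng.reset(self.env)
        return to_numpy(obs)

    def step(self, action):
        # In MATLAB: [Observation, Reward, IsDone, LoggedSignals] = step(env, action)
        obs, reward, done, _ = self.eng.step(
            self.env, self._matlab.double([float(action)]), nargout=4
        )
        return to_numpy(obs), float(reward), bool(done)
```

With such a wrapper in place, the PPO rollout loop stays ordinary Python: `obs = env.reset()`, then repeatedly pick an action from the policy, call `obs, reward, done = env.step(action)`, and store the transition until `done` is true. All MATLAB interaction is confined to the wrapper, which keeps the synchronization point (step 4 above) in one place.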
Since this is a non-standard approach, ready-made references or examples may be hard to find. However, the task is technically feasible, and with careful planning and implementation, integrating a Python-based PPO with a MATLAB environment can certainly work. Be prepared to write a significant amount of "glue" code to manage the interaction between the two languages.
Hope it helps!
Best Regards,
Simar
