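Even outside the thermal domain, you will most likely need to start with a simulation model. RL itself does not necessarily have to build or learn that model (model-free methods, for example, do not), but the agent still needs an environment to interact with during training.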
You can certainly use Reinforcement Learning Toolbox with Simulink and Simscape, provided you have a Simulink model that already simulates without errors on its own, outside of RL. Depending on the problem, you may still run into algebraic loops, but these do not prevent you from applying RL; a quick fix is to try adding a delay in the loop.
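
To illustrate why a delay helps, here is a minimal sketch in plain Python (not Simulink code, and not tied to any toolbox API). It assumes a hypothetical first-order plant whose output depends directly on its input, plus a simple proportional controller standing in for the agent; every name and number is made up for illustration. Without the delay, the controller output and the plant output would each depend on the other within the same time step, which is the algebraic loop. Feeding the controller the previous output makes each step explicit.

```python
def simulate_with_unit_delay(steps=50, gain=0.5, feedthrough=0.4, setpoint=1.0):
    """Simulate a closed loop whose plant has direct feedthrough.

    Without a delay, u[k] = gain * (setpoint - y[k]) and
    y[k] = x[k] + feedthrough * u[k] would have to be solved
    simultaneously at every step (an algebraic loop). Using y[k-1]
    in the controller breaks the loop, so each step is explicit.
    All parameter values are illustrative assumptions.
    """
    x = 0.0        # plant state
    y_prev = 0.0   # value held by the unit delay (initial condition)
    outputs = []
    for _ in range(steps):
        u = gain * (setpoint - y_prev)   # controller sees the delayed output
        y = x + feedthrough * u          # plant output with direct feedthrough
        x = 0.9 * x + 0.1 * u            # plant state update
        y_prev = y                       # store output for the next step
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    trajectory = simulate_with_unit_delay()
    print(trajectory[:5])
```

The trade-off is the same as in Simulink: the delay shifts the feedback signal by one sample, so the result differs slightly from the exact simultaneous solution, which is usually acceptable when the sample time is small relative to the plant dynamics.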