Hierarchical reinforcement learning
Wherein the problem of long horizons is addressed by decomposing tasks into subtasks, and Internal RL is introduced, whereby a meta‑controller sparsely manipulates the model's residuals, compressing token horizons.