Abstract
Despite their groundbreaking performance, autonomous agents can misbehave when training and environmental conditions become inconsistent, with minor mismatches leading to undesirable behaviors or even catastrophic failures. Robustness to these training-environment ambiguities is a core requirement for intelligent agents, and its fulfillment is a long-standing challenge for their real-world deployment. Here, we introduce a Distributionally Robust Free Energy model (DR-FREE) that instills this core property by design. Combining a robust extension of the free energy principle with a resolution engine, DR-FREE wires robustness into the agent's decision-making mechanisms. Across benchmark experiments, DR-FREE enables the agents to complete the task even when, in contrast, state-of-the-art models fail. This milestone may inspire both deployments in multi-agent settings and, at a perhaps deeper level, the quest for an explanation of how natural agents – with little or no training – survive in capricious environments.
Data availability
All (other) data needed to evaluate the conclusions in the paper and to replicate the experiments are available in the paper, the Supplementary Information, and the accompanying code (see the Assets folder in Code Availability). Supplementary Movie 1 provides a recording of the robot experiments from the Robotarium platform. The recording and the figures of this paper are also available in the Assets folder of our repository63.
Code availability
Pseudocode for DR-FREE is provided in the Supplementary Information. The full code for DR-FREE needed to replicate all the experiments is provided at our repository63. The folder Experiments contains our DR-FREE implementation for the Robotarium experiments. The folder also contains: (i) the code for the ambiguity-unaware free energy minimizing agent; (ii) the data shown in Fig. SI-8, together with the GP models and the code to train them; (iii) the code to replicate the results in Fig. 3e and f. The same folder contains the code for the experiments in Supplementary Fig. 4 and the instructions to replicate the experiments of Supplementary Figs. 5 and 6. The folder Belief Update Benchmark contains the code to replicate our benchmarks for the belief updating results. The folder Assets contains all the figures of this paper, the data from the experiments used to generate these figures, and Supplementary Movie 1, from which the screenshots of Fig. 3d were taken. The folder MaxDiff Benchmark contains the code to replicate our MaxDiff benchmark experiments; we build upon the original code base from the MaxDiff paper21, integrating it in the Robotarium Python environment. The sub-folder Ant Benchmark contains the code for the Ant experiments. The MaxDiff and NN-MPPI implementations are from the literature21. As highlighted in the Discussion, extending our analytical results to account for ambiguity in the reward itself is an open theoretical research direction that is interesting in its own right. Nevertheless, we now discuss how DR-FREE can be adapted to this setting. Following the literature31, one could define a modified problem formulation in which the reward is appended to the observations available to the agent. In this setting, the reward becomes the last coordinate of the observation, so that it can be embedded into the model ambiguity.
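To make this adaptation concrete, the sketch below shows one way to append the reward to the observation before it is handed to an ambiguity-aware agent. It is a minimal illustration under stated assumptions, not part of the DR-FREE repository: the RewardAugmentedEnv wrapper and the gym-style reset()/step() interface are hypothetical names introduced here for clarity.

```python
import numpy as np


class RewardAugmentedEnv:
    """Hypothetical wrapper (not part of the DR-FREE code base).

    Appends the most recent reward as the last coordinate of the
    observation, following the reformulation quoted from ref. 31,
    so that an ambiguity model placed over observations also covers
    ambiguity in the reward.
    """

    def __init__(self, env):
        # `env` is assumed to expose gym-style reset() and step()
        # returning (obs, reward, done, info).
        self.env = env
        self._last_reward = 0.0

    def reset(self):
        obs = self.env.reset()
        self._last_reward = 0.0
        return self._augment(obs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._last_reward = float(reward)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # The reward becomes the last coordinate of the observation.
        return np.concatenate(
            [np.asarray(obs, dtype=float).ravel(), [self._last_reward]]
        )
```

With such a wrapper, an ambiguity set defined over the augmented observation vector would also account for ambiguity in the reward, leaving the rest of the decision-making loop unchanged.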
References
Wurman, P. R. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Degrave, J. et al. Magnetic control of Tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).
Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987 (2023).
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2016).
Vanchurin, V., Wolf, Y. I., Katsnelson, M. I. & Koonin, E. V. Toward a theory of evolution as multilevel learning. Proc. Natl. Acad. Sci. USA 119, e2120037119 (2022).
Manrique, J. M., Friston, K. J. & Walker, M. J. ‘Snakes and ladders’ in paleoanthropology: from cognitive surprise to skillfulness a million years ago. Phys. Life Rev. 49, 40–70 (2024).
Kejriwal, M., Kildebeck, E., Steininger, R. & Shrivastava, A. Challenges, evaluation and opportunities for open-world learning. Nat. Mach. Intell. 6, 580–588 (2024).
McAllister, R. D. & Esfahani, P. M. Distributionally robust model predictive control: closed-loop guarantees and scalable algorithms. IEEE Trans. Autom. Control 70, 2963–2978 (2025).
Moos, J. et al. Robust reinforcement learning: a review of foundations and recent advances. Mach. Learn. Knowl. Extract. 4, 276–315 (2022).
Taskesen, B., Iancu, D., Koçyiğit, C. & Kuhn, D. Distributionally robust linear quadratic control. In Advances in Neural Information Processing Systems, Vol. 36 (eds Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M. & Levine, S.) (Curran Associates Inc., 2023).
Rocher, L., Tournier, A. J. & de Montjoye, Y.-A. Adversarial competition and collusion in algorithmic markets. Nat. Mach. Intell. 5, 497–504 (2023).
West, M. T. et al. Towards quantum enhanced adversarial robustness in machine learning. Nat. Mach. Intell. 5, 581–589 (2023).
Hinton, G. E. & Zemel, R. S. Autoencoders, minimum description length and Helmholtz free energy. In Advances in Neural Information Processing Systems, Vol. 6 (eds Cowan, J., Tesauro, G. & Alspector, J.) (Morgan-Kaufmann, 1993).
Hinton, G. E., Dayan, P., Neal, R. M. & Zemel, R. S. The Helmholtz machine. Neural Comput. 7, 889–904 (1995).
Jose, S. T. & Simeone, O. Free energy minimization: a unified framework for modeling, inference, learning, and optimization [lecture notes]. IEEE Signal Process. Mag. 38, 120–125 (2021).
Hibat-Allah, M., Inack, E. M., Wiersema, R., Melko, R. G. & Carrasquilla, J. Variational neural annealing. Nat. Mach. Intell. 3, 952–961 (2021).
Parr, T., Pezzulo, G. & Friston, K. J. Active Inference: The Free Energy Principle in Mind, Brain, and Behavior (The MIT Press, 2022).
Jaiswal, P., Honnappa, H. & Rao, V. A. On the statistical consistency of risk-sensitive Bayesian decision-making. In Proc. 37th International Conference on Neural Information Processing Systems, NIPS (Curran Associates Inc., 2024).
Sanger, T. D. Risk-aware control. Neural Comput. 26, 2669–2691 (2014).
Berrueta, T. A., Pinosky, A. & Murphey, T. D. Maximum diffusion reinforcement learning. Nat. Mach. Intell. 6, 504–514 (2024).
Mazzaglia, P., Verbelen, T. & Dhoedt, B. Contrastive active inference. In Advances in Neural Information Processing Systems, Vol. 34 (eds Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P. S. & Wortman Vaughan, J.) (Curran Associates Inc., 2021).
Friston, K. J. The free-energy principle: a rough guide to the brain? Trends Cogn. Sci. 13, 293–301 (2009).
Heins, C. et al. Collective behavior from surprise minimization. Proc. Natl. Acad. Sci. 121, e2320239121 (2024).
Prescott, T. J. & Wilson, S. P. Understanding brain functional architecture through robotics. Sci. Robot. 8, eadg6014 (2023).
Hohwy, J. The Predictive Mind (Oxford University Press, 2013).
Friston, K. J. et al. The free energy principle made simpler but not too simple. Phys. Rep. 1024, 1–29 (2023).
Gottwald, S. & Braun, D. A. The two kinds of free energy and the Bayesian revolution. PLoS Comput. Biol. 16, e1008420 (2020).
Imohiosen, A., Watson, J. & Peters, J. Active Inference or Control as Inference? A Unifying View 12–19 (Springer International Publishing, 2020).
Berrueta, T. Robot Thermodynamics. Ph.D. dissertation, Northwestern University, https://www.proquest.com/openview/faffd739b9b7a1becbd5e99b0fbd83fe/ (December 2024).
Eysenbach, B. & Levine, S. Maximum entropy RL (provably) solves some robust RL problems. In Proc. International Conference on Learning Representations (2022).
Garrabé, E., Jesawada, H., Del Vecchio, C. & Russo, G. On convex data-driven inverse optimal control for nonlinear, non-stationary and stochastic systems. Automatica 173, 112015 (2025).
Friston, K. J. et al. Active inference and epistemic value. Cogn. Neurosci. 6, 187–214 (2015).
Parr, T. & Friston, K. J. Uncertainty, epistemics and active inference. J. R. Soc. Interface 14, 20170376 (2017).
Konaka, Y. & Naoki, H. Decoding reward-curiosity conflict in decision-making from irrational behaviors. Nat. Comput. Sci. 3, 418–432 (2023).
Khalvati, K. et al. Modeling other minds: Bayesian inference explains human choices in group decision-making. Sci. Adv. 5, eaax8783 (2019).
Maselli, A., Lanillos, P. & Pezzulo, G. Active inference unifies intentional and conflict-resolution imperatives of motor control. PLoS Comput. Biol. 18, e1010095 (2022).
Vincent, J. F. B., Bogatyreva, O. A., Bogatyrev, N. R., Bowyer, A. & Pahl, A. K. Biomimetics: its practice and theory. J. R. Soc. Interface 3, 471–482 (2006).
Pezzulo, G., Rigoli, F. & Friston, K. J. Active inference, homeostatic regulation and adaptive behavioural control. Prog. Neurobiol. 134, 17–35 (2015).
Parr, T. & Friston, K. J. Generalised free energy and active inference. Biol. Cybern. 113, 495–513 (2019).
Broek, B. V. D., Wiegerinck, W. & Kappen, H. J. Risk sensitive path integral control. In Proc. Conference on Uncertainty in Artificial Intelligence (AUAI Press, 2010).
Attias, H. Planning by probabilistic inference. In Proc. Ninth International Workshop on Artificial Intelligence and Statistics, Vol. R4 of Proceedings of Machine Learning Research (eds Bishop, C. M. & Frey, B. J.) 9–16 (PMLR, 2021).
Botvinick, M. & Toussaint, M. Planning as inference. Trends Cogn. Sci. 16, 485–488 (2012).
Wilson, S. et al. The robotarium: globally impactful opportunities, challenges, and lessons learned in remote-access, distributed control of multirobot systems. IEEE Control Syst. Mag. 40, 26–44 (2020).
Garrabé, E. & Russo, G. Probabilistic design of optimal sequential decision-making algorithms in learning and control. Annu. Rev. Control 54, 81–102 (2022).
Ziebart, B. D., Maas, A., Bagnell, J. A. & Dey, A. K. Maximum Entropy Inverse Reinforcement Learning. In Proc. 23rd National Conference on Artificial Intelligence 1433–1438 (AAAI Press, 2008).
Zhou, Z., Bloem, M. & Bambos, N. Infinite time horizon maximum causal entropy inverse reinforcement learning. IEEE Trans. Autom. Control 63, 2787–2802 (2018).
Todorov, E., Erez, T. & Tassa, Y. MuJoCo: A physics engine for model-based control. In Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, 5026–5033 (IEEE, 2012).
Williams, G. et al. Information theoretic MPC for model-based reinforcement learning. In Proc. IEEE International Conference on Robotics and Automation 1714–1721 (IEEE, 2017).
Collins, K. M. et al. Building machines that learn and think with people. Nat. Hum. Behav. 8, 1851–1863 (2024).
Alirezaeizanjani, Z., Grossmann, R., Pfeifer, V., Hintsche, M. & Beta, C. Chemotaxis strategies of bacteria with multiple run modes. Sci. Adv. 6, eaaz6153 (2020).
Tschantz, A., Seth, A. K. & Buckley, C. L. Learning action-oriented models through active inference. PLoS Comput. Biol. 16, e1007805 (2020).
Taleb, N. N. Antifragile: Things that Gain from Disorder (Random House, 2012).
Barto, A., Mirolli, M. & Baldassarre, G. Novelty or surprise? Front. Psychol. 4, 907 (2013).
Hsu, M., Bhatt, M., Adolphs, R., Tranel, D. & Camerer, C. F. Neural systems responding to degrees of uncertainty in human decision-making. Science 310, 1680–1683 (2005).
Zak, P. J. Neuroeconomics. Philos. Trans. R. Soc. B Biol. Sci. 359, 1737–1748 (2004).
Jaynes, E. T. The minimum entropy production principle. Annu. Rev. Phys. Chem. 31, 579–601 (1980).
Haken, H. & Portugali, J. Relationships: Bayes, Friston, Jaynes and Synergetics 2nd Foundation, 85–104 (Springer International Publishing, 2021).
Todorov, E. Efficient computation of optimal actions. Proc. Natl. Acad. Sci. USA 106, 11478–11483 (2009).
Murphy, K. P. Probabilistic Machine Learning: Advanced Topics (MIT Press, 2023).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Dvijotham, D. & Todorov, E. Inverse optimal control with linearly-solvable MDPs. In Proc. 27th International Conference on Machine Learning 335–342 (Omnipress, 2010).
Jesawada, H. DR-FREE repository. Zenodo https://doi.org/10.5281/zenodo.17638771 (2025).
Acknowledgements
AS and HJ did this work while at the University of Salerno. HJ and GR were supported by the European Union-Next Generation EU Mission 4 Component 1 CUP E53D23014640001. KF was supported by funding from the Wellcome Trust (Ref: 226793/Z/22/Z). AS was supported by MOST—Sustainable Mobility National Research Center and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR)-MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.4-D.D. 1033 17/06/2022) under Grant CN00000023. This document reflects only the authors’ views and opinions. We acknowledge the use of ChatGPT for assistance in improving the wording and grammar of this document. GR and HJ wish to thank Prof. Del Vecchio (Sannio University), who allowed HJ to perform early preliminary experiments before he joined the University of Salerno. GR thanks Prof. Francesco Bullo (University of California, Santa Barbara, USA), Prof. Michael Richardson (Macquarie University, Australia), and Prof. Mirco Musolesi (University College London, UK) for the insightful discussions and comments on an early version of this paper.
Author information
Author notes
These authors contributed equally: Allahkaram Shafiei, Hozefa Jesawada.
Authors and Affiliations
Czech Technical University, Prague, Czechia
Allahkaram Shafiei
New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
Hozefa Jesawada
Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, UK
Karl Friston
Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Fisciano, Italy
Giovanni Russo
Authors
- Allahkaram Shafiei
- Hozefa Jesawada
- Karl Friston
- Giovanni Russo
Contributions
K.F. and G.R. conceptualized, designed and formulated the research. A.S. and G.R. conceptualized, designed and formulated the resolution engine concepts. A.S. developed all the proofs with inputs from G.R. A.S., G.R., and H.J. revised the proofs. G.R. and H.J. designed the experiments. H.J., with inputs from G.R. and A.S., implemented DR-FREE, performed the experiments, and obtained the corresponding figures and data. The DR-FREE code was revised by A.S. All authors contributed to the interpretation of the results. G.R. wrote the manuscript with inputs from all the authors. All authors contributed to and edited the manuscript.
Corresponding author
Correspondence to Giovanni Russo.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Alexey Skrynnik, Jun Tani, Jun Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shafiei, A., Jesawada, H., Friston, K. et al. Distributionally robust free energy principle for decision-making. Nat Commun (2025). https://doi.org/10.1038/s41467-025-67348-6
Received: 26 March 2025
Accepted: 27 November 2025
Published: 17 December 2025
DOI: https://doi.org/10.1038/s41467-025-67348-6