Title:Translating the Rashomon Effect to Sequential Decision-Making Tasks
Abstract:The Rashomon effect describes the phenomenon where multiple models trained on the same data produce identical predictions while differing in which features they rely on internally. This effect has been studied extensively in classification tasks, but not in sequential decision-making, where an agent learns a policy to achieve an objective by taking actions in an environment. In this paper, we translate the Rashomon effect to sequential decision-making. We define it as multiple policies that exhibit identical behavior, visiting the same states and selecting the same actions, while differing in t…
Title:Translating the Rashomon Effect to Sequential Decision-Making Tasks
Abstract:The Rashomon effect describes the phenomenon where multiple models trained on the same data produce identical predictions while differing in which features they rely on internally. This effect has been studied extensively in classification tasks, but not in sequential decision-making, where an agent learns a policy to achieve an objective by taking actions in an environment. In this paper, we translate the Rashomon effect to sequential decision-making. We define it as multiple policies that exhibit identical behavior, visiting the same states and selecting the same actions, while differing in their internal structure, such as feature attributions. Verifying identical behavior in sequential decision-making differs from classification. In classification, predictions can be directly compared to ground-truth labels. In sequential decision-making with stochastic transitions, the same policy may succeed or fail on any single trajectory due to randomness. We address this using formal verification methods that construct and compare the complete probabilistic behavior of each policy in the environment. Our experiments demonstrate that the Rashomon effect exists in sequential decision-making. We further show that ensembles constructed from the Rashomon set exhibit greater robustness to distribution shifts than individual policies. Additionally, permissive policies derived from the Rashomon set reduce computational requirements for verification while maintaining optimal performance.
| Subjects: | Artificial Intelligence (cs.AI); Machine Learning (cs.LG) |
| Cite as: | arXiv:2512.17470 [cs.AI] |
| (or arXiv:2512.17470v1 [cs.AI] for this version) | |
| https://doi.org/10.48550/arXiv.2512.17470 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Dennis Gross [view email] [v1] Fri, 19 Dec 2025 11:33:30 UTC (22 KB)