Title:Rethinking Causal Discovery Through the Lens of Exchangeability
Abstract: Causal discovery methods have traditionally been developed under two distinct regimes: independent and identically distributed (i.i.d.) data and time-series data, each governed by separate modelling assumptions. In this paper, we argue that the i.i.d. setting can and should be reframed in terms of exchangeability, a strictly more general symmetry principle. We present the implications of this reframing, alongside two core arguments: (1) a conceptual argument, which extends the dependence of experimental causal inference on exchangeability to causal discovery; and (2) an empirical argument, showing that many existing i.i.d. causal-discovery methods are predicated on exchangeability assumptions, and that the only extensive, widely used real-world "i.i.d." benchmark (the Tübingen dataset) consists mainly of exchangeable (and not i.i.d.) examples. Building on this insight, we introduce a novel synthetic dataset that enforces only the exchangeability assumption, without imposing the stronger i.i.d. assumption. We show that our exchangeable synthetic dataset mirrors the statistical structure of the real-world "i.i.d." dataset more closely than all i.i.d. synthetic datasets. Furthermore, we demonstrate the predictive capability of this dataset by proposing a neural-network-based causal-discovery algorithm that is trained exclusively on our synthetic dataset and performs similarly to other state-of-the-art i.i.d. methods on the real-world benchmark.
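The gap between exchangeability and the i.i.d. assumption can be illustrated with a small simulation. The sketch below is not the paper's synthetic dataset or algorithm. It is a minimal textbook-style construction, assuming a hypothetical latent variable `theta` shared by the two draws in each pair. Given `theta` the draws are i.i.d., so their joint distribution is symmetric under swapping (exchangeable), yet marginally they are correlated and therefore not independent. The function names `sample_pairs` and `correlation` are illustrative only.

```python
import random


def sample_pairs(n_pairs, shared_latent, seed=0):
    """Draw pairs (x1, x2) of Gaussian observations.

    If shared_latent is True, both draws in a pair share a latent mean
    theta ~ N(0, 1): the pair is exchangeable but not independent.
    If False, theta is fixed at 0 and the draws are genuinely i.i.d.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        theta = rng.gauss(0.0, 1.0) if shared_latent else 0.0
        x1 = theta + rng.gauss(0.0, 1.0)
        x2 = theta + rng.gauss(0.0, 1.0)
        pairs.append((x1, x2))
    return pairs


def correlation(pairs):
    """Pearson correlation between the first and second entries of each pair."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / n
    return cov / (var_a * var_b) ** 0.5


exchangeable_corr = correlation(sample_pairs(20_000, shared_latent=True))
iid_corr = correlation(sample_pairs(20_000, shared_latent=False))

# In theory the shared-latent correlation is Var(theta) / (Var(theta) + 1) = 0.5,
# while the i.i.d. correlation is 0; the printed values are sample estimates.
print(f"exchangeable pairs: corr = {exchangeable_corr:.3f}")
print(f"i.i.d. pairs:       corr = {iid_corr:.3f}")
```

In this toy construction every i.i.d. sample is also exchangeable, but the shared-latent sample shows the converse fails, which is the sense in which exchangeability is the weaker assumption.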
| Comments: | 37 pages, 4 figures |
| Subjects: | Machine Learning (cs.LG) |
| MSC classes: | 62D20 |
| Cite as: | arXiv:2512.10152 [cs.LG] |
| (or arXiv:2512.10152v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2512.10152 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Tiago Brogueira Mr [v1] Wed, 10 Dec 2025 23:19:39 UTC (1,923 KB)