Abstract
Differential privacy (DP) has become the gold standard for privacy-preserving data analysis, but implementing it correctly has proven challenging. Prior work has focused on verifying DP at a high level, assuming either that the foundations are correct or that a perfect source of random noise is available. However, the underlying theory of differential privacy can be very complex and subtle. Flaws in basic mechanisms and random number generation have been a critical source of vulnerabilities in real-world DP systems.
In this paper, we present SampCert, the first comprehensive, mechanized foundation for executable implementations of differential privacy. SampCert is written in Lean with over 12,000 lines of proof. It offers a generic and extensible notion of DP, a framework for constructing and composing DP mechanisms, and formally verified implementations of Laplace and Gaussian sampling algorithms. SampCert provides (1) a mechanized foundation for developing the next generation of differentially private algorithms, and (2) mechanically verified primitives that can be deployed in production systems. Indeed, SampCert’s verified algorithms power the DP offerings of Amazon Web Services, demonstrating its real-world impact.
SampCert’s key innovations include: (1) A generic DP foundation that can be instantiated for various DP definitions (e.g., pure, concentrated, Rényi DP); (2) formally verified discrete Laplace and Gaussian sampling algorithms that avoid the pitfalls of floating-point implementations; and (3) a simple probability monad and novel proof techniques that streamline the formalization.
To enable proving complex correctness properties of DP and random number generation, SampCert makes heavy use of Lean’s extensive Mathlib library, leveraging theorems in Fourier analysis, measure and probability theory, number theory, and topology.
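As a hedged illustration of the kind of statement SampCert mechanizes, the pure-DP instantiation can be sketched in Lean on top of Mathlib's `PMF` type. The names `Adjacent` and `PureDP` below are illustrative placeholders, not SampCert's actual API, and the adjacency relation shown (add/remove one record) is one common choice among several:

```lean
import Mathlib.Probability.ProbabilityMassFunction.Basic

-- Illustrative adjacency: two databases differ by adding or removing one record.
def Adjacent {T : Type} (l₁ l₂ : List T) : Prop :=
  (∃ a, l₁ = a :: l₂) ∨ (∃ a, l₂ = a :: l₁)

-- Pure ε-DP (sketch): on adjacent inputs, each output's probability
-- under the mechanism differs by at most a factor of exp ε.
def PureDP {T U : Type} (m : List T → PMF U) (ε : ℝ) : Prop :=
  ∀ l₁ l₂ : List T, Adjacent l₁ l₂ → ∀ u : U,
    (m l₁ u : ENNReal) ≤ ENNReal.ofReal (Real.exp ε) * m l₂ u
```

Stating the definition over `PMF` (discrete distributions) rather than general measures matches the paper's focus on discrete, executable samplers; concentrated and Rényi DP replace the pointwise bound with a divergence bound.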
Supplemental Material
MP4 File - Verified Foundations for Differential Privacy (Video)
Video of conference presentation