Abstract
Parsing with Derivatives (PwD) is an elegant approach to parsing context-free grammars (CFGs). It takes the equational theory behind Brzozowski’s derivative for regular expressions and augments that theory with laziness, memoization, and fixed points. The result is a simple parser for arbitrary CFGs. Although recent work improved the performance of PwD, it remains inefficient due to the algorithm repeatedly traversing some parts of the grammar.
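As background for the regular-expression case that PwD builds on, Brzozowski's derivative can be sketched in a few lines of OCaml. This is a minimal illustration and not code from the paper: the type `re` and the functions `nullable`, `derive`, and `matches` are names chosen here, and the sketch omits the laziness, memoization, and fixed points that PwD needs for CFGs.

```ocaml
(* Regular expressions over characters (illustrative type, not from the paper). *)
type re =
  | Empty            (* matches no string *)
  | Eps              (* matches only the empty string *)
  | Chr of char      (* matches one specific character *)
  | Alt of re * re   (* choice between two expressions *)
  | Cat of re * re   (* concatenation *)
  | Star of re       (* Kleene star *)

(* [nullable r] is true iff [r] matches the empty string. *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Alt (a, b) -> nullable a || nullable b
  | Cat (a, b) -> nullable a && nullable b

(* [derive c r] matches exactly the strings [s] such that [r] matches [c]
   followed by [s]. *)
let rec derive c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Alt (a, b) -> Alt (derive c a, derive c b)
  | Cat (a, b) ->
    let left = Cat (derive c a, b) in
    (* If [a] can be empty, [c] may also be consumed by [b]. *)
    if nullable a then Alt (left, derive c b) else left
  | Star a as r -> Cat (derive c a, r)

(* Match a string by taking one derivative per input character. *)
let matches r s =
  let current = ref r in
  String.iter (fun c -> current := derive c !current) s;
  nullable !current
```

Each call to `derive` walks the whole expression from the root, which is the kind of repeated traversal that the abstract identifies as the source of inefficiency.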
In this functional pearl, we show how to avoid this inefficiency by suspending the state of the traversal in a zipper. When subsequent derivatives are taken, we can resume the traversal from where we left off without retraversing already traversed parts of the grammar.
However, the original zipper is designed for use with trees, and we want to parse CFGs. CFGs can include shared regions, cycles, and choices between alternates, which makes them incompatible with the traditional tree model for zippers. This paper develops a generalization of zippers to properly handle these additional features. Just as PwD generalized Brzozowski’s derivatives from regular expressions to CFGs, we generalize Huet’s zippers from trees to CFGs.
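For reference, the traditional tree zipper that this paragraph starts from can be sketched as follows. The sketch assumes a simple binary tree with integer leaves; the names `tree`, `step`, `down_left`, `down_right`, `up`, `replace`, and `to_tree` are illustrative and do not come from the paper. It shows only Huet's tree-shaped zipper, not the generalization to CFGs that the paper develops.

```ocaml
(* A binary tree with integer leaves (illustrative, not from the paper). *)
type tree = Leaf of int | Node of tree * tree

(* One step of context: the direction taken and the sibling left behind. *)
type step = WentLeft of tree | WentRight of tree

(* A zipper is the focused subtree plus the path back up to the root. *)
type zipper = tree * step list

(* Move the focus to the left child, remembering the right sibling. *)
let down_left (t, path) = match t with
  | Node (l, r) -> Some (l, WentLeft r :: path)
  | Leaf _ -> None

(* Move the focus to the right child, remembering the left sibling. *)
let down_right (t, path) = match t with
  | Node (l, r) -> Some (r, WentRight l :: path)
  | Leaf _ -> None

(* Move the focus to the parent by rebuilding one node from the context. *)
let up (t, path) = match path with
  | WentLeft r :: rest -> Some (Node (t, r), rest)
  | WentRight l :: rest -> Some (Node (l, t), rest)
  | [] -> None

(* Replace the focused subtree without touching the rest of the tree. *)
let replace t' (_, path) = (t', path)

(* Walk back to the root and return the whole tree. *)
let rec to_tree z = match up z with
  | Some z' -> to_tree z'
  | None -> fst z
```

Because the path records exactly one parent per step, this representation presupposes that every node has a single parent and that the structure is acyclic. Shared regions, cycles, and alternates in a CFG violate those assumptions.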
The resulting parsing algorithm is concise and efficient: it takes only 31 lines of OCaml code to implement the derivative function but performs 6,500 times faster than the original PwD and 3.24 times faster than the optimized implementation of PwD.
Supplementary Material
Presentation at ICFP ’20 (a108-darragh-presentation.mp4)
References
[1]
Michael D. Adams, Celeste Hollenbeck, and Matthew Might. 2016. On the complexity and performance of parsing with derivatives. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (Santa Barbara, CA, USA) (PLDI ’16). ACM, New York, NY, USA, 224-236. https://doi.org/10.1145/2908080.2908128
[2]
Janusz A. Brzozowski. 1964. Derivatives of Regular Expressions. Journal of the ACM (JACM) 11, 4 (Oct. 1964), 481-494. https://doi.org/10.1145/321239.321249
[3]
Nils Anders Danielsson. 2010. Total parser combinators. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (Baltimore, Maryland, USA) (ICFP ’10). ACM, New York, NY, USA, 285-296. https://doi.org/10.1145/1863543.1863585
[4]
Jay Earley. 1970. An efficient context-free parsing algorithm. Communications of the ACM (CACM) 13, 2 (Feb. 1970), 94-102. https://doi.org/10.1145/362007.362035
[5]
Romain Edelmann, Jad Hamza, and Viktor Kunčak. 2020. Zippy LL(1) Parsing with Derivatives. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI ’20). ACM, New York, NY, USA, 1036-1051. https://doi.org/10.1145/3385412.3385992
[6]
Gérard Huet. 1997. The Zipper. Journal of Functional Programming 7, 05 (Sept. 1997), 549-554. https://doi.org/10.1017/S0956796897002864
[7]
Jane Street. 2014. core_bench. https://github.com/janestreet/core_bench version 109.58.01.
[8]
Mark Johnson. 1995. Memoization in top-down parsing. Computational Linguistics 21, 3 (Sept. 1995), 405-417. http://dl.acm.org/citation.cfm?id=216261.216269
[9]
Xavier Leroy, Damien Doligez, Alain Frisch, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. 2020. The OCaml system: release 4.10. https://ocaml.org/releases/4.10/htmlman/
[10]
Conor McBride. 2001. The Derivative of a Regular Type is its Type of One-Hole Contexts. strictlypositive.org/diff.pdf
[11]
Conor McBride. 2008. Clowns to the left of me, jokers to the right (pearl): dissecting data structures. In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (San Francisco, California, USA) (POPL ’08). ACM, New York, NY, USA, 287-295. https://doi.org/10.1145/1328438.1328474
[12]
Matthew Might, David Darais, and Daniel Spiewak. 2011. Parsing with derivatives: a functional pearl. In Proceedings of the 16th ACM SIGPLAN International Conference on Functional Programming (Tokyo, Japan) (ICFP ’11). ACM, New York, NY, USA, 189-195. https://doi.org/10.1145/2034773.2034801
[13]
Emmanuel Onzon. 2012. dypgen: Self-extensible parsers and lexers for OCaml. http://dypgen.free.fr/ version 20120619.
[14]
Scott Owens, John Reppy, and Aaron Turon. 2009. Regular-expression derivatives re-examined. Journal of Functional Programming 19, 02 (March 2009), 173-190. https://doi.org/10.1017/S0956796808007090
[15]
François Pottier and Yann Régis-Gianas. 2019. Menhir. http://gallium.inria.fr/~fpottier/menhir/ version 20190626.
[16]
Python Software Foundation. 2015a. Python 3.4.3. https://www.python.org/downloads/release/python-343/
[17]
Python Software Foundation. 2015b. The Python Language Reference: Full Grammar specification. https://docs.python.org/3/reference/grammar.html
[18]
Elizabeth Scott and Adrian Johnstone. 2010. GLL Parsing. Electronic Notes in Theoretical Computer Science 253, 7 (Sept. 2010), 177-189. https://doi.org/10.1016/j.entcs.2010.08.041
[19]
Elizabeth Scott and Adrian Johnstone. 2013. GLL parse-tree generation. Science of Computer Programming 78, 10 (Oct. 2013), 1828-1844. https://doi.org/10.1016/j.scico.2012.03.005