My personal path, as a hobbyist, focused at first on interpreters for Brainfuck, Scheme, lower-case forth, and lower-case lisp. I had a bit of "formal" undergraduate training (one PL course and one compilers course, taken before I dropped out), but for the most part I’ve hacked on this stuff since then for fun and education.
After I was confident implementing some of those minimal languages, I moved on to minimal versions of Lua, JavaScript/TypeScript, Python, SQL and Go. These varied between AST interpreters, bytecode VMs and native-code compilers (via C, LLVM, and x86), and either used the language’s first-party parser or my own handwritten parser.
Some blind spots
I have never implemented garbage collection myself (at best I’ve hooked into the host language’s GC, similar to using libgc). I haven’t implemented register allocation since the college compilers course. I haven’t implemented JIT compilation. I haven’t spent very much time targeting Windows or macOS or anything other than Linux/x86_64.
The popularity of lisps and forths
You might wonder why so many PL/compiler resources focus on languages like lower-case lisp. Basically, parsing more common languages is more tedious because they have more syntax, so it takes longer to get a working implementation than it does with the sparse syntax of a lisp (or a forth).
Once you get the hang of it with a forth or lisp, though, it’s easy to bring that skill to "normal" languages. You will also need to pick up a technique for handling operator precedence (discussed below).
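To make the point about sparse syntax concrete, here is a minimal sketch (not taken from any of the resources on this page) of a complete s-expression reader in Python. The names `tokenize`, `read`, `atom`, and `parse` are hypothetical, and it assumes the only atoms are integers and symbols (no strings, quoting, or comments).

```python
# A minimal s-expression reader: the whole grammar is "an atom, or a
# parenthesized list of expressions", so lexing and parsing fit in a few lines.


def tokenize(source: str) -> list[str]:
    # Pad parentheses with spaces so str.split() separates them from atoms.
    return source.replace("(", " ( ").replace(")", " ) ").split()


def atom(token: str):
    # Integers become ints; everything else stays a symbol (a plain string).
    try:
        return int(token)
    except ValueError:
        return token


def read(tokens: list[str], pos: int = 0):
    """Parse one expression starting at tokens[pos]; return (expr, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = read(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1  # skip the closing ')'
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return atom(token), pos + 1


def parse(source: str):
    tokens = tokenize(source)
    expr, pos = read(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return expr


if __name__ == "__main__":
    print(parse("(define (square x) (* x x))"))
    # ['define', ['square', 'x'], ['*', 'x', 'x']]
```

There is no precedence to worry about because every compound form is explicitly parenthesized, which is exactly what you lose when you move on to infix languages.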
Introductory
These helped me out, and I think they are reasonable to recommend to others. The list is not long because I have not explored many broad introductory texts, and many of the ones I did explore (listed further below) I really didn’t like.
Parsing (operator precedence)
First off, real-world languages generally don’t use parser generators. Parser generators are also harder to learn, and they add another dependency and build step. So you can happily skip them.
If you still want to learn how to use a parser generator, look at books I otherwise don’t recommend, like Modern Compiler Implementation in Java/C/ML or The Dragon Book.
You can pick up the basics of handwritten parsers from the items in the Introductory section above. The main complex part that remains is operator precedence. Even though I’ve implemented it a few times, I need to look up an explanation again every time.
Basically, look up Shunting Yard, Pratt Parsing, or Precedence Climbing. There were one or two pages that helped me out in particular, but I can’t find them at the moment.
Andy Chu of Oil Shell has a survey of various explanations that you may find useful.
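As a rough illustration, here is a minimal precedence climbing sketch in Python (precedence climbing and Pratt parsing are closely related). It is not lifted from any of the explanations mentioned above. The operator table, treating `^` as a right-associative exponent, integer-only operands, and names like `Parser`, `parse_expr`, and `evaluate` are all assumptions made for illustration.

```python
import operator
import re

# Binding power of each binary operator (higher binds tighter).
# This table, and treating ^ as right-associative, are illustrative choices.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}
# Integer-only arithmetic; "/" is floor division to keep results as ints.
APPLY = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
    "^": operator.pow,
}


def tokenize(source: str) -> list[str]:
    # Integers, operators, and parentheses; any other character is skipped.
    return re.findall(r"\d+|[-+*/^()]", source)


class Parser:
    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_primary(self):
        # A primary is an integer or a parenthesized sub-expression.
        token = self.advance()
        if token == "(":
            expr = self.parse_expr(1)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return expr
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    def parse_expr(self, min_prec: int):
        lhs = self.parse_primary()
        # Consume operators that bind at least as tightly as min_prec.
        while (op := self.peek()) in PRECEDENCE and PRECEDENCE[op] >= min_prec:
            self.advance()
            prec = PRECEDENCE[op]
            # Left-associative operators only allow strictly tighter operators
            # in their right operand; right-associative ones allow equal ones.
            next_min = prec if op in RIGHT_ASSOC else prec + 1
            rhs = self.parse_expr(next_min)
            lhs = (op, lhs, rhs)
        return lhs


def parse(source: str):
    parser = Parser(tokenize(source))
    tree = parser.parse_expr(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


def evaluate(tree):
    # Trees are ints or (op, lhs, rhs) tuples.
    if isinstance(tree, int):
        return tree
    op, lhs, rhs = tree
    return APPLY[op](evaluate(lhs), evaluate(rhs))


if __name__ == "__main__":
    print(parse("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
    print(evaluate(parse("2 ^ 3 ^ 2")))  # 512
```

The key trick is the `next_min` computation: a left-associative operator only lets strictly tighter operators into its right operand, so `1 - 2 - 3` groups as `(1 - 2) - 3`, while the right-associative `^` also admits equal precedence, so `2 ^ 3 ^ 2` groups as `2 ^ (3 ^ 2)`.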
Code generation
- Destination-Driven Code Generation
- One-pass Code Generation in V8
- A Performance Survey on Stack-based and Register-based Virtual Machines
- Virtual Machine Showdown: Stack Versus Registers
- How to JIT - an introduction
- Adventures in JIT compilation: Part 1 - an interpreter
(Non-introductory) books
- LISP in Small Pieces by Christian Queinnec
Notes: The final chapter ends (oddly enough) with building a little forth implementation. This book is one of my favorite technical books.
- Lisp System Implementation by Nils M Holm
Notes: A literate walkthrough of a lisp implementation in C.
- Compiler Construction by Niklaus Wirth
Hacking on existing languages
- Taq Karim’s series on hacking CPython
- Avinash Sajjanshetty’s Hacking Go compiler to add a new keyword
Blogs
Various blogs and pages I’ve enjoyed reading and/or found helpful.
- The V8 Blog
- Older posts on the WebKit blog, such as the ones from when they implemented their first bytecode VM
- Max Bernstein’s Programming languages resources page
- Andy Wingo’s blog
- Laurence Tratt’s blog
- Peter Bex’s posts on Chicken Scheme
- Stefan Marr’s blog
Pedagogical projects
Stuff I’ve written
I can’t evaluate my own stuff objectively, but this is my list, so I’m going to share the various resources I’ve written on the broad subject of compilers and interpreters.
More focused on parsing
- Writing a simple JSON parser in PL/pgSQL
- Writing a simple JSON parser in C++
- Writing a Jinja-inspired template library in Python
- Writing a simple JSON path parser in JavaScript
- Writing a simple JSON parser in Python
Language implementation more generally
- Writing a minimal Lua implementation with a virtual machine from scratch in Rust
- Writing a SQL database, take two: Zig and RocksDB
- Writing a document database from scratch in Go: Lucene-like filters and indexes
- Implementing a forth-like interpreter in PL/pgSQL
- Writing a SQL database from scratch in Go
- Compiling a lisp to LLVM and x86 assembly in JavaScript
Communities
- /r/Compilers
- /r/ProgrammingLanguages
- #pl channel on the hacker Discord I host
Stuff I’d like to find or see written about
If you know of anything here or end up writing about one of these, let me know!
- Survey of bytecode instructions across various VMs, and the implications (sort of like RISC vs CISC but for language VMs)
- Survey of object representations in dynamic languages, and the implications
- Survey of calling conventions across computer architectures and bytecode VMs, and the implications
Heard good things
I haven’t tried these out, but I commonly see them recommended.
- Bob Nystrom’s Crafting Interpreters
- Dave Beazley’s Write a Compiler course
- Thorsten Ball’s Writing an Interpreter in Go and Writing a Compiler in Go
- Tim Morgan’s Hacking Natalie (a Ruby implementation) YouTube series
- Andreas Kling’s YouTube series on Language hacking: Jakt and Let’s build a JavaScript Bytecode VM
Conspicuous books not on the list
As you dig further into compilers/interpreters, maybe you’ll want to check these out. I own them and try to browse them occasionally, but overall I’m not a fan. If you like them, great!
- Structure and Interpretation of Computer Programs (SICP)
- Compilers: Principles, Techniques, and Tools (aka "The Dragon Book")
- The Little Typer
Notes: I know some people like it, but boy do I abhor both the style and content of this book. It is extremely complicated, and the whimsical style just makes me angry. Maybe The Little Schemer is better since the topic is less complex. I’m scared to try it.
- Modern Compiler Implementation in Java/ML/C
Notes: This one was the text for the compilers course I took in college, and I loved that course. But ultimately I’m not sure this book is as good as some of the other resources listed above (in aggregate).
I threw together a page with a few of my favorite resources for learning and hacking on compilers/interpreters https://t.co/741TDxDLEO pic.twitter.com/tErFu9sjdy
— Phil Eaton (@phil_eaton) January 4, 2023
Feedback
As always, please email or tweet me with questions, corrections, or ideas!