As an amateur compiler developer, one of the decisions I struggle with is choosing the right compiler backend. Unlike the 80’s when people had to target various machine architectures directly, now there are many mature options available. This is a short and very incomplete survey of some of the popular and interesting options.
Contents
- Machine Code / Assembly
- Intermediate Representations
- Other High-level Languages
- Virtual Machines / Bytecode
- WebAssembly
- Meta-tracing Frameworks
- Unconventional Backends
- Conclusion
Machine Code / Assembly
A compiler can always directly output machine code or assembly targeted for one or more architectures. A well-known example is the Tiny C Compiler. It’s known for its speed and small size, and it can compile and run C code on the fly. Another such example is Turbo Pascal. You could do this with your compiler too, but you’ll have to figure out the intricacies of the instruction set architecture (ISA) of each target, as well as concepts like register allocation.
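As a toy illustration of this approach (not how TCC actually works), a backend can simply print assembly text and hand it to an assembler. This sketch emits AT&T-syntax x86-64 assembly for a two-argument add function, following the System V AMD64 calling convention:

```python
def emit_add(name="add"):
    """Emit x86-64 assembly for: int add(int a, int b) { return a + b; }"""
    return "\n".join([
        f".globl {name}",
        f"{name}:",
        "    movl %edi, %eax   # first int arg arrives in edi (System V ABI)",
        "    addl %esi, %eax   # second arg in esi; result is returned in eax",
        "    ret",
    ])

print(emit_add())  # pipe through `as` and `cc` to get a runnable object file
```

Register allocation barely matters for a three-instruction function, but even this sketch shows why it becomes the hard part: the ABI dictates where values live at function boundaries, and the backend must juggle everything in between.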
Intermediate Representations

Most modern compilers actually don’t emit machine code or assembly directly. They lower the source code down to a language-agnostic Intermediate representation (IR) first, and then generate machine code for major architectures (x86-64, ARM64, etc.) from it.
The most prominent tool in this space is LLVM. It’s a large, open-source compiler-as-a-library. Compilers for many languages such as Rust, Swift, C/C++ (via Clang), and Julia lower their code to LLVM IR and let LLVM emit machine code.
An alternative is the GNU C compiler (GCC), via its GIMPLE IR, though no external compilers seem to use it directly. GCC can be used as a library to compile code, much like LLVM, via libgccjit. It is used in Emacs to just-in-time (JIT) compile Elisp. Cranelift is another new option in this space, though it supports only a few ISAs.
For those who find LLVM or GCC too large or slow to compile, minimalist alternatives exist. QBE is a small backend focused on simplicity, targeting “70% of the performance in 10% of the code”. It’s used by the language Hare that prioritizes fast compile times. Another option is libFIRM, which uses a graph-based SSA representation instead of a linear IR.
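To get a feel for what a textual IR looks like, here is a sketch that emits LLVM IR for the same two-argument add function as plain text (rather than going through the LLVM API bindings); the output can be compiled with `clang` or `llc`:

```python
def emit_llvm_add(name="add"):
    """Emit textual LLVM IR for a function adding two 32-bit integers."""
    return "\n".join([
        f"define i32 @{name}(i32 %a, i32 %b) {{",
        "  %sum = add i32 %a, %b",  # LLVM IR is in SSA form: each value is assigned once
        "  ret i32 %sum",
        "}",
    ])

print(emit_llvm_add())  # save as add.ll and compile with: clang -c add.ll
```

Notice what the IR buys you: it is target-independent, so the same text compiles to x86-64, ARM64, or any other architecture LLVM supports, and register allocation is LLVM’s problem, not yours.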
Other High-level Languages
Sometimes you are okay with letting other compilers/runtimes take care of the heavy lifting. You can transpile your code to another established high-level language and leverage that language’s existing compiler/runtime and toolchain.
A common target in such cases is C. Since C compilers exist for nearly all platforms, generating C code makes your language highly portable. This is the strategy used by Chicken Scheme and Vala. Or you could compile to C++ instead, like Jank, if that’s your thing.
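A minimal sketch of the transpile-to-C strategy: a hypothetical `to_c` function that walks a tiny expression AST (the tuple-based AST shape here is invented for illustration) and prints the equivalent C expression:

```python
# Transpile a tiny expression AST to C source text (illustrative only).
def to_c(node):
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "add":
        return f"({to_c(node[1])} + {to_c(node[2])})"
    if kind == "mul":
        return f"({to_c(node[1])} * {to_c(node[2])})"
    raise ValueError(f"unknown node kind: {kind}")

# AST for: 1 + 2 * 3
ast = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
print(f"int main(void) {{ return {to_c(ast)}; }}")
# → int main(void) { return (1 + (2 * 3)); }
```

The fully parenthesized output sidesteps C precedence rules entirely, a common trick in real transpilers: let the host language’s compiler worry about optimizing away the noise.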
Another ubiquitous target is JavaScript (JS), which is one of the two options (the other being WebAssembly) for running code natively in a web browser or one of the JS runtimes (Node, Deno, Bun). Multiple languages such as TypeScript, PureScript, Reason, ClojureScript, Dart and Elm transpile to JS. Nim, interestingly, can transpile to C, C++ or JS.
A more niche approach is to target a Lisp dialect. Compiling to Chez Scheme, for example, allows you to leverage its macro system, runtime, and compiler. Idris 2 and Racket use Chez Scheme as their primary backends.
Virtual Machines / Bytecode
This is a common choice for application languages. You compile to a portable bytecode for a Virtual machine (VM). VMs generally come with features like garbage collection, JIT compilation, and security sandboxing.
The Java Virtual Machine (JVM) is probably the most popular one. It’s the target for many languages including Java, Kotlin, Scala, Groovy, and Clojure. Its main competitor is the Common Language Runtime, originally developed by Microsoft, which is targeted by languages such as C#, F#, and Visual Basic.NET.
Another notable VM is the BEAM, originally built for Erlang. The BEAM VM isn’t built for raw computation speed but for high concurrency, fault tolerance, and reliability. Recently, new languages such as Elixir and Gleam have been created to target it.
Finally, this category also includes MoarVM (the spiritual successor to the Parrot VM), built for the Raku (formerly Perl 6) language, and the LuaJIT VM, which runs Lua and languages that transpile to Lua, such as MoonScript and Fennel.
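To make the idea concrete, here is a toy stack-based bytecode VM in the spirit of (though vastly simpler than) the VMs above; the opcode names and bytecode shape are invented for illustration:

```python
# A toy stack-based bytecode VM: each instruction is an (opcode, operand) pair.
def run(bytecode):
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# Bytecode a compiler might emit for the expression: 2 * (3 + 4)
program = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("ADD", None), ("MUL", None)]
print(run(program))  # → 14
```

The appeal of this target for a compiler writer is clear from the sketch: the bytecode is flat and portable, and everything hard (GC, JIT compilation, sandboxing) lives on the VM’s side of the boundary.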
WebAssembly
WebAssembly (Wasm) is a relatively new target. It’s a portable binary instruction format focused on security and efficiency. Wasm is supported by all major browsers, but not limited to them. The WebAssembly System Interface (WASI) standard provides APIs for running Wasm in non-browser and non-JS environments. Wasm is now targeted by many languages such as Rust, C/C++, Go, Kotlin, Scala, Zig, and Haskell.
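A compiler targeting Wasm can emit the human-readable text format (WAT) and assemble it with a tool like `wat2wasm`; a sketch for the same add function used earlier:

```python
def emit_wat_add(name="add"):
    """Emit WebAssembly text format (WAT) for a function adding two i32s."""
    return "\n".join([
        "(module",
        f'  (func (export "{name}") (param $a i32) (param $b i32) (result i32)',
        "    local.get $a",  # Wasm is a stack machine: push both params,
        "    local.get $b",  # then i32.add pops them and pushes the sum
        "    i32.add))",
    ])

print(emit_wat_add())  # assemble with: wat2wasm add.wat -o add.wasm
```

Like the LLVM IR example, the output is target-independent, but here the portable artifact itself ships to users: the browser (or a WASI runtime) does the final compilation to machine code.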
Meta-tracing Frameworks

Meta-tracing frameworks are a more complex category. These are not backends you target directly in your compiler; instead, you use them to build a custom JIT compiler for your language by writing an interpreter for it.
The most well-known example is PyPy, an implementation of Python created using the RPython meta-tracing framework. A related option is GraalVM with Truffle, a polyglot VM and language-implementation framework from Oracle; it achieves a similar write-an-interpreter-get-a-JIT result through partial evaluation rather than meta-tracing. Its main feature is zero-cost interoperability: code from GraalJS, TruffleRuby, and GraalPy can all run on the same VM, and can call each other directly.
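A drastically simplified picture of what a meta-tracing JIT does: run the interpreter normally, record the operations executed in a hot loop, then replay the recorded trace instead of re-interpreting. This toy (plain Python, unrelated to RPython’s actual API) only records and replays; a real system would compile the trace to machine code and insert guards for control-flow changes:

```python
# Toy meta-tracing: record the ops an interpreter executes, then replay the trace.
def interpret_and_trace(ops, env):
    trace = []
    for op, name, value in ops:
        if op == "inc":
            env[name] += value
        trace.append((op, name, value))  # record exactly what we executed
    return trace

def replay(trace, env):
    for op, name, value in trace:  # a real JIT would compile this trace instead
        if op == "inc":
            env[name] += value

loop_body = [("inc", "i", 1), ("inc", "total", 2)]
env = {"i": 0, "total": 0}
trace = interpret_and_trace(loop_body, env)  # hot first iteration: interpret + record
for _ in range(9):
    replay(trace, env)                       # remaining 9 iterations: replay the trace
print(env)  # → {'i': 10, 'total': 20}
```

The key insight this illustrates is that the trace is linear, with all interpreter dispatch overhead stripped out, which is what makes the recorded path cheap to optimize and compile.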
Unconventional Backends
Move past the mainstream, and you’ll discover a world of unconventional and esoteric compiler backends. Developers pick them for academic curiosity, artistic expression, or to test the boundaries of viable compilation targets.
Brainfuck: An esoteric language with only eight commands, Brainfuck is Turing-complete and has been a target for compilers as a challenge. People have written compilers to Brainfuck from C, Haskell, and lambda calculus.
Lambda calculus: Lambda calculus is a minimal programming language that expresses computation solely as functions and their applications. It is often used as the target of educational compilers because of its simplicity, and its link to the fundamental nature of computation. Hell, a subset of Haskell, compiles to the simply typed lambda calculus.
SKI combinators: The SKI combinator calculus is even more minimal than lambda calculus. All programs in SKI calculus can be composed of only three combinators: S, K and I. MicroHs compiles a subset of Haskell to SKI calculus.
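The three combinators can likewise be sketched as curried Python lambdas; note that I is derivable as S K K, which is why some presentations drop it entirely:

```python
# The SKI combinators as curried Python functions.
S = lambda x: lambda y: lambda z: x(z)(y(z))  # S x y z = x z (y z)
K = lambda x: lambda y: x                     # K x y = x
I = lambda x: x                               # I x = x

# I is redundant: S K K behaves exactly like I.
print(S(K)(K)(42))  # → 42
```

A compiler targeting SKI (as MicroHs does) translates every lambda abstraction into applications of these combinators, eliminating variables entirely — which makes the runtime’s graph-reduction machinery remarkably simple.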
JSFuck: Did you know that you can write all possible JavaScript programs using only six characters []()!+? Well, now you know.
PostScript: PostScript is also a Turing-complete programming language. Your next compiler could target it!
Regular Expressions? Lego? Cellular automata?
Conclusion
I’m going to write a compiler from C++ to JSFuck.
If you have any questions or comments, please leave them below. If you liked this post, please share it. Thanks for reading!
Posted by Abhinav Sarkar at https://abhinavsarkar.net/notes/2025-compiler-backend-survey/