What building an LLM inference engine from scratch taught me about compiler design (opens in new tab)
the insight that started this project hit me while i was finishing a bytecode-compiled language i'd written in C i'd spent months building a hand-written lexer, a single-pass Pratt compiler, a stack VM with 35 opcodes, and a mark-and-sweep garbage collector. and right near the end i had this realization: an LLM inference engine is the same problem. it's a graph-compile plus memory-plan plus kernel-schedule problem. i'd just built one so i decided to find out if that was actually true the proj...
Read the original article