Thanks, the talk is really interesting.
Thinking about this more, the representation in the blog post is also too simple, and I suspect it will be super wasteful in practice if you apply it directly. It stores all types of nodes in the same vector, so the elements need to be as large as the largest AST node (or you box the nodes, which I suspect will be a worse solution than traditional pointer-based ASTs, as you now have an extra layer of indirection when going from a parent to a child node). They address this in the talk you linked, where they keep some of the information in side tables/other vectors. In bytecode you don’t have this issue if you treat the bytecode as a binary blob with variable-size instructions. This works because typically you don’t need to go from e.g. an instruction like i32.add to its operands: you scan linearly and push/pop/execute/jump as you read the next opcode. (If you do need to go from an instruction to its arguments, you can parse it into a different representation that makes that easy, like a tree AST.)
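To make the bytecode point concrete, here's a minimal sketch of my own (opcode names and encoding are made up, not from the talk or any real VM): a stack machine scanning a flat blob of variable-size instructions. Nothing ever needs a parent-to-operand pointer; operands are either inline right after the opcode or already on the stack.

```python
# Toy bytecode with variable-size instructions (hypothetical encoding).
PUSH_CONST = 0x01  # 2 bytes: opcode + one-byte constant to push
I32_ADD    = 0x02  # 1 byte: pops two values, pushes their sum
HALT       = 0x00  # 1 byte: stop, return top of stack

def run(code: bytes) -> int:
    stack = []
    pc = 0
    while True:
        op = code[pc]
        if op == PUSH_CONST:
            stack.append(code[pc + 1])  # operand sits inline in the blob
            pc += 2                     # variable size: skip opcode + operand
        elif op == I32_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == HALT:
            return stack.pop()

# (1 + 2) + 3, encoded as a flat byte blob
program = bytes([PUSH_CONST, 1, PUSH_CONST, 2, I32_ADD,
                 PUSH_CONST, 3, I32_ADD, HALT])
print(run(program))  # → 6
```

The linear scan is the whole trick: because execution order matches storage order, the "where are my children?" question never comes up.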
For others not wanting to watch the talk: they "serialize" (make linear) the AST nodes based on a post-order traversal of the AST. So if you have a call node with function fun and args arg1, arg2, and arg3 (assume these exprs are atomic), the array will look like: [arg1, arg2, arg3, fun].
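That flattening step can be sketched in a few lines (the tuple shape for call nodes is my own choice, just to keep the example self-contained):

```python
# Hedged sketch of post-order flattening. A node is either an atomic
# string or a ("call", fun, args) tuple; children are emitted before
# their parent, so fun lands after its args.
def flatten_postorder(node, out):
    if isinstance(node, str):
        out.append(node)
    else:
        _, fun, args = node
        for a in args:
            flatten_postorder(a, out)
        out.append(fun)
    return out

ast = ("call", "fun", ["arg1", "arg2", "arg3"])
print(flatten_postorder(ast, []))  # → ['arg1', 'arg2', 'arg3', 'fun']
```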
I don’t know how they go from the "call" node here to its args, though, when not traversing in post order. E.g. maybe you collect a bunch of nodes to revisit and potentially optimize in an analysis pass. Then you revisit them, and now you have to go from a parent node to a child node, which this representation doesn’t make easy.
(Note: I didn’t watch the whole thing, he may be answering the question, I don’t know)
Edit: just came to the point where he talks about this: they store extra metadata to go from parent to children. I don’t know how useful the post-order layout is once you have the metadata.
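One possible shape for such metadata (an assumption on my part, not necessarily what the talk actually stores): keep each node's subtree size alongside the post-order array. A parent at index i then knows its subtree occupies indices [i - size + 1, i], and you can enumerate direct children by repeatedly skipping whole child subtrees:

```python
# Post-order array plus subtree sizes (size counts the node itself).
# For fun(arg1, arg2, arg3) this is:
nodes = ["arg1", "arg2", "arg3", "fun"]
sizes = [1, 1, 1, 4]

def children(i):
    """Indices of the direct children of the node at index i."""
    start = i - sizes[i] + 1  # leftmost index of this node's subtree
    out = []
    j = i - 1                 # children pack just below the parent
    while j >= start:
        out.append(j)
        j -= sizes[j]         # skip that child's entire subtree
    return list(reversed(out))

print(children(3))  # → [0, 1, 2]
```

With this, parent-to-child navigation is O(number of children), so the post-order layout isn't a dead end; whether that beats explicit child-index tables is a space/locality trade-off.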