08 Nov, 2025
inspiration
I think I was playing a videogame and stealing some gormless corporate wage slave’s earthly possessions when it came to me: why fuck around with prefixed files and weird self-written merge algorithms when I can lean on some more prior art?
what?
In my last exploration of how far I can push a basic tree-sitter grammar and mergiraf for stellaris mods, I built a basic script that does what is essentially version control.
Version control, sort of
It makes sense: merging a big stack of mods onto the vanilla files is essentially taking a bunch of existing definitions, recognising that they’re all built on top of and exist next to the same base set and trying to squash them on top of each other based on your load order.
In the actual game there’s quite a lot of nuance involved in how it tends to ignore load orders (from stellaris’ own mod manager) and instead sorts things based on alphabetical order on a file level. The load order only comes in if files have an identical name, which can happen if you’re overwriting vanilla files.
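If I sketch that understanding in python (and to be clear: this is my mental model of the engine's behaviour, not something I've verified against its source), it looks roughly like this:

```python
# Sketch of how I believe the game orders files: files sort alphabetically
# by name, and the launcher's load order only matters when two sources ship
# a file with the identical name -- then the later entry wins.

def effective_files(sources):
    """sources: list of (mod_name, {filename: content}) in launcher load order."""
    winners = {}
    for _, files in sources:  # later sources overwrite identical filenames
        for name, content in files.items():
            winners[name] = content
    # the game then reads the surviving files in alphabetical order
    return sorted(winners.items())

vanilla = ("vanilla", {"00_base.txt": "v", "01_jobs.txt": "v"})
mod_a = ("mod_a", {"01_jobs.txt": "a", "zz_extra.txt": "a"})
print(effective_files([vanilla, mod_a]))
# -> [('00_base.txt', 'v'), ('01_jobs.txt', 'a'), ('zz_extra.txt', 'a')]
```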
why?
By taking a stack of definitions in the order they appear in the launcher (or however you like to manage your mods), you eliminate a lot of violated end-user expectations. I think I have a reasonable handle on how the mod handling works, but I’d much rather be sure. Managing things this way makes me sure.
now the new stuff
expanded grammar
I’ve expanded the grammar so it handles more keywords and definition types without dying. Because there’s no official language specification, I’m just expanding this whenever I want to do something and it breaks.
I don’t want to call it elegant, but it is more robust now. That’s convenient for my patched mergiraf and difftastic, which now resolve about 50% of the conflicts I’ve encountered so far, out of a few thousand object definitions.
Of the remainder, it looks like there is a long tail (a solid 40 of the remaining 50 percentage points) of object files with parse errors. The splitter is emitting if statements as standalone objects, which I’m pretty sure come from event files; those use some syntax the grammar currently doesn’t handle very well. A project for next time.
the big cheese
I spent a couple of hours building out the pdxscript parser I wrote earlier to test the tree-sitter grammar against real input.
Everything below is done in a local jujutsu (a version control tool built on top of git) repository.
splitting
It currently splits all definitions into object files. The first commit is the base game files (taken locally from my game directory).
After that it is fed a modlist with normalised mod names (all lowercase and dashes, nothing outside of ascii alphabet characters) and full paths to the mods.
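The normalisation itself fits in a few lines; the exact rules in this sketch (collapse every run of non-letters into a single dash) are an illustration of the description above, not the canonical implementation:

```python
import re

# One plausible normalisation matching "all lowercase and dashes, nothing
# outside of ascii alphabet characters" (details assumed): lowercase
# everything, turn runs of anything that isn't an ASCII letter into a
# single dash, and trim dashes from the ends.
def normalise_mod_name(name: str) -> str:
    name = name.lower()
    name = re.sub(r"[^a-z]+", "-", name)
    return name.strip("-")

print(normalise_mod_name("Gigastructural Engineering & More (3.14)"))
# -> gigastructural-engineering-more
```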
branching
For each of the mods it creates a bookmark and accompanying commit on top of the base layer, with all the split object files in it.
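Driven from python, the per-mod step boils down to a handful of jj invocations, roughly like the sketch below (the command names and flags are jj's CLI as I understand it, so treat the details as an assumption rather than gospel):

```python
# Build the jj command sequence for one mod. Returning the commands as
# lists keeps this testable; a real runner would pass each to
# subprocess.run(cmd, check=True).
def branch_commands(mod_name: str, base_rev: str = "base") -> list[list[str]]:
    return [
        # start a new working-copy commit on top of the base layer
        ["jj", "new", base_rev],
        # label it so the mod's split object files are easy to find later
        ["jj", "bookmark", "create", mod_name, "-r", "@"],
        ["jj", "describe", "-m", f"split object files for {mod_name}"],
    ]

for cmd in branch_commands("example-mod"):
    print(" ".join(cmd))
```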
squashing
Because I knew beforehand it would break, I did the first set of squashes by hand. A new commit is made on top of base, and every mod is squashed into it. A patched (to use my parser and understand basic transposition and uniqueness requirements) mergiraf is then used to resolve conflicts.
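The automated version of that squash loop looks roughly like this sketch (it assumes jj and the patched mergiraf are on PATH, and that `jj resolve --list` and `mergiraf solve` behave the way I think they do):

```python
import subprocess

def conflicted_paths(resolve_list_output: str) -> list[str]:
    """Parse `jj resolve --list`-style output: the path comes first on each line."""
    return [line.split()[0] for line in resolve_list_output.splitlines() if line.strip()]

def squash_mod(mod_name: str) -> None:
    # fold the mod's commit into the merged working-copy commit
    subprocess.run(["jj", "squash", "--from", mod_name], check=True)
    # ask jj which files are still conflicted, then hand each to mergiraf
    out = subprocess.run(["jj", "resolve", "--list"],
                         capture_output=True, text=True).stdout
    for path in conflicted_paths(out):
        subprocess.run(["mergiraf", "solve", path], check=False)

print(conflicted_paths("common/jobs/01_jobs.txt    2-sided conflict\n"))
# -> ['common/jobs/01_jobs.txt']
```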
was it worth it?
I think so. This way of processing mod lists is surprisingly convenient for testing the grammar. I lost a lot of time preparing real-world files and remembering all the paths when I worked on it before. Now that I have the scripting in place to do this automatically, testing is way easier.
You can see this in the tree-sitter grammar, which is a lot more robust than it was before.
You’ll have to forgive me the strange timestamps. I realised a little late that I had fallen asleep after cleaning up an old commit, so all my changes were added to one from a few days ago. When I split the changes out into more palatable change sets, they retained the old timestamps. It’s definitely fixable, but it’s not much of an issue.
processing real input
When I feed the pdxparser script one of the larger modlists I run for solo play, it works, but to work well it definitely needs better conflict resolution.
transposition
When doing it by hand there are a lot of transposition conflicts (where lines are identical but in a different order). These simply shouldn’t exist, and my tools already resolve them (at least when they don’t fail to parse the input).
variable overwrites
The second large class of conflicts is variable overwrites (I’m going to classify these as balance). They’re a bit more challenging to merge in a semantically appropriate manner. In practice I’ll probably end up writing a heuristic for it (prefer this set of mods, add-only is fine, changes within this percentage range are okay, warn on everything else). That won’t be perfect, but it will be passable and I think the vast majority of people will prefer it this way.
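As a sketch, such a heuristic could look like the following (the preferred-mod set and the 25% tolerance are made-up illustration values, not anything I've shipped):

```python
# Hypothetical knobs: which mods always win, and how large a relative
# change is silently accepted.
PREFERRED = {"my-balance-mod"}

def resolve_variable(base, new, mod, max_change=0.25):
    """base is None when the variable didn't exist before (add-only case)."""
    if mod in PREFERRED:
        return new, "preferred mod wins"
    if base is None:
        return new, "add-only is fine"
    if base != 0 and abs(new - base) / abs(base) <= max_change:
        return new, "within tolerated range"
    return base, "warn: change too large, keeping base value"

print(resolve_variable(100, 110, "some-mod"))
# -> (110, 'within tolerated range')
print(resolve_variable(100, 300, "some-mod"))
# -> (100, 'warn: change too large, keeping base value')
```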
In time, making better tools to see where these conflicts are and to display the ones that conceptually belong together is where this should go. It’s eminently doable, but it’s not super interesting to me while the issues above are still open.
triggers, ai config, etc
These are a category of additions that the parser currently labels as keywords. The vast majority of them are allowed to exist multiple times (and are meant to). I’m pretty sure most of these are already handled well automatically; the ones that are not should be fixable with better parsing and handling.
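A sketch of what "handled well" means here: repeatable keywords get concatenated, and single-occurrence keys only conflict when their bodies actually differ. The REPEATABLE set below is a guess for illustration; the real classification would come from the parser:

```python
# Illustrative set of keywords that may legitimately appear many times.
REPEATABLE = {"trigger", "ai_weight", "on_action"}

def merge_definitions(ours: list[tuple[str, str]],
                      theirs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    merged = list(ours)
    seen = {key for key, _ in ours}
    for key, body in theirs:
        if key in REPEATABLE:
            merged.append((key, body))      # multiple copies are fine
        elif key not in seen:
            merged.append((key, body))      # new single-occurrence key
        elif (key, body) not in merged:     # same key, different body
            raise ValueError(f"real conflict on single-occurrence key {key!r}")
        # identical duplicates are silently dropped

    return merged

print(merge_definitions([("trigger", "a"), ("cost", "5")],
                        [("trigger", "b"), ("cost", "5")]))
# -> [('trigger', 'a'), ('cost', '5'), ('trigger', 'b')]
```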
esoteric definitions
Things like tags and tag-like definitions are currently just not handled well. I’ve put in the requisite parser logic, but without writing more tree-sitter tests (and tests for the mergiraf fork to check proper merging) I don’t have a lot of faith it works without further attention. This means it will fail on 30-60% of inputs. Very irritating, but easy to solve.
spicy linguistics
If-statements on the root level are not split properly. They just need logic (along with root level variables) and then they will be fine.
unknown unknowns
When I split everything there were about 25000 objects. I will definitely encounter stuff I don’t yet know about.
what’s next
Putting a proper fork of difftastic and mergiraf on my sourcehut, versioning them correctly and working on more comprehensively teaching them the correct concepts.
Having either of those tools fail mid-run is currently where my effort stops.
These are good problems to have, because my friction before was in not being able to find enough test input :) That is now solved.
I should probably also clean up the nushell scripts I’m using to feed the pdxscript-parse script and port them to python so they fit in the same repository. That way that part of the tooling doesn’t live only in my shell history, like half of my wrapper scripts do.
Thinking about it, putting those in source control and porting them to a language more people can read (and that is more stable) might be a nice subject for a future blog post. I have a lot of audio and video processing commands that are 5-8 wide-screen console lines long (yes, they’re all oneliners).