u/small_kimono mentioned wanting to help out with testing. This is an area where there’s still more work to be done, and other people have either expressed interest or are already jumping in and helping out (a Lustre guy at Amazon has been sending me ktest code, and we’ve been sketching out ideas together), so I’m going to document where things are at.
We do have a lot of automated testing already. Right now it’s distributed across half a dozen 80-core ARM machines with 256 GB of RAM each, with subtest-level sharding and an actual dashboard that gets results back reasonably quickly, with a git log view (why does no one else have this? This was the first thing I knew I wanted 15 years ago, heh).
The test suite encompasses xfstests plus a ton of additional tests I’ve written for all the multi-device stuff and other things specific to bcachefs, and the full test runs also run a bunch of additional variants (builds with kasan, lockdep, preempt, nodebug, etc.).
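For anyone wanting to jump in, here’s a rough idea of what subtest-level sharding involves, as a minimal Rust sketch. It is not ktest’s actual code: the `Subtest` type, the `shard` function and the per-subtest runtime estimates (say, taken from the previous run) are all hypothetical. It uses the greedy longest-processing-time heuristic: sort subtests by estimated runtime, then hand each one to the currently least-loaded worker.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical unit of work: one subtest of a test, with an estimated
/// runtime (e.g. from the previous run's results).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Subtest {
    pub test: String,
    pub name: String,
    pub est_secs: u64,
}

/// Split subtests across `workers` shards using the greedy
/// longest-processing-time heuristic: biggest jobs first, each one
/// assigned to the shard with the smallest total load so far.
pub fn shard(mut subtests: Vec<Subtest>, workers: usize) -> Vec<Vec<Subtest>> {
    assert!(workers > 0, "need at least one worker");
    // Longest subtests first.
    subtests.sort_by(|a, b| b.est_secs.cmp(&a.est_secs));

    let mut shards: Vec<Vec<Subtest>> = vec![Vec::new(); workers];
    // Min-heap (via Reverse) of (total load in seconds, shard index).
    let mut load: BinaryHeap<Reverse<(u64, usize)>> =
        (0..workers).map(|i| Reverse((0, i))).collect();

    for st in subtests {
        let Reverse((total, idx)) = load.pop().expect("heap is never empty");
        load.push(Reverse((total + st.est_secs, idx)));
        shards[idx].push(st);
    }
    shards
}
```

With estimates of 10, 7, 5 and 3 seconds across two workers, this ends up with shard loads of 13 and 12 seconds. Plain round-robin would ignore runtimes entirely, which matters when subtest runtimes vary a lot.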
So, as far as I know, bcachefs testing is actually ahead of all the other local filesystems, except maybe ZFS (I’ve never talked to the ZFS folks about testing). But there are still a lot of improvements we need (and hopefully not just for bcachefs; the kernel as a whole is really lacking in automated testing).
I would really like to hear from other people with deep experience in the testing/distributed job-running area. There really should be better tools for this stuff, but if there are, I haven’t found them. My dream would be to find some nice Rust libraries that handle the core parts, but I’m not sure that exists yet; in the testing world everyone still seems to be building giant monoliths.
So, overview of where we’re at and what we need:
https://evilpiepirate.org/git/ktest.git/
ktest: big pile of bash, plus some newer Rust that is still lacking in error handling and needs cleanup (I’m not as experienced with Rust as with C, and I was in a hurry). On the plus side, it actually works and it’s not janky once you get it going (everything is properly watchdogged and cleans up after itself, and the whole distributed system requires zero maintenance), and much of the architecture is a lot cleaner than what I typically see in this area.
Right now job scheduling is primitive. It needs to be push instead of pull: the head node should explicitly decide what runs where and collect output as things run. This will give us better debuggability and visibility, and fix some scalability issues.
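To make the push model concrete, here is a small Rust sketch of the head-node side. Everything in it (`HeadNode`, `dispatch`, `on_output`, `on_finished`, numeric worker and job IDs) is a hypothetical illustration, not how ktest works today, and the transport between head node and workers is left out entirely. The point is that the head node owns the assignment of jobs to workers, so it always knows what is running where and has the output as it streams in.

```rust
use std::collections::{HashMap, VecDeque};

pub type WorkerId = usize;
pub type JobId = u64;

/// Hypothetical head-node state for push-based scheduling.
#[derive(Default)]
pub struct HeadNode {
    pending: VecDeque<JobId>,
    idle: VecDeque<WorkerId>,
    running: HashMap<WorkerId, JobId>,
    output: HashMap<JobId, Vec<String>>,
}

impl HeadNode {
    pub fn add_worker(&mut self, w: WorkerId) {
        self.idle.push_back(w);
    }

    pub fn submit(&mut self, job: JobId) {
        self.pending.push_back(job);
    }

    /// Pair pending jobs with idle workers. The caller would then send each
    /// assignment to its worker over whatever transport is in use.
    pub fn dispatch(&mut self) -> Vec<(WorkerId, JobId)> {
        let mut assigned = Vec::new();
        while !self.pending.is_empty() && !self.idle.is_empty() {
            let job = self.pending.pop_front().unwrap();
            let w = self.idle.pop_front().unwrap();
            self.running.insert(w, job);
            assigned.push((w, job));
        }
        assigned
    }

    /// Record a line of output streamed back from a worker while its job runs.
    pub fn on_output(&mut self, w: WorkerId, line: &str) {
        if let Some(&job) = self.running.get(&w) {
            self.output.entry(job).or_default().push(line.to_string());
        }
    }

    /// A worker reported completion: free it and return the job it ran.
    pub fn on_finished(&mut self, w: WorkerId) -> Option<JobId> {
        let job = self.running.remove(&w)?;
        self.idle.push_back(w);
        Some(job)
    }

    /// What is running where right now, as seen by the head node.
    pub fn running(&self) -> &HashMap<WorkerId, JobId> {
        &self.running
    }

    /// Output collected so far for a job (empty if none yet).
    pub fn output(&self, job: JobId) -> &[String] {
        self.output.get(&job).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Because the `running` map lives on the head node, a dashboard could show in-progress jobs with their partial output, and a dead worker’s job could be requeued from that one place.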
It only knows how to test commits in one repository (the kernel); it needs to understand multiple repos and multiple things to watch and test together, given that we’re DKMS now. This is also the big thing Lustre needs (and we need to be joining forces on testing; in the filesystem world we’ve always rolled our own, and that sucks).
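One way to model "multiple repos to watch and test together" is to treat a test point as one commit per watched repo, and to trigger a run whenever any of them moves. A minimal Rust sketch under that assumption; `TestPoint` and `Watcher` are hypothetical names, and fetching the current branch heads (e.g. via `git ls-remote`) is left to the caller.

```rust
use std::collections::BTreeMap;

/// Hypothetical test point: repo name -> commit, e.g.
/// {"bcachefs-tools": "def456", "linux": "abc123"}.
pub type TestPoint = BTreeMap<String, String>;

/// Remembers the last combination of commits that was tested.
#[derive(Default)]
pub struct Watcher {
    last_tested: TestPoint,
}

impl Watcher {
    /// Given the current heads of all watched repos, return the repos whose
    /// commit differs from the last tested combination (empty if nothing
    /// moved). Any change means the whole combination gets tested again.
    pub fn observe(&mut self, heads: &TestPoint) -> Vec<String> {
        let changed: Vec<String> = heads
            .iter()
            .filter(|(repo, commit)| self.last_tested.get(*repo) != Some(*commit))
            .map(|(repo, _)| repo.clone())
            .collect();
        if !changed.is_empty() {
            self.last_tested = heads.clone();
        }
        changed
    }
}
```

The changed-repo list is useful for attribution on a dashboard, but the run itself covers the whole combination, since a change in one repo can break against an unchanged commit in another.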
It needs to understand that "job to schedule != test"; i.e. running a test really requires multiple jobs that depend on each other (like a build system). Right now, for subtest-level sharding, each worker builds the kernel every time it runs some tests, meaning they’re duplicating a ton of builds. And DKMS doesn’t let us get rid of this: we still need to be doing different kernel builds for lockdep/kasan/etc.
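Here’s roughly what "job to schedule != test" could look like, sketched in Rust: each distinct kernel variant gets exactly one build job, and every test using that variant depends on it, so workers stop duplicating builds. `Job`, `Plan`, `plan` and `ready` are hypothetical names; a real scheduler would also need to get the build artifacts to whichever worker runs the test.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical job types: a kernel build for one config variant, or a
/// test run that needs that build's output.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Job {
    Build { commit: String, variant: String },
    Test { name: String, variant: String },
}

#[derive(Debug)]
pub struct Plan {
    pub jobs: Vec<Job>,
    /// deps[i] lists the indices of jobs that must finish before jobs[i] runs.
    pub deps: Vec<Vec<usize>>,
}

/// Build a job graph for one commit: one build per distinct variant,
/// shared by every test that uses that variant.
pub fn plan(commit: &str, tests: &[(&str, &str)]) -> Plan {
    let mut jobs = Vec::new();
    let mut deps = Vec::new();
    // variant -> index of its (single) build job
    let mut build_for: BTreeMap<&str, usize> = BTreeMap::new();

    for &(name, variant) in tests {
        let build = *build_for.entry(variant).or_insert_with(|| {
            jobs.push(Job::Build { commit: commit.to_string(), variant: variant.to_string() });
            deps.push(Vec::new());
            jobs.len() - 1
        });
        jobs.push(Job::Test { name: name.to_string(), variant: variant.to_string() });
        deps.push(vec![build]);
    }
    Plan { jobs, deps }
}

impl Plan {
    /// Jobs not yet done whose dependencies have all completed, i.e. what
    /// could be handed to idle workers right now.
    pub fn ready(&self, done: &HashSet<usize>) -> Vec<usize> {
        (0..self.jobs.len())
            .filter(|i| !done.contains(i) && self.deps[*i].iter().all(|d| done.contains(d)))
            .collect()
    }
}
```

For two kasan tests and one lockdep test this produces two builds instead of three; `ready` returns both builds first, then the kasan tests once their build completes. That readiness check is the same thing a build system does when walking its dependency graph.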
ktest right now assumes that it’s going to build the kernel from scratch; we need to teach it how to test the DKMS version against all the different distro kernels.
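A sketch of how the test matrix might expand once distro kernels are in the picture, in Rust. `KernelSource`, `targets` and the example kernel versions are hypothetical, and the DKMS module name `bcachefs` is an assumption; the command uses the standard `dkms build -m <module> -v <version> -k <kernel>` form. From-source targets still need a kernel build job (one per lockdep/kasan/etc. variant), while distro targets only need the module built against that kernel.

```rust
/// Hypothetical description of where the kernel under test comes from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum KernelSource {
    /// What ktest does today: build the kernel tree from scratch.
    FromSource { commit: String, variant: String },
    /// A stock distro kernel, with bcachefs built out of tree via DKMS.
    Distro { distro: String, kernel_version: String },
}

impl KernelSource {
    /// Only from-source targets need a kernel build job.
    pub fn needs_kernel_build(&self) -> bool {
        matches!(self, KernelSource::FromSource { .. })
    }

    /// DKMS build command for distro targets (module name assumed).
    pub fn dkms_build_cmd(&self, module_version: &str) -> Option<String> {
        match self {
            KernelSource::Distro { kernel_version, .. } => Some(format!(
                "dkms build -m bcachefs -v {module_version} -k {kernel_version}"
            )),
            KernelSource::FromSource { .. } => None,
        }
    }
}

/// Expand everything one commit should be tested against: each debug
/// variant built from source, plus each (distro, kernel version) pair.
pub fn targets(commit: &str, variants: &[&str], distro_kernels: &[(&str, &str)]) -> Vec<KernelSource> {
    let from_source = variants.iter().map(|v| KernelSource::FromSource {
        commit: commit.to_string(),
        variant: v.to_string(),
    });
    let distro = distro_kernels.iter().map(|(d, k)| KernelSource::Distro {
        distro: d.to_string(),
        kernel_version: k.to_string(),
    });
    from_source.chain(distro).collect()
}
```

Keeping both kinds of target in one enum lets the same scheduler treat "build kernel, then test" and "build module against distro kernel, then test" as two shapes of the same dependency graph.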