This week
Yirgacheffe
The short paper on the design and use of Yirgacheffe was submitted to PROPL on time, but not without a little stressing towards the end, which is the downside of paper deadlines: something always turns up that makes them a rush, even if you felt you had things mostly in hand the week before.
Context: for those who haven't seen it before, one of the main features of Yirgacheffe is that you can specify numerical operations directly on geospatial datasets, so you can add/multiply/filter these large rasters or polygons directly, and it'll do all the bookkeeping about aligning pixels, rasterizing polygons, etc., and at the end you either save the result to another raster layer, or you perform some aggregation like summing all the pixels or finding the min/max.
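To make that concrete, here's a toy sketch of the lazy-evaluation idea (this is not Yirgacheffe's actual API; the class and method names are invented for illustration): arithmetic on layers builds up a deferred expression, and no pixels are touched until you aggregate or save.

```python
import numpy as np

class Layer:
    """Toy lazy layer: wraps an array, defers arithmetic until evaluation."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __mul__(self, other):
        return Op(np.multiply, self, other)

    def __add__(self, other):
        return Op(np.add, self, other)

    def read(self):
        return self.data

class Op(Layer):
    """A deferred element-wise operation over two layers."""
    def __init__(self, fn, lhs, rhs):
        self.fn, self.lhs, self.rhs = fn, lhs, rhs

    def read(self):
        # Evaluation only happens here, recursively pulling operands.
        return self.fn(self.lhs.read(), self.rhs.read())

    def sum(self):
        return float(self.read().sum())

habitat = Layer([[1, 0], [1, 1]])         # stand-in for a habitat raster
elevation_mask = Layer([[1, 1], [0, 1]])  # stand-in for an elevation mask
aoh = habitat * elevation_mask            # no work happens yet
print(aoh.sum())                          # work happens here -> 2.0
```

The real library also handles mismatched extents, projections, and polygon rasterization behind the same kind of expression interface, which this toy version ignores entirely.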
One of the less used features of Yirgacheffe, at least by me, is that when doing that save or aggregation, Yirgacheffe can attempt to do so in parallel using multiple CPU cores. Normally the pipelines I work on don't use this feature, as they tend towards data flows that work better if I run the same script many times in parallel, rather than one script that does everything within it. Partly this is down to Python being generally poor at parallelism, but mostly down to the data flows: e.g., when processing thousands of area of habitat (AoH) calculations at a time, it's just easier to run the AoH script once per species, and use an external tool like GNU Parallel or Littlejohn to orchestrate that.
But there are times when you just want one script to do some calculation on a big raster as fast as possible, and for that I added the option to use multiple cores for the calculations. Internally you can imagine that Yirgacheffe breaks down each calculation into, say, rows of pixels and does them one at a time to avoid having to load too much data into memory, so it's a small logical leap to say we'll do several of those rows at a time in parallel, as they're independent of each other. Yirgacheffe doesn't try to do anything very clever here, but when I benchmarked the feature I found it performed much worse than I'd expected, actually being several times slower than just using a single thread in some instances, one being over 6 times slower!
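The chunk-parallel idea can be sketched with plain NumPy and a process pool (a simplified stand-in, not Yirgacheffe's implementation): cut the raster into independent row chunks, let each worker reduce its chunk, and combine the partial results.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    # Stand-in for the real per-chunk calculation (masking, arithmetic, ...).
    return float((chunk * 2.0).sum())

def parallel_sum(raster, chunk_rows=2, workers=2):
    """Split a raster into row chunks and reduce them in parallel.

    Each chunk is independent, so the workers never need to coordinate;
    only the final combination of partial sums is serial.
    """
    chunks = [raster[i:i + chunk_rows]
              for i in range(0, raster.shape[0], chunk_rows)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))

if __name__ == "__main__":
    raster = np.ones((8, 4))
    print(parallel_sum(raster))  # 64.0
```

Processes rather than threads are used here because of the GIL mentioned above; that's also where the setup cost that bites on small inputs comes from, since each worker process has to be spawned and fed its data.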
My test case was processing AoHs for 277 different species. I did specifically go for a mix of range sizes, but the data does tend to skew towards small ranges, which don't involve processing much data. Whilst I said above you could imagine Yirgacheffe processing a row of pixel data at a time, it actually does larger chunks than that: partly to get better disk behaviour, and partly because polygon rasterization works very poorly at that scale, as it still has to process the entire polygon each time you want to rasterize a small chunk of it, and for species with ranges defined by detailed coastlines that can be a lot of data.
So I realised that for many small-ranged species it was doing a single chunk of data, and if I set the parallelism flag it was still trying to do that work on a worker, which in Python is quite expensive to set up. So I added some checks for whether you would actually need parallelism, and if the calculation was just one chunk of data, it reverts to the single-threaded code path.
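The fix amounts to a cheap guard before spinning up any workers. Something like this hypothetical helper (the names are invented; the real check lives inside Yirgacheffe's save/aggregation path):

```python
import math

def should_parallelise(total_rows, chunk_rows, min_chunks=2):
    """Only spin up workers when there is more than one chunk of work.

    For a single chunk, the cost of setting up a Python worker process
    outweighs any possible gain, so fall back to the serial path.
    """
    n_chunks = math.ceil(total_rows / chunk_rows)
    return n_chunks >= min_chunks

print(should_parallelise(total_rows=100, chunk_rows=512))   # False: one chunk
print(should_parallelise(total_rows=4096, chunk_rows=512))  # True: eight chunks
```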
This still isn't great, with quite a few instances remaining slower than single-threaded, but it did bring the mean run time down to less than a third of the original, with the best case being around 12% of the original run.
The overhead of processing one chunk like this did make me wonder about how I was defining the chunk size, and whether I should revisit the current default work-unit size. I played a little with reducing it to encourage more parallelism, but that only seemed to make things worse, as the rasterization overheads kicked in, and given the paper deadline I didn't really have the time to explore that space, nor work out how to automatically infer a reasonable value, so I had to park that. I also tried another, larger dataset, processing all 1,600-odd mammals from the STAR metric, and this also gave me mixed results performance-wise, and I didn't have time to dig into that: I assume the species' range distribution was different from my normal test sample set.
Ultimately, on average the parallel save feature in Yirgacheffe does better than not having it, but its gains are pretty poor given how many CPU cores it can use, and so overall I'm left quite unhappy with the feature. I feel that even allowing for Python-related problems, something better could be done, but there was no time to look before the deadline passed.
It's not like this was even a critical part of the narrative of the paper, and it isn't a feature I use that much, but the process made me realise there's something going wrong that I don't understand, and I don't have time to figure it out, and that is deeply frustrating.
LIFE
I started generating a new LIFE run using the latest Red List update from 2025. All the LIFE paper work was done with Red List data from when the project started in 2023, and there's now a 2025 update out, so we want to publish updated layers. I did a visual inspection of the new maps, and there are some differences, particularly around amphibians, but they generally look good. I've passed them over to Alison who, as a zoologist, is actually capable of interpreting the results properly.
Whilst doing this I'm also doing a little modernisation of the code, changing the default results you get when you use the script that comes with the repo, so that it just runs the things we're still interested in, rather than everything that was in the original LIFE paper.
Claudius
Shreya, the Outreachy intern working on Claudius, has spent the last few weeks getting a feature to record animations out to an animated-GIF file, and that's now merged. I'd include an example here, but my self-written website publishing tool doesn't have a way to let me include it, so I'll try to fix that for next week. We also made some progress towards getting Claudius into opam, as I got the OCaml-GIF library that it depends on, and which we maintain, into opam.
The next challenge will be getting Claudius itself in, as the obvious paths don't quite work due to Claudius using a submodule to add a resource dependency. Specifically, GitHub releases don't include submodules in the produced tarball, which means Claudius unfortunately won't build from a GitHub release, which is how I did the release for the GIF library.
3D-Printing maps
UROP student Finley started, and impressed me by very quickly getting up and running generating models for 3D printing from digital elevation maps:

Finley is going to try to write up some weeknotes, so I'll link to those here as and when, rather than spoil his work, but I'm super excited about what we might get done this summer. I was working out of DoES Liverpool for part of last week, and I spotted this lovely CNC-routed landscape; I must resist trying to derail this project into even more time-consuming construction methods :)
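For anyone curious what the core step of this kind of pipeline looks like, here's a minimal hand-rolled sketch (nothing to do with Finley's actual approach, and real tooling would do this properly): each cell of a small height grid becomes two triangles in an ASCII STL model that a slicer can consume.

```python
import numpy as np

def dem_to_ascii_stl(heights, scale=1.0):
    """Toy sketch: triangulate a small DEM grid into an ASCII STL string.

    Each grid cell becomes two triangles. Normals are left as zero
    vectors, which most slicers recompute anyway.
    """
    rows, cols = heights.shape
    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            a = (c,     r,     heights[r, c] * scale)
            b = (c + 1, r,     heights[r, c + 1] * scale)
            d = (c,     r + 1, heights[r + 1, c] * scale)
            e = (c + 1, r + 1, heights[r + 1, c + 1] * scale)
            tris.append((a, b, d))
            tris.append((b, e, d))
    lines = ["solid dem"]
    for tri in tris:
        lines.append("  facet normal 0 0 0")
        lines.append("    outer loop")
        for x, y, z in tri:
            lines.append(f"      vertex {x} {y} {z}")
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append("endsolid dem")
    return "\n".join(lines)

dem = np.array([[0.0, 1.0], [1.0, 2.0]])
stl = dem_to_ascii_stl(dem)
print(stl.count("facet normal"))  # 2 triangles for a 2x2 grid
```

Real DEM-to-print work also needs a base, walls, vertical exaggeration, and sensible resampling, which is presumably where the interesting engineering lives.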

I did find out the computer lab has some Prusa 3D-printers, so hopefully Finley and I can get trained on those.
This week
- Make sure we have everything we need for the next LIFE manuscript ready for Zenodo.
- Get some of Finley's results 3D-printed and try to get him able to print on his own.
- Try to schedule a meeting on AoH validation with interested peeps. This was discussed around the IUCN workshop a few weeks back, and I need to try to arrange it before people vanish for summer holidays (myself included).
- Look into TESSERA if there's any free time.