- 2025-11-02 *
I’ve finally restarted work on the 32KB version of [chromatite] for the first time since April!
One of the first things I wanted to do was to replace the brute-force O(n^2) IMDCT that Pulsejet uses to decompress samples. [chromatite] uses 11 seconds of heavily crunched samples, which takes about 20 seconds to decompress – under the 30-second limit for Revision ’25, but annoyingly long when iterating on the code (and even worse in debug mode). And I’ve got space; [chromatite] was at 30308 out of 32768 bytes, 8 KB of which I can quickly jettison if I need to since it’s the side channel for a ride cymbal that doesn’t really need to be stereo.
So I spent a few evenings this last week implementing an O(n log n) IMDCT in the smallest way I could find, following Bosi, Goldberg, and Chiariglione’s algorithm as described in this reference. That requires an FFT, for which I used the old-fashioned recursive Cooley-Tukey algorithm – in this context I know my stack size and maximum FFT size, so it doesn’t need to be that fast, just small. (It’s also useful to have around since I’d like to use it for a future demo: [chromatite] only uses order-1 and order-2 causal filters since I didn’t want to deal with latency compensation, but someday I’d like to flip that on its head and make a demo that mostly does Fourier transform effects.) I was quite happy to get my implementation down to +396 bytes and running 8x as fast as the previous decompressor!
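For readers who haven't seen it, here is a minimal sketch of the recursive radix-2 Cooley-Tukey FFT mentioned above. It is written in Python for readability, not size, and is not the demo's actual code; the names `fft` and `dft` are made up for this example. It assumes the input length is a power of two, and it leaves out the pre- and post-twiddle steps that turn an FFT into an IMDCT.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT.

    Assumes len(x) is a power of two. Recursion depth is log2(len(x)),
    which is why a known maximum FFT size bounds the stack usage.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # FFT of the even-indexed samples
    odd = fft(x[1::2])   # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^(-2*pi*i*k/n) applied to the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft(x):
    """O(n^2) reference DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Comparing `fft(x)` against `dft(x)` on a short input is a quick sanity check that the butterflies are wired correctly.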
Then I saw this issue and realized: oh... a basic ‘compute the coefficients and store them in a table’ approach is probably just as good. It’s still O(n^2), but here n=128 or 1024, and if I need more performance it’s easier for me to vectorize the matrix multiply. And, yep: about an hour later it turns out it’s only +92 bytes, 2.5x faster than the FFT IMDCT for these sizes, and probably runs better in debug mode as well. It’s a bit unfortunate since I liked the clever FFT approach, but the constant factor won again this time.
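The table idea can be sketched like this, again in Python and again not the demo's actual code. It assumes the textbook IMDCT definition, y[t] = sum over k of X[k] * cos(pi/n * (t + 1/2 + n/2) * (k + 1/2)), with no scaling or windowing; Pulsejet's exact convention may differ. The function names are hypothetical.

```python
import math

def make_imdct_table(n):
    """Precompute the 2n x n cosine matrix for an n-coefficient IMDCT.

    Assumes the unscaled textbook definition; a real codec may fold
    a scale factor or window into these entries.
    """
    return [[math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
             for k in range(n)]
            for t in range(2 * n)]

def imdct_table(coeffs, table):
    """IMDCT as a plain matrix-vector product: n coefficients in, 2n samples out."""
    return [sum(c * w for c, w in zip(coeffs, row)) for row in table]

def imdct_direct(coeffs):
    """Reference version that re-evaluates every cosine inside the double loop."""
    n = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for k in range(n))
            for t in range(2 * n)]
```

The table is built once per block size and reused for every block, so the inner loop is only multiply-adds, which is the part that vectorizes easily. A full table for n=1024 has 2048 x 1024 entries; whether to store it in full or exploit the cosine symmetries is a separate size/speed trade-off that this sketch doesn't address.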
Now I get to work on the music side of things again! Just a few more things to do – the main puzzles are to figure out why Phase Plant’s pulses have different timbres than my code for the pluck synth, and to figure out the right foldback curve and dynamics for the distortion. After that, I just need to add the few missing tracks and balance all the levels, and it’s done!