So, I had the thought, wouldn’t it be neat to have a physical version of the neurodiversity rainbow infinity symbol? Like a little animated thingy that could decorate your desk or maybe be worn as a badge. I couldn’t find anyone who had created such a thing yet, so I did it myself. If you want to build one, the board design, code, and extras are available on GitHub. If sourcing boards and components seems a bit involved, I might also have extras lying about. The board consists of 27 WS2812B addressable RGB LEDs driven by an ATtiny412. A button on the back cycles through various colors, gradients, animation speeds, patterns and brightness levels.

One troublesome aspect of these addressable LEDs is their low color resolution at reasonable brightness levels. It would be nice to have soft color transitions without burning anyone’s retinas out. With digital images, a common strategy to increase perceived color resolution is dithering: patterning the available colors to simulate in-between values. Neat idea, but it only really works for displays of two or more dimensions with a rather high pixel density. We don’t have that here.
What we do have are pixels with a potentially high refresh rate (depending on the number of pixels and whether the microcontroller can keep up, which is the limiting factor here). If we can’t do spatial dithering, can we do temporal dithering instead? Apparently LCD monitors do this; it’s also known as frame rate control. So totally doable. But how?
Temporal dithering on highly limited compute
There are many different ways of dithering, nicely illustrated in the Dither Wikipedia article. However, since our colors are arranged in one dimension, and seeking forwards (or backwards) in the sequence would be highly compute- or memory-intensive, the noted qualities of each method present quite differently here. For our purpose, the choice comes down to either an ordered dither or error diffusion. While an ordered dither does not need any knowledge of adjacent values, it would require each value to have a known “temporal position”, which would then be used as an index into a lookup table for comparison. I would also have to figure out an appropriate dither pattern to minimize clumping of on/off values, which seems rather complicated. In-writing update: Actually, it’s quite easy. I randomly stumbled upon a 1998 paper on it. The only worry then would be potential visible aliasing depending on how each pixel is offset. Maybe worth a try. The-week-after update: Yup, it’s so much easier. See the section at the bottom for more.
Alternatively, how easy would it be to do error diffusion? Quite: In simple terms, calculate colors in a higher resolution, output the most significant bits of the color value, store the ignored, least significant bits temporarily, then add this to the calculated color of the following cycle. Or in pseudocode for a single 8-bit color:
uint8_t red_error = 0;

while (true) {
    uint16_t red = calculateColor();  // 16-bit color value
    red += red_error;                 // carry in last cycle's error
    red_error = red & 0xFF;           // keep the bits about to be dropped
    outputColor(red >> 8);            // emit the top 8 bits
}
In this way, the output flips between its two closest colors, with the proportionality of the actual value preserved (ignoring the nonlinearity of perceived brightness).
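To sanity-check that this really preserves the proportion, here’s a self-contained simulation (plain C, with the color calculation collapsed to a fixed target; names are mine, not the project’s) showing that the 8-bit outputs over a full 256-frame cycle sum back to the 16-bit target exactly:

```c
#include <stdint.h>

// Simulate error-diffusion dithering of a fixed 16-bit target over a number
// of frames, returning the sum of the 8-bit outputs. Over 256 frames the sum
// recovers the 16-bit target exactly (as long as target + error cannot
// overflow 16 bits, i.e. the target's high byte is below 0xFF).
uint32_t ditherSum(uint16_t target, int frames) {
    uint8_t error = 0;
    uint32_t sum = 0;
    for (int i = 0; i < frames; i++) {
        uint16_t v = target;
        v += error;         // carry in last frame's truncation error
        error = v & 0xFF;   // keep the bits about to be dropped
        sum += v >> 8;      // the 8-bit value that would go to the LED
    }
    return sum;
}

// Example: ditherSum(0x1280, 256) == 0x1280 — the output alternates between
// 18 and 19, averaging to exactly 18.5 (4736 / 256).
```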
Implementing this on a microcontroller with only 256 B of RAM is a bit of a challenge, even when handling only 27 pixels. For the final functioning version, I ended up calculating colors in a 16-bit space, outputting in 8-bit, and packing the carried errors for each pixel into an 8-bit RRRGGGBB color. In this way, adding dithering only requires 27 additional bytes of global variables. Check the repo for proper details on how it all works. The code is quite well commented, imo.
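As a rough sketch of the packing idea (the 3/3/2 field layout is assumed here, not necessarily the project’s exact bit order), the three per-channel errors can be quantized into one byte like this:

```c
#include <stdint.h>

// Pack the top 3/3/2 bits of each 8-bit carried error into one RRRGGGBB byte.
// This trades error precision for memory: 1 byte per pixel instead of 3.
uint8_t packError(uint8_t r, uint8_t g, uint8_t b) {
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}

// Unpack back to 8-bit values (the low bits are lost to quantization).
uint8_t unpackR(uint8_t e) { return (uint8_t)(e & 0xE0); }
uint8_t unpackG(uint8_t e) { return (uint8_t)((e & 0x1C) << 3); }
uint8_t unpackB(uint8_t e) { return (uint8_t)((e & 0x03) << 6); }
```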
Other neat aspects of the code I want to infodump about
Since all animations of the project are cyclical, I had initially calculated color positions as wrapping [0.0 – 1.0] floating-point values. This did not work out, as the ATtiny has no FPU, making framerates unusably slow and filling half of the MCU flash with the float library. Instead, I remapped all position values to the span of a 16-bit unsigned integer, which is much faster to calculate. An additional benefit of this switch was no longer having to manually wrap values, as they now simply overflow back to 0 plus the remainder.
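The trick in miniature (function and constant names invented for illustration): a position stored as a uint16_t advances and wraps for free, since unsigned overflow is well defined in C.

```c
#include <stdint.h>

// Animation position in "turns": 0x0000 is 0.0, 0xFFFF is just shy of 1.0.
// No manual wrap needed — overflow past 0xFFFF lands back near 0 with the
// remainder preserved.
uint16_t advance(uint16_t position, uint16_t speed) {
    return (uint16_t)(position + speed);
}
```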
In the standard Adafruit NeoPixel library (and seemingly all its derivatives), gamma compensation is calculated using a 256-byte lookup table, precalculated from x^2.6 for x = 0 .. 1. Doing the same for a 16-bit integer space is not possible in the available flash space. Instead, here are three approximations of increasing accuracy and complexity:
uint16_t gamma(uint32_t value) {
    // Normalized square
    return (uint16_t)((value * value) >> 16);
}

uint16_t gamma(uint32_t value) {
    // Normalized cube
    return (uint16_t)((((value * value) >> 16) * value) >> 16);
}

uint16_t gamma(uint32_t value) {
    // Average of normalized square and square of square
    uint32_t firstIteration = (value * value) >> 16;
    uint32_t secondIteration = (firstIteration * firstIteration) >> 16;
    return (uint16_t)((firstIteration + secondIteration) / 2);
}
By default, the project uses the second function.
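For a feel of how the three compare: at half input (0x8000, i.e. x = 0.5), true x^2.6 works out to about 10813 of 65536. The functions above (renamed here so they can coexist in one file; otherwise identical) give 16384, 8192, and 10240 respectively, so the averaged variant lands closest while staying cheap:

```c
#include <stdint.h>

uint16_t gammaSquare(uint32_t value) {
    // Normalized square: x^2
    return (uint16_t)((value * value) >> 16);
}

uint16_t gammaCube(uint32_t value) {
    // Normalized cube: x^3
    return (uint16_t)((((value * value) >> 16) * value) >> 16);
}

uint16_t gammaAvg(uint32_t value) {
    // Average of x^2 and x^4
    uint32_t first = (value * value) >> 16;
    uint32_t second = (first * first) >> 16;
    return (uint16_t)((first + second) / 2);
}

// At 0x8000 (x = 0.5): gammaSquare -> 16384, gammaCube -> 8192,
// gammaAvg -> 10240, versus ~10813 for true x^2.6.
```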
Using the digitalRead() Arduino function is surprisingly time intensive. Luckily, setting up interrupts with megaTinyCore is relatively painless. In this way, the button input is never checked during regular operation, only after it has actually been pressed. On a sidenote, the tinyAVR MCUs have Configurable Custom Logic and onboard comparators, which the core also supports. Not used here, but pretty neat.
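The button-flag pattern looks something like this (a sketch in plain C with invented names; on the real hardware the flag would be set from megaTinyCore’s pin interrupt rather than an ordinary function call):

```c
#include <stdbool.h>

// Set only from the pin-change interrupt, read from the main loop.
// Volatile, since the ISR can fire between any two main-loop instructions.
volatile bool buttonPressed = false;

// Hypothetical ISR body: on the ATtiny this would run on the button pin's
// falling edge via megaTinyCore's interrupt support.
void onButtonInterrupt(void) {
    buttonPressed = true;
}

// Called once per animation frame: costs a single flag test instead of a
// full digitalRead(), and clears the flag once the press is consumed.
bool pollButton(void) {
    if (!buttonPressed) return false;
    buttonPressed = false;
    return true;
}
```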
The megaTinyCore makes it possible to remove the standard Arduino millis() function to save a bit of flash space, which was necessary for this project. This means we can’t time animations or button interactions. Instead, the code relies on the consistency of the animation calculations. Again, due to the cyclic nature of the animations, this seems to work just fine. To keep animations consistent across CPU frequencies (lower clock, lower minimum voltage for battery power), speed constants are adjusted according to the F_CPU constant.
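One plausible shape for that adjustment (a sketch with made-up constants, not the project’s actual numbers): since the render loop is compute-bound, frame rate scales roughly with the clock, so the per-frame position step must scale inversely with F_CPU to keep the wall-clock animation speed constant.

```c
#include <stdint.h>

// Hypothetical tuning constant: per-frame position step at a 20 MHz build.
#define BASE_STEP 160UL

// Halve the clock -> frames take twice as long -> double the step,
// so the animation advances at the same wall-clock rate.
uint16_t stepForClock(uint32_t f_cpu_hz) {
    return (uint16_t)(BASE_STEP * 20000000UL / f_cpu_hz);
}
```

In the project this would be resolved at compile time from the F_CPU constant rather than computed at runtime.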
Update: Ordered dither is much simpler than error diffusion, but also worse
As noted above, ordered dither turns out to be much simpler to implement than error diffusion:
- Create an 8-bit dither pattern. The previously mentioned paper has an algorithm for that. Here’s a Python implementation.
- Initialize an 8-bit counter, incrementing it for each show().
- On setting a pixel color, look up pattern[counter + offset] and check whether it is less than the remainder of the output value (which is already the least significant 8 bits when using 16-bit values as before). If so, increment the value before output.
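The steps above can be sketched as follows (plain C; `reverseBits` is a simple stand-in pattern to demonstrate the mechanism, not the paper’s optimized pattern, and the names are mine):

```c
#include <stdint.h>

// Stand-in dither pattern: bit-reversed counter values. Any full 256-entry
// permutation works for the mechanism; the paper's pattern would go here.
uint8_t reverseBits(uint8_t v) {
    v = (uint8_t)((v >> 4) | (v << 4));
    v = (uint8_t)(((v & 0xCC) >> 2) | ((v & 0x33) << 2));
    v = (uint8_t)(((v & 0xAA) >> 1) | ((v & 0x55) << 1));
    return v;
}

// 16-bit target in, 8-bit output: compare the low byte (the remainder)
// against the pattern entry for this frame + pixel; bump the output when
// the remainder wins. counter increments once per show(), offset is per-pixel.
uint8_t ditherOrdered(uint16_t value, uint8_t counter, uint8_t offset) {
    uint8_t out = (uint8_t)(value >> 8);
    uint8_t remainder = (uint8_t)(value & 0xFF);
    if (reverseBits((uint8_t)(counter + offset)) < remainder && out < 255)
        out++;
    return out;
}

// Helper: sum the dithered output over one full 256-frame cycle. Because the
// pattern is a permutation of 0..255, the sum recovers the 16-bit target.
uint32_t ditherSum256(uint16_t value, uint8_t offset) {
    uint32_t sum = 0;
    for (int c = 0; c < 256; c++)
        sum += ditherOrdered(value, (uint8_t)c, offset);
    return sum;
}
```

No per-pixel state is carried between frames, which is exactly why it is so much smaller than the error-diffusion version.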
This makes for way simpler, smaller and (as far as I can tell) faster code. It also looks worse.
Even with individual values and pixels offset from the counter, it’s noticeably more flickery than the error-diffusion method. My suspicion is that since colors are recalculated between each dither calculation, no single pattern is maintained for any length of time, introducing some form of aliasing.
A potential option could be to only recalculate colors occasionally, increasing framerate and stability. This would however require storing all 3*27*2=162 pre-calculated bytes of color data. With the current MCU, this is not feasible.
Oh well. Worth a try.