12 Dec 2025 — 6 min read
Treat your disk storage as potentially hostile
A write-ahead log (WAL) is one of those database concepts that sounds deceptively simple. You write a record to disk before applying it to your in-memory state. If you crash, you replay the log and recover. Done.
Except your disk is lying to you.
PostgreSQL, SQLite, RocksDB, Cassandra... every production system that claims to be durable relies on a WAL. It’s the fundamental contract: "Write here, and I promise your data survives." But making that promise actually stick requires understanding all the ways disks fail silently.
The Naive Approach vs Reality
Let’s say you implement a WAL like this:
write(fd, record, sizeof(record)); // Done, right... RIGHT?
In a test environment on your laptop, this works great. But when you handle millions of writes a day, those one-in-a-million errors happen several times a day, and the system fails in ways your tests never catch:
- The page cache problem: That write() just copied your data into the kernel’s buffer. It hasn’t touched the disk yet. Crash now, and it’s gone.
- The disk that lies about success: Your write() returns success. The kernel tells you it’s synced. The disk firmware tells you it’s on stable storage. Then a latent sector error silently corrupts it anyway.
- The ordering chaos: Write operation A starts. Write operation B starts. B completes first. Your recovery code sees B without A and has no idea what happened.
- The single point of failure: One bad sector on your only copy of the WAL, and you lose everything.
This is why people who’ve lost data in production are paranoid about durability. And rightfully so.
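The page cache problem is the one item on that list that the standard POSIX calls address directly. Here is a minimal sketch, assuming a POSIX system and an already-open file descriptor; the helper name wal_append_durable is made up for illustration. It loops over short writes and then calls fsync(), which asks the kernel to push the data to the device. It does nothing about the other three failure modes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to fd, then flush them to the
 * device. Returns 0 on success, -1 on error (errno is set). */
int wal_append_durable(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before any byte: retry */
            return -1;
        }
        if (n == 0) {            /* no progress: give up, don't spin */
            errno = EIO;
            return -1;
        }
        p   += n;                /* write() may be short: advance and loop */
        len -= (size_t)n;
    }

    /* Up to here the bytes may live only in the kernel's page cache.
     * fsync() requests that they be written to the storage device. */
    if (fsync(fd) < 0)
        return -1;
    return 0;
}
```

A caller would acknowledge a record to the client only after this returns 0. A successful fsync() still only means the kernel and the device reported success, which is exactly the claim the rest of this post refuses to take at face value.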
Building the Better Mousetrap
There are five layers of defense we can use to build a better mousetrap. Think of them as increasingly specific answers to the question: "How can this fail?"
Layer 1: Checksums (CRC32C)
Every record includes a checksum of its contents. After writing, we verify the checksum hasn’t changed. Simple, right?
Record Header (20 bytes):
[magic_num: 4][sequence_num: 8][checksum: 4]
[payload: variable]
[padding to 512 byte alignment]
Why this matters: Hardware bit flips happen. Disk firmware corrupts data. Memory buses misbehave. And here’s the kicker: None of these trigger an error flag. The I/O subsystem returns success. The data is just silently wrong. Without checksums, you discover this weeks later when you try to recover and find your log is garbage.
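To make the checksum layer concrete, here is a rough sketch of encoding and verifying one record. Several things in it are assumptions, not the layout above taken literally: the listed fields add up to 16 bytes while the header is described as 20, so the sketch guesses that the remaining 4 bytes hold a payload length; the magic value, the little-endian encoding, and the function names are invented; and the checksum is assumed to cover every header byte except the checksum field, plus the payload. The CRC32C routine is the plain bit-at-a-time version, not the hardware-accelerated one a real system would use, and the 512-byte padding is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets in the header. OFF_LEN is an assumed field (see lead-in). */
enum { OFF_MAGIC = 0, OFF_SEQ = 4, OFF_CRC = 12, OFF_LEN = 16, HEADER_SIZE = 20 };
#define WAL_MAGIC 0x57414C31u /* made-up value for this sketch */

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Pass 0 to start; pass a previous result to continue over more bytes. */
uint32_t crc32c(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Checksum over magic + sequence, then length + payload, skipping the
 * checksum field itself. The payload directly follows the header. */
static uint32_t record_crc(const uint8_t *rec, uint32_t payload_len)
{
    uint32_t crc = crc32c(0, rec, OFF_CRC);
    return crc32c(crc, rec + OFF_LEN, 4 + (size_t)payload_len);
}

/* Fill rec (at least HEADER_SIZE + len bytes) and return the record size. */
size_t wal_encode(uint8_t *rec, uint64_t seq, const void *payload, uint32_t len)
{
    put_le32(rec + OFF_MAGIC, WAL_MAGIC);
    put_le64(rec + OFF_SEQ, seq);
    put_le32(rec + OFF_LEN, len);
    memcpy(rec + HEADER_SIZE, payload, len);
    put_le32(rec + OFF_CRC, record_crc(rec, len));
    return HEADER_SIZE + (size_t)len;
}

/* Return 1 if the record in rec[0..avail) is intact, 0 otherwise. */
int wal_verify(const uint8_t *rec, size_t avail)
{
    if (avail < HEADER_SIZE || get_le32(rec + OFF_MAGIC) != WAL_MAGIC)
        return 0;
    uint32_t len = get_le32(rec + OFF_LEN);
    if (len > avail - HEADER_SIZE)
        return 0;                 /* length field itself may be corrupt */
    return get_le32(rec + OFF_CRC) == record_crc(rec, len);
}
```

During recovery, the replay loop would call wal_verify on each record and stop (or switch to another copy) at the first one that fails, instead of applying bytes it cannot trust.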
Layer 2: Dual WAL Files (LSE Protection)
Another solid strategy protects against a specific kind of failure, the latent sector error (LSE): keep two WAL files, ideally on different disks.
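One way the two-file idea could look in code is sketched below, with plenty of assumptions: both files receive the same record at the same offset, recovery prefers file A and falls back to file B, and the integrity check is passed in by the caller. All names are hypothetical, and wal_check_toy is a deliberately trivial XOR parity check that only stands in for a real CRC32C comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Caller-supplied integrity check: nonzero means the record is intact. */
typedef int (*wal_check_fn)(const uint8_t *rec, size_t len);

/* Toy stand-in for a real checksum: the last byte must equal the XOR of
 * all preceding bytes. For illustration only. */
int wal_check_toy(const uint8_t *rec, size_t len)
{
    uint8_t x = 0;
    if (len == 0)
        return 0;
    for (size_t i = 0; i + 1 < len; i++)
        x ^= rec[i];
    return x == rec[len - 1];
}

/* Write the same record at the same offset in both files and sync each.
 * Returns 0 only if both copies were written and synced. */
int wal_write_both(int fd_a, int fd_b, const uint8_t *rec, size_t len, off_t off)
{
    int fds[2] = { fd_a, fd_b };
    for (int i = 0; i < 2; i++) {
        if (pwrite(fds[i], rec, len, off) != (ssize_t)len)
            return -1;           /* error or short write: report failure */
        if (fsync(fds[i]) < 0)
            return -1;
    }
    return 0;
}

/* Read one record, trying file A first and then file B. A failed read
 * (such as EIO from a bad sector) or a failed check moves on to the
 * other copy. Returns 0 if A was used, 1 if B was used, -1 if neither
 * copy is usable. */
int wal_read_either(int fd_a, int fd_b, uint8_t *buf, size_t len, off_t off,
                    wal_check_fn intact)
{
    int fds[2] = { fd_a, fd_b };
    for (int i = 0; i < 2; i++) {
        if (pread(fds[i], buf, len, off) == (ssize_t)len && intact(buf, len))
            return i;
    }
    return -1;
}
```

The return value of wal_read_either tells the caller which copy was used, so a recovery routine could also log the damaged copy or rewrite it from the good one. Putting both files on the same disk weakens the protection, since one failing device can then take out both copies.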