rqlite is a lightweight, user-friendly, distributed relational database. It’s written in Go, employs Raft for distributed consensus, and uses SQLite as its storage engine.
The newly released rqlite 9.2.0 introduces a major improvement to startup performance – nodes can now resume from where they left off, instead of rebuilding their state from scratch on every restart. This change means that even if a node manages gigabytes of SQLite data, it can come back online almost instantly, with startup time no longer proportional to dataset size.
In this post, I’ll explore why this change matters, how it was implemented, and what it says about rqlite’s evolution.
From rebuilding to resuming
From its inception over a decade ago, rqlite took a very conservative approach to correctness on restart: each restart discarded the local SQLite database and rebuilt state by replaying the Raft log or applying a snapshot.
This “always rebuild” strategy ensured the node started in a guaranteed-correct state, free from any potential corruption that might have occurred during a previous run. It also meant rqlite could run SQLite with SYNCHRONOUS=OFF – normally a risky setting – and still deliver great write performance.
It was simple and robust – but as rqlite clusters managed more data over the years, the cost of that simplicity became apparent. Restarting a node with hundreds of megabytes or even gigabytes of data could take minutes, as the node laboriously reconstructed the SQLite database from the log.
I’ve now removed that long-standing behavior. A node no longer rebuilds from scratch – it resumes. This is the most significant change to rqlite’s architecture since it switched to WAL mode. It fundamentally alters how a node comes back online after a shutdown or crash. Instead of blindly throwing away the existing SQLite file, rqlite will try to pick up right where it left off. The result is dramatic – startup times drop from minutes to seconds when working with multi-gigabyte databases. A rqlite node has essentially learned to wake up remembering its state, rather than reconstructing it from the ground up.
How rqlite ensures a safe resume
You might wonder: how can rqlite skip the rebuild safely, given that it historically ran SQLite in a mode that doesn’t fully guarantee on-disk durability? The answer lies in a careful balance between performance and safety, implemented through SQLite’s synchronization settings and some additional metadata.
High-Speed WAL Mode with Periodic fsync: Normally rqlite runs SQLite in WAL mode with SYNCHRONOUS=OFF for maximum write throughput. This speeds up inserts and updates by avoiding blocking fsync calls on each transaction, but it carries a risk – if the OS crashes, the SQLite database might not be fully flushed to disk, potentially leaving it in an inconsistent state. To mitigate this, rqlite 9.2 periodically performs a full fsync of the SQLite database. Specifically, whenever rqlite takes a snapshot of the Raft state, it temporarily switches SQLite to SYNCHRONOUS=FULL, checkpoints the WAL, and flushes all data to disk. This ensures that the SQLite file on disk represents a fully consistent checkpoint of the database at that point in time. After the snapshot is done, rqlite immediately switches SQLite back to SYNCHRONOUS=OFF mode to keep write performance high.
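To make that concrete, here is a minimal sketch of the sync-on-snapshot step in Go – not rqlite’s actual implementation – assuming a connection obtained from a *sql.DB opened with a SQLite driver such as mattn/go-sqlite3:

```go
package store

import (
	"context"
	"database/sql"
)

// checkpointAndSync is an illustrative helper, not rqlite's real snapshot
// code. It runs on a single connection so all three PRAGMAs apply to the
// same SQLite session: switch to full durability, checkpoint the WAL into
// the main database file, then return to high-throughput mode.
func checkpointAndSync(ctx context.Context, conn *sql.Conn) error {
	// Ask SQLite for full durability while we checkpoint.
	if _, err := conn.ExecContext(ctx, "PRAGMA synchronous=FULL"); err != nil {
		return err
	}
	// Copy all WAL frames into the main database file, truncate the WAL,
	// and flush the result to disk.
	if _, err := conn.ExecContext(ctx, "PRAGMA wal_checkpoint(TRUNCATE)"); err != nil {
		return err
	}
	// Back to SYNCHRONOUS=OFF for normal, fast writes.
	_, err := conn.ExecContext(ctx, "PRAGMA synchronous=OFF")
	return err
}
```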
Each successful snapshot now guarantees a durable on-disk state. In fact, during a clean shutdown, rqlite will proactively trigger a final snapshot so that the database file is completely synced before the process exits. In short, rqlite’s use of WAL + SYNCHRONOUS=OFF writes gives great performance, and periodic sync points (snapshots) give a safety net.
Writes alter the WAL, never the main database: Once snapshotting completes, write requests can be serviced again. As before, once a write request reaches consensus via Raft, it is stored in the SQLite WAL file. The main SQLite file, which has been safely fsynced to disk, is never touched until the next snapshot.
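You can see this behavior with a tiny standalone program (the file and table names are invented for the illustration, and this is not rqlite code): the committed write lands in the -wal side file, and the main database file only absorbs it when a checkpoint runs.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.SetMaxOpenConns(1) // keep every statement on one connection

	for _, stmt := range []string{
		"PRAGMA journal_mode=WAL",
		"PRAGMA synchronous=OFF",
		"CREATE TABLE IF NOT EXISTS foo (name TEXT)",
		"INSERT INTO foo(name) VALUES('fiona')",
	} {
		if _, err := db.Exec(stmt); err != nil {
			log.Fatal(err)
		}
	}

	// The INSERT above lives in data.db-wal; the main data.db file will not
	// absorb it until a checkpoint runs.
	for _, p := range []string{"data.db", "data.db-wal"} {
		if fi, err := os.Stat(p); err == nil {
			fmt.Printf("%s: %d bytes\n", p, fi.Size())
		}
	}
}
```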
Recording a “Clean Snapshot” Fingerprint: How does rqlite know on restart that an existing SQLite file is safe to use? The trick is a small metadata file called a clean snapshot marker. Whenever a Raft snapshot completes, rqlite writes out a fingerprint of the SQLite database at that moment. This fingerprint (stored as a JSON file on disk) contains the database file’s last modification timestamp, size, and its CRC32. Writing this file is reliable – it’s fsync’ed to disk as well, so rqlite knows that if the file exists, it accurately reflects a synced state of the DB.
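Here is a rough sketch of the idea in Go. The real marker format, field names, and file paths rqlite uses may differ – this just shows the shape of the recorded fingerprint and how it could be written durably.

```go
package store

import (
	"encoding/json"
	"hash/crc32"
	"io"
	"os"
	"time"
)

// fingerprint captures the attributes recorded in the clean snapshot marker.
type fingerprint struct {
	ModTime time.Time `json:"mod_time"`
	Size    int64     `json:"size"`
	CRC32   uint32    `json:"crc32"`
}

// crc32OfFile returns the CRC32 (IEEE) of the file's contents.
func crc32OfFile(path string) (uint32, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	h := crc32.NewIEEE()
	if _, err := io.Copy(h, f); err != nil {
		return 0, err
	}
	return h.Sum32(), nil
}

// writeMarker records the database file's mod time, size, and CRC32, then
// fsyncs the marker so that its presence implies a synced database.
func writeMarker(dbPath, markerPath string) error {
	fi, err := os.Stat(dbPath)
	if err != nil {
		return err
	}
	crc, err := crc32OfFile(dbPath)
	if err != nil {
		return err
	}
	b, err := json.Marshal(fingerprint{ModTime: fi.ModTime(), Size: fi.Size(), CRC32: crc})
	if err != nil {
		return err
	}
	out, err := os.Create(markerPath)
	if err != nil {
		return err
	}
	defer out.Close()
	if _, err := out.Write(b); err != nil {
		return err
	}
	return out.Sync() // fsync the marker itself before trusting it
}
```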
On node startup, rqlite 9.2 performs a check: if there are any snapshots available, and this clean snapshot marker file is found, rqlite reads the expected modification time, size, and CRC32 from the file. It then checks those same attributes of the SQLite database file on disk. If they all match exactly, then it knows that the SQLite file was the one produced by the last successful snapshot. In this case, the node can trust the on-disk SQLite database and skip the usual restore from the Raft snapshot.
rqlite then simply opens the existing SQLite file and resumes normal operations. Any Raft log entries recorded after that snapshot are applied to the database (they are actually written to the WAL). The end result is that the node is ready much faster than before.
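Putting the pieces together, the startup decision might look roughly like the following sketch, which reuses the illustrative fingerprint type and crc32OfFile helper from above and is not the actual rqlite code:

```go
// canResume reports whether the on-disk SQLite file matches the fingerprint
// recorded by the last successful snapshot. Any failure simply means
// "rebuild the old way".
func canResume(dbPath, markerPath string) bool {
	b, err := os.ReadFile(markerPath)
	if err != nil {
		return false // no marker: fall back to a full rebuild
	}
	var fp fingerprint
	if err := json.Unmarshal(b, &fp); err != nil {
		return false
	}
	fi, err := os.Stat(dbPath)
	if err != nil {
		return false
	}
	if !fi.ModTime().Equal(fp.ModTime) || fi.Size() != fp.Size {
		return false
	}
	crc, err := crc32OfFile(dbPath)
	return err == nil && crc == fp.CRC32
}
```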
If for some reason this check fails – say the marker file is missing, or the CRC doesn’t match (meaning that the SQLite database may have been checkpointed but the Raft Snapshot operation failed to complete) – then rqlite falls back to the old behavior. It will treat the existing file as suspect, delete all existing SQLite state, and restore it by applying the latest known-good Raft snapshot or replaying the log from scratch. This approach means rqlite never risks starting from a potentially inconsistent database. It will only resume from disk when it’s confident the file is intact and current; otherwise, correctness takes priority and it rebuilds the state the slow way. In practice, with 9.2’s changes, the slow path is rarely needed except after an abrupt crash in very specific and unlikely circumstances.
What is the result?
The impact of this change is immediately noticeable. Startup times are now independent of data size – whether your node has 10 MB or 10 GB of data, a restart will be on the order of a second or two (basically the time to open the SQLite file), rather than scaling with the amount of data. Previously, a multi-gigabyte dataset could mean many thousands of Raft log entries to replay or a huge snapshot to restore, leading to start times that could stretch into minutes. Now, as long as the node has a recent fsynced SQLite file, it just opens it and is ready to serve requests immediately. This reduces downtime for maintenance restarts or node reboots in production environments. Your rqlite cluster can come back from upgrades or reconfiguration much more quickly, improving overall availability.
To give a sense of the improvement: in one test, a rqlite node managing ~5 GB of SQLite tables used to take over a minute to fully come online after a restart. With rqlite 9.2, that same node restarts and begins serving reads in under a second. The only delay is opening the database file and verifying the snapshot fingerprint – a constant-time operation that doesn’t grow with the data. Smaller datasets that might have taken, say, 30 seconds to recover now feel almost instantaneous.
But doesn’t a larger file mean more time to check the CRC32?
There’s one small but important detail. When a node starts up, it first checks the SQLite file’s modification time and size. That part happens synchronously. The CRC32 check, though, runs in a separate goroutine. If the modification time and size look right, rqlite assumes the database is good and starts serving reads and writes right away. A few seconds later, the CRC32 result comes in. If it matches, nothing more happens. If it doesn’t, the process exits, alerting the operator to a problem. This is safe because any new writes during those few seconds live in the Raft log and SQLite WAL, not in the main database file. And since rqlite always deletes any WAL file at startup, exiting here is fine – the node can replay those writes from the Raft log when it restarts, probably in combination with a full restore from the Raft snapshot next time round.
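In sketch form – again using the illustrative helpers from earlier, plus the standard log package – the CRC32 part of the startup check moves into a goroutine like this:

```go
// verifyCRC32Async runs the data-size-dependent part of the check in the
// background. Mod time and size have already been verified synchronously.
func verifyCRC32Async(dbPath string, fp fingerprint) {
	go func() {
		crc, err := crc32OfFile(dbPath)
		if err != nil || crc != fp.CRC32 {
			// Writes accepted in the meantime are safe in the Raft log and
			// WAL, so exiting simply forces a full rebuild on the next start.
			log.Fatalf("SQLite file failed CRC32 verification: err=%v", err)
		}
	}()
}
```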
It’s an obvious change, right?
A fair question at this point is, why only now? If the solution is to periodically fsync and check a file’s state, why did it take over a decade to implement what sounds like a straightforward optimization? The answer, like rqlite itself, is about prioritizing correctness and the long, slow evolution of a stable system.
For years, the “always rebuild” strategy was the right one. It was simple, robust, and provably correct. The Raft log was the single source of truth, and the on-disk SQLite file was just a disposable cache. This approach eliminated an entire class of potential bugs, and in a database, correctness is the one thing you can never compromise.
Introducing a “fast resume” path meant fundamentally changing that model—it meant trusting the SQLite file. That’s a change I don’t take lightly. As rqlite isn’t driven by commercial pressures, I had the freedom to let this idea percolate for months, even years. I could think through every edge case, often ruling out entire design ideas before a single line of code was written.
Part of the delay was also a search for a more general, “perfect” solution – one that would work in all circumstances, not just the “clean shutdown” case. It took years to truly understand how the system was evolving, and to learn the deep nuances of SQLite’s WAL behavior under real-world conditions. It also took time to become convinced that a pragmatic solution – one that optimizes for the common case (99% of restarts) while falling back to the old, safe method for the exceptional case (an abrupt crash) – was the right trade-off.
But the most important reason: such a fundamental change could only be made once the rest of the system was rock-solid. This new feature rests on a foundation of 12 years of work. It depends on a battle-tested Raft implementation, a trusted snapshotting and log-truncation process, and a reliable recovery mechanism. Waiting this long meant that when this new logic was finally added, it was landing on mature, deeply stable software. It’s an evolution, not a revolution, and that’s what gives me confidence that it’s just as correct as the old way, only much, much faster.
This change feels like another turning point for rqlite. The project began as an experiment in distributed systems – essentially a demonstration that you could add Raft consensus to SQLite and get a fault-tolerant, consistent database. Over the years it grew in features and stability, but that original “always rebuild” approach was a vestige of its experimental origins. Now rqlite has matured to the point where it can wake up remembering its state rather than reconstructing it from scratch. It’s a small conceptual step, but one that signals a new level of practicality for the system.
The most obvious ideas sometimes arrive late, but when they do, they highlight how much groundwork was needed to make them possible.
Next steps – and the possible path to rqlite 10
With rqlite 9.2, operational life gets a bit easier for anyone running rqlite – especially those with large datasets. Restarts are no longer an ordeal or something to dread in your maintenance window. You can upgrade nodes or bounce a cluster member and have confidence it will be back in the mix almost immediately. All this comes without compromising rqlite’s core promise of correctness. If a node can’t guarantee its on-disk state is perfect, it will simply do what it always did and recover from the canonical Raft log. But in the common case, rqlite now combines the performance of in-place restarts with the safety of Raft’s consistency guarantees.
If you’re upgrading from an earlier version, the transition is seamless – just update to 9.2, and the next time each node restarts, you’ll notice the difference. There’s nothing new to configure; the feature is automatic.
A closing thought. This work brings rqlite much closer to a design in which only the SQLite file, a WAL file, and a Raft log exist on disk – there will be no need for a second copy of the database in the Raft snapshot store, and disk requirements will drop by half. Like all major changes to rqlite, that design will take time – time to mature, time to validate, and time to develop. But the way ahead is much clearer now.