Using CRDTs and Sync as a Database

Can you use a CRDT as a Database?

I’ve been thinking about this a lot, and my answer is a resounding…

Maybe

Depends on your data, and your use case.

If your data is a good fit, and you’re ready to take on the quirks associated with doing so, I’m working on a batteries-included, decentralized DB that will store/track history/sync/query/authenticate/encrypt for you. Choose Eidetica as your database, and get all of that for free.

Let me explain.

Also to be clear, Eidetica is AGPLv3 licensed (see my thoughts on licensing) and I’m not ready to accept outside contributions yet but will soon. I will hopefully offer it under a Commercial License in the future but it will …

Also also, it’s a proof-of-concept: experimental, with no backwards compatibility guaranteed for now. It is definitely pre-Alpha quality. Some of the purported features here are mild overstatements of the current capabilities. If you use it in prod in the near future I will laugh at you.

What is a CRDT?

A Conflict-free Replicated Data Type (CRDT) is a data structure that you can merge without conflicts. In essence it lets you and me separately create updates to a base set of data, and our changes can be deterministically merged together without failure.

A write to the CRDT can happen without confirming that it’s the only write happening at once. This is the key difference between distributed and decentralized systems:

Distributed systems (like traditional distributed databases with Raft or Paxos) require coordination before accepting writes. When you write, the system must confirm with other nodes to maintain consistency, which means you need network connectivity and consensus.
Decentralized systems (using CRDTs) allow each node to accept writes independently without coordination. You can write to your local copy while completely offline, and when nodes eventually reconnect, the CRDT merge algorithm automatically reconciles all changes deterministically.

This “write locally, sync later” property is what makes CRDTs powerful for local-first applications.
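To make that concrete, here is a minimal sketch of two devices writing offline and reconciling later. It uses a grow-only counter, a textbook CRDT that I picked purely for illustration. The `GCounter` class and its method names are my own assumptions here and are not Eidetica's API.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: each replica only ever increments its own slot."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A local write: no network round trip, no coordination with peers.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Take the per-replica maximum. This is defined for any two counters,
        # so the merge cannot fail or produce a conflict.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


# Two devices accept writes while offline...
laptop = GCounter("laptop")
phone = GCounter("phone")
laptop.increment(2)
phone.increment(3)

# ...then sync in both directions once they reconnect.
laptop.merge(phone)
phone.merge(laptop)
```

After the two merges both devices hold the same per-replica counts, and neither had to ask the other for permission before writing.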

A CRDT can be as simple as coming up with a deterministic order for the changes and then letting the Last Write Win (LWW). Technically that makes conflicts impossible, but logical errors can still result. For example, with naive LWW: Device A sets status = "completed" and assigned_to = null, while Device B sets assigned_to = "Bob". You might end up with a completed task still assigned to someone.
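The task example can be reproduced with a small per-field LWW merge. This is a sketch under my own assumptions: each field carries a hypothetical `(timestamp, device_id, value)` entry, and the device id breaks timestamp ties so the ordering stays deterministic. It is not how Eidetica's built-in types are implemented.

```python
def lww_merge(a: dict, b: dict) -> dict:
    """Per-field Last Write Wins.

    Each field maps to a (timestamp, device_id, value) tuple. The entry with
    the larger (timestamp, device_id) stamp wins for that field.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Device A closes the task and clears the assignee at time 2.
device_a = {
    "status": (2, "A", "completed"),
    "assigned_to": (2, "A", None),
}

# Device B still sees the task as open and assigns it to Bob at time 3.
device_b = {
    "status": (1, "A", "open"),
    "assigned_to": (3, "B", "Bob"),
}

task = lww_merge(device_a, device_b)
```

The merge succeeds and gives the same answer in either direction, yet `task` ends up with status "completed" and assigned_to "Bob". Each field was resolved correctly on its own, and the combination is still something neither device intended.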

Simplified CRDT

For the purposes of using it inside of Eidetica’s syncing framework, a CRDT needs to have the following properties:

Merging in the same order is deterministic and results in identical data. [1]
Merging two CRDTs can’t fail; the result is always valid.

That’s all that Eidetica relies on.
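Those two properties fit in a few lines. The sketch below is a hypothetical illustration, not Eidetica's actual interface: `LwwMap` and `materialize` are names I made up, and the way Eidetica arrives at the merge order is not shown. It only assumes that every node is handed the same updates in the same order.

```python
from functools import reduce


class LwwMap:
    """Hypothetical minimal CRDT: a map where the later-merged value wins."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def merge(self, other: "LwwMap") -> "LwwMap":
        # Property 2: merge is a total function. It is defined for every pair
        # of maps and never raises. Keys in `other` overwrite keys in `self`,
        # so the result depends on the order in which updates are applied.
        return LwwMap({**self.entries, **other.entries})

    def __eq__(self, other):
        return isinstance(other, LwwMap) and self.entries == other.entries


def materialize(updates):
    """Property 1: folding the same updates in the same order gives the same state."""
    return reduce(lambda acc, update: acc.merge(update), updates, LwwMap())


updates = [
    LwwMap({"title": "Draft"}),
    LwwMap({"title": "Final", "tags": "crdt"}),
    LwwMap({"tags": "crdt,sync"}),
]

state_on_node_1 = materialize(updates)
state_on_node_2 = materialize(updates)
```

Both nodes end up with an identical state. Folding the same updates in reverse order would give a different title, which is why this simplified definition leans on a shared order rather than on the merge being commutative.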

Eidetica: CRDTs + Everything Else

CRDT libraries like Yjs and Automerge provide CRDT implementations, but they’re only the data format, not complete systems. While there are sync engines written on top of these, like Automerge Repo, most only work with a single CRDT type.

Eidetica is generic over CRDTs. You can use the built-in simple CRDTs (like a LWW table or document store), plug in Yjs or Automerge types, or write your own. Eidetica doesn’t care what CRDT you use.

What Eidetica provides is everything else you need to build a database:

Sync protocol: Uses Iroh for peer-to-peer networking
Authentication: Decentralized identity without a central authority
Transactions: Atomic updates across multiple stores/tables
Storage backends: Persistent storage with history tracking
Object storage: For large binary data (separate from CRDT state)

Think of it as: CRDT Library + Sync + Auth + Storage + Transactions = Eidetica

CRDT as a DB

CRDTs work well for human-generated data in modest amounts per user. They don’t work well for high-volume writes from many nodes.

For the data most precious to you - private notes, todo/contacts/calendar, communication, etc - this model should map pretty well. A chat app, for example, can easily be built on top of this without much effort. I’ve written a small TUI chat application to demonstrate.

Note: The underlying format in Eidetica is quite similar to Matrix and will have similar horizontal scaling problems. Matrix has included a lot of additional niceties for a chat service that make scaling for them even more difficult though.

I specifically designed things this way because it’s how I want my personal data to be stored. I want my data provably verified, End-to-End encrypted, and available on all my devices even if I’m offline. I hate showing up somewhere and having no access to my notes, todo list, RSS feed, or chat history because the internet is down. Apps should sync in the background whenever my devices are on.

CRDT State vs. Object Storage

Not all data should live inside the CRDT itself though.

The data stored inside the CRDT type should be relatively small since the system stores the full history and signs each commit. For large or binary data like images, videos, or large documents, it’s more efficient to store a hash reference in the CRDT and keep the actual object outside the CRDT state.

For example, in a note-taking app:

Store in CRDT: note title, tags, creation date, description, and the text content.
Store as Object: attachments, images, videos, etc.

This way you can garbage collect and deduplicate objects independently from the CRDT history, which can’t easily be modified or pruned. The CRDT gives you the structural data and references, while the object store handles the heavy content.
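Here is a rough sketch of that split for the note-taking example. It assumes a content-addressed store keyed by SHA-256 digest. `ObjectStore` and its methods are hypothetical names for illustration only, and the notes are plain dicts standing in for CRDT state.

```python
import hashlib


class ObjectStore:
    """Hypothetical content-addressed store living outside the CRDT state."""

    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The SHA-256 digest is the key, so identical content is stored once.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def collect_garbage(self, live: set[str]) -> None:
        # Drop blobs that are no longer referenced, without touching CRDT history.
        self._blobs = {d: b for d, b in self._blobs.items() if d in live}

    def __len__(self) -> int:
        return len(self._blobs)


store = ObjectStore()
image = b"\x89PNG...pretend this is a large image"

# Small structural data goes in the CRDT; the attachment is only a hash reference.
note_a = {"title": "Trip", "text": "Photos from the trip", "attachments": [store.put(image)]}
note_b = {"title": "Copy", "text": "Same photo again", "attachments": [store.put(image)]}
```

Both notes reference the same digest, so the image is stored once. Calling `collect_garbage` with the set of digests that are still referenced prunes everything else while the notes and their history stay untouched.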

In Eidetica, Object Sync will be a separate storage layer with different configuration options but is not yet implemented.

Is This Correct?

Personally I think that a whole lot of data can be modeled as a “CRDT”. I’m taking a bet on that by working on Eidetica to build out exactly the storage/sync system that I personally want to have.

This is the type of thing that I think allows for a more local-first world. Moving things out of centralized, controlled data and into the hands of users. Each user should own and End-to-End encrypt all of their own data so that no government can monitor you, and no corp can sell ads off your data unless you let them.

And you don’t have to store your data on Google/Meta/Signal’s servers just to have some private chats, or share private data between yourself and your friends. From a purely technical standpoint that is unnecessary. It’s just tricky to get the support and user flows correct, and there’s probably not much money in it.

I will fully admit that it does not work for every type of data, but I do think it applies to more areas than you think.

Is Eidetica For You?

So my pitch is this: if you are an app developer, and you can store your data in a format where you can guarantee that you can always “merge” data, and your data storage needs PER USER are relatively modest, then Eidetica can/will provide sync, authentication, encryption (TODO), etc. You don’t need a central server, but you can run my sync server yourself to store/sync your users’ data. You will eventually be able to sync objects as well (TODO).

A user will also be able to run Eidetica locally as a service to store the data from all their local apps using Eidetica and sync in the background.

If you choose to store your data in Eidetica, it gives the user full control over their own data: they can choose how much or how little to store locally, store it on their own servers, rent space on someone else’s Eidetica service for backups, etc.

It is exactly how I want to store my own personal data. I built it for myself first, but I hope it can be useful to others as well.

[1] Some CRDTs don’t care about needing identical order and only need to include the same edits to have a final deterministic state. In effect these are internally managing the state updates and not using a LWW algorithm. From Eidetica’s perspective we could eventually reduce the cost of our state computation and let the CRDT structure handle it so we don’t do it twice, but for now it is an unused property.