Scaling a stateless system is relatively straightforward. We can horizontally scale application servers by adding more instances behind a load balancer, and the system continues to work as expected.
But what happens to the database?
How can a database running on a single machine handle requests coming from N horizontally scaled application servers? Databases must scale too — and this is where things get interesting.
The First Idea: Replication

A natural first thought is:
“Let’s just replicate the database and create multiple copies.”
At first glance, this sounds reasonable. But the moment we try to scale traditional SQL databases using replication, we run into fundamental constraints.
Why SQL Databases Don’t Scale Easily

SQL databases are built around powerful guarantees:

- Joins across tables
- Cross-row and multi-table transactions
- Global constraints (foreign keys, unique keys, auto-increment IDs)

All of these assume one critical thing:
All data lives in one logical place.
Once we attempt to shard or distribute SQL data:
- Joins turn into cross-network calls
- Transactions span machines
- Locks stop working globally

As a result:

- Complexity explodes
- Performance degrades
- Application logic becomes significantly more complex

This doesn’t mean SQL cannot be sharded (it can), but doing so safely requires heavy coordination and careful design.
The Leader–Follower Model

To improve scalability, many systems move to a leader–follower replication model.
How It Works

- A single leader (primary) accepts all writes
- Followers (replicas) copy data from the leader
- Reads may be served from followers to scale read traffic

This gives us:

- One source of truth for writes
- Multiple copies for read scaling and fault tolerance

But a key question remains:
How does the leader propagate writes to followers?
The answer lies in replication strategies.
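Before looking at those strategies, here is a minimal sketch of the read/write split itself. The Node and LeaderFollowerCluster classes are hypothetical in-memory stand-ins, not any real driver API; a production client would route requests over the network and handle failover.

```python
# Minimal sketch of read/write splitting in a leader-follower setup.
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}      # stand-in for the node's storage

class LeaderFollowerCluster:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers

    def write(self, key, value):
        # All writes go to the single leader: the source of truth.
        self.leader.data[key] = value
        # How this change reaches the followers is the replication strategy.

    def read(self, key):
        # Reads are spread across followers to scale read traffic;
        # a follower may be stale until replication catches up.
        replica = random.choice(self.followers)
        return replica.data.get(key)

cluster = LeaderFollowerCluster(Node("leader"), [Node("f1"), Node("f2")])
cluster.write("user:42", "Ada")
print(cluster.read("user:42"))  # None here, since replication never ran
```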
Replication Strategies in Leader–Follower Systems
- Strong Consistency (Synchronous Replication)

In this model:

- The leader writes data locally
- The leader waits for acknowledgements from a quorum (majority) of replicas
- Only then does it respond to the client

This ensures:

- Strong consistency
- No stale reads (when reads also use quorum or leader)

Tradeoffs:

- Higher write latency
- Reduced availability if replicas are slow or unavailable

Importantly, strong consistency does not require waiting for all followers; waiting for a majority is sufficient.
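As a rough illustration, here is a sketch of a synchronous write that succeeds once a majority of copies (leader included) acknowledge it. The Replica class and quorum_write helper are hypothetical; real systems collect acknowledgements over the network and handle timeouts and retries.

```python
# Sketch of synchronous (quorum) replication: the write is confirmed to the
# client only after a majority of copies, leader included, acknowledge it.
class Replica:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy   # simulate a slow or unreachable replica
        self.data = {}

    def apply(self, key, value):
        if not self.healthy:
            return False         # no acknowledgement
        self.data[key] = value
        return True              # acknowledgement

def quorum_write(leader, followers, key, value):
    total = 1 + len(followers)   # N copies, leader included
    majority = total // 2 + 1
    acks = 1 if leader.apply(key, value) else 0   # leader writes locally first
    for follower in followers:
        if follower.apply(key, value):
            acks += 1
    return acks >= majority      # only now is it safe to respond to the client

leader = Replica("leader")
followers = [Replica("f1"), Replica("f2", healthy=False)]
print(quorum_write(leader, followers, "order:7", "confirmed"))  # True: 2 of 3 acked
```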
- Eventual Consistency (Asynchronous Replication)

Here:

- The leader writes data locally
- It sends replication messages to followers
- It responds to the client without waiting for most acknowledgements

This model:

- Improves write latency and throughput
- Allows reads from followers to return stale data temporarily
- Relies on the system eventually converging

This is a deliberate tradeoff:
Availability and performance over immediate consistency
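For contrast, here is a sketch of asynchronous replication, using a plain queue and a background thread to stand in for a real replication log; the structure is illustrative, not how any particular database implements it.

```python
# Sketch of asynchronous replication: the leader acknowledges the client
# immediately and ships the change to the follower in the background.
import queue
import threading
import time

leader_data = {}
follower_data = {}
replication_log = queue.Queue()       # stand-in for a real replication log

def write(key, value):
    leader_data[key] = value          # 1. leader writes locally
    replication_log.put((key, value)) # 2. change is queued for followers
    return "ok"                       # 3. respond without waiting for acks

def follower_worker():
    while True:
        key, value = replication_log.get()
        time.sleep(0.1)               # simulated network/apply delay
        follower_data[key] = value    # follower eventually converges

threading.Thread(target=follower_worker, daemon=True).start()

write("post:1", "hello")
print(follower_data.get("post:1"))    # likely None: a stale read
time.sleep(0.3)
print(follower_data.get("post:1"))    # "hello": eventually consistent
```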
The Leader Bottleneck Problem

Even with replication, a fundamental limitation remains:
All writes still go through a single leader.
As write QPS grows (e.g., 10K+ writes per second):
- The leader becomes a bottleneck
- CPU, I/O, and replication overhead increase
- Vertical scaling eventually hits limits

At this point, simply adding more replicas doesn’t help, because replicas don’t accept writes.
This is where partitioning (sharding) becomes necessary.
The Entry of NoSQL
NoSQL databases emerged to solve exactly this problem: scaling writes and reads horizontally at massive scale. Key motivations behind NoSQL systems:
- High write throughput
- High availability
- Fault tolerance
- Geo-distribution
- Low latency at scale

Rather than optimizing purely for normalization and storage efficiency, NoSQL systems are designed to scale by default.
Sharding in NoSQL Databases

NoSQL databases distribute data using sharding.

How Sharding Works

- A partition (shard) key is chosen (e.g., user_id)
- The key is hashed or range-mapped
- Data is distributed across shards
- Each shard owns a mutually exclusive subset of data

Common partitioning strategies:

- Consistent hashing
- Range-based partitioning (A–G, H–M, etc.)

Choosing the correct shard key is critical:
A poor shard key can cause hotspots and destroy scalability.
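To make the routing concrete, here is a sketch of hash-based shard selection; the shard_for helper, shard count, and user_id keys are illustrative. A plain hash-modulo scheme also reshuffles most keys whenever the shard count changes, which is exactly the problem consistent hashing is designed to reduce.

```python
# Sketch of hash-based sharding: a stable hash of the shard key decides
# which shard owns a row.
import hashlib

NUM_SHARDS = 4

def shard_for(shard_key: str) -> int:
    # Use a stable hash (not Python's per-process hash()) so routing is
    # identical across application servers and restarts.
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user_id in ["user_1001", "user_1002", "user_1003", "user_1004"]:
    print(user_id, "-> shard", shard_for(user_id))
```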
Replication Inside a Shard

Each shard is still replicated for fault tolerance. This is where two NoSQL replication models appear.

Model 1: Leader–Follower per Shard

Used by systems like:

- MongoDB
- DynamoDB
- HBase

Per Shard:

- One leader
- Multiple followers

Write Flow:

- Client routes request to the shard leader
- Leader writes data
- Leader propagates to followers
- Leader waits for either:
  - Quorum acknowledgements (stronger consistency), or
  - Minimal acknowledgements (eventual consistency)

Read Flow:

- Reads may go to the leader (fresh)
- Or to followers (faster, possibly stale)

This model is simple and efficient but still has:

- Leader hotspots
- Leader failover complexity

Model 2: Quorum-Based (Leaderless) Replication

Used by systems like:

- Cassandra
- Riak
- Dynamo-style designs

Per Shard:

- No leader
- All replicas are equal

Write Flow:

- Client sends the write to any replica (the coordinator)
- Coordinator sends the write to all replicas
- Write succeeds when W replicas acknowledge

Read Flow:

- Client reads from R replicas
- Latest version wins
- Inconsistencies are repaired in the background

This model:

- Avoids leader bottlenecks
- Improves availability
- Requires conflict resolution and versioning

Quorum Explained (Simply)
Let:
- N = total replicas per shard
- W = replicas required to acknowledge a write
- R = replicas required for a read

Rules:

- Strong consistency: R + W > N
- Eventual consistency: R + W ≤ N

Examples:

- Banking, inventory → strong consistency
- Social feeds, likes, analytics → eventual consistency

Quorum decisions are made per shard, not globally.
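Here is a toy sketch tying the leaderless model and the quorum rule together. The version counter stands in for the timestamps or vector clocks real systems use, every replica is assumed healthy, and background repair is omitted.

```python
# Sketch of quorum reads/writes in a leaderless (Dynamo/Cassandra-style) shard.
N, W, R = 3, 2, 2                       # R + W > N, so reads overlap the latest write

replicas = [dict() for _ in range(N)]   # each replica maps key -> (version, value)

def quorum_write(key, value, version):
    acks = 0
    for replica in replicas:            # coordinator sends the write to all N replicas
        replica[key] = (version, value)
        acks += 1                       # acknowledgement (all replicas healthy here)
    return acks >= W                    # success once at least W have acknowledged

def quorum_read(key):
    responses = [r.get(key) for r in replicas[:R] if r.get(key) is not None]
    if not responses:
        return None
    return max(responses, key=lambda pair: pair[0])[1]   # latest version wins

assert R + W > N                        # the strong-consistency condition
quorum_write("cart:9", ["book"], version=1)
quorum_write("cart:9", ["book", "pen"], version=2)
print(quorum_read("cart:9"))            # ['book', 'pen']
```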
Final Takeaway
- SQL databases struggle with horizontal write scaling due to strong global guarantees
- Leader–follower replication improves read scaling but not write scaling
- NoSQL databases solve this by sharding data ownership
- Each shard scales independently
- Replication within shards is handled using leader-based or quorum-based models
- Consistency is a configurable tradeoff, not a fixed property

Modern data systems don’t ask “Can we scale?” They ask: “What consistency are we willing to trade for scale?”