By Congwei Wu, Abdullah Arif, Prashanth Basappa, Jin Wang
Coinbase’s mission to build the cryptoeconomy rests on a foundation of trust. At the heart of that trust is the Identity Platform Engineering group, the team responsible for the core of a user’s identity: login, multi-factor authentication (MFA), security settings, and account management.
The Accounts team builds the services that unify user management across all Coinbase products. At the center of this is the **Users service** — a common data model and API that is a hard dependency for nearly every critical user journey (CUJ).
That reality sets an uncompromising bar for performance. The service must be highly scalable, extensible, durable, and resilient. This is the story of how we engineered it to handle the extreme, bursty traffic of the crypto market, serving over 1.5 million reads per second at its peak.
Why Scale and Resilience are Hard in Crypto
Crypto traffic is unlike any other. It’s unpredictable, highly bursty, and driven by market volatility. A single market event can trigger a massive surge that propagates through deep call stacks.
The Users service sits near the bottom of many of these stacks, meaning it experiences amplified read traffic during peaks. While writes are comparatively low, write correctness under high concurrency is non-negotiable for account safety and authorization flows.
Challenges
Scalability: Sustain multi-million RPS read traffic with consistent, low latency under bursty load.
Write Correctness: Avoid race conditions when multiple services try to update shared fields (e.g., admin flags) simultaneously.
Extensibility: Support 200+ developers building 50+ product flows via APIs and events without breaking consistency or performance.
Operability: Maintain high availability SLOs and have predictable fallbacks when dependencies degrade.
Architecture
Data Modeling with “Fragments”
The Users service offers a CRUD-like API for user data. To manage complexity and optimize throughput, we organize related fields into distinct “Fragments.”
Instead of forcing clients to parse a monolithic object with over 300 fields, Fragments (like Addresses, PhoneNumbers, or TaxID) group related data. This streamlines client interactions, as they only engage with the data they need. It also gives us granular control over access and permissions for each field group.
Fig 1: Fragment data modeling and relationships
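To make the idea concrete, here is a minimal, hypothetical Go sketch of fragment-scoped access. The type and fragment names (`Addresses`, `PhoneNumbers`, `FragmentStore`) are illustrative assumptions, not the service's actual schema; the point is that a client fetches only the fragment it needs instead of parsing a 300-field user object.

```go
package main

import "fmt"

// Hypothetical fragment types: each groups related fields and is served
// by its own API, rather than living inside one monolithic user object.
type Addresses struct {
	Lines   []string
	Country string
}

type PhoneNumbers struct {
	Primary  string
	Verified bool
}

// FragmentStore maps userID -> fragment name -> fragment data.
type FragmentStore map[string]map[string]any

// GetFragment returns a single fragment for a user, if present.
func (s FragmentStore) GetFragment(userID, fragment string) (any, bool) {
	frags, ok := s[userID]
	if !ok {
		return nil, false
	}
	v, ok := frags[fragment]
	return v, ok
}

func main() {
	store := FragmentStore{
		"user-1": {
			"Addresses":    Addresses{Lines: []string{"1 Main St"}, Country: "US"},
			"PhoneNumbers": PhoneNumbers{Primary: "+1-555-0100", Verified: true},
		},
	}
	// A login flow touches PhoneNumbers only; it never parses Addresses.
	if v, ok := store.GetFragment("user-1", "PhoneNumbers"); ok {
		fmt.Println(v)
	}
}
```

Because each fragment is an independent unit, access control and scaling decisions can be made per fragment rather than per user object.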
From Monolith to Fragment APIs
We previously had a large, monolithic `GetUser` API that allowed clients to fetch any data they wanted. As Coinbase grew, this API became unmaintainable. It created unnecessary coupling between domains and made observability a nightmare: a slowdown in one small fragment could slow down the entire request.
We transitioned to Fragment APIs, where each fragment is served by an independent API. This reduced coupling, but more importantly, it allowed us to scale each fragment independently. Each fragment is backed by its own data store, so a high-load service like PhoneNumbers can be scaled without impacting the `UserAgreements` service.
Event Streaming & Offline Data
For use cases that need push-based updates, we offer event streaming via database CDC (Change Data Capture), which we call Fragment Events. Crucially, these events expose the exact same data models as our APIs. This allows clients to integrate seamlessly with either the API or our event streams without having to learn and interpret two different data models. For offline analytics, we integrate with Snowflake, providing an hourly ingestion lag for large-scale data processing.
Storage: A Polyglot Persistence Strategy
Caching: From Sidecar to Native
Historically, Coinbase built an internal caching layer called Cache Router. It’s a Go application that runs as a sidecar attached to every Kubernetes pod. Cache Router handles replication, jitter, and connection pooling for the Memcache cluster. It’s highly latency-sensitive: cache misses at this layer can cause surges of reads to hit the backing database.
We have since migrated to ValKey (an open-source fork of Redis) managed by AWS. We now use native connections from our API pods without any sidecars. This move simplified our architecture, removed a layer of operational overhead, and eliminated the fine-tuning a custom sidecar required.
Database: MongoDB + DynamoDB
The Users Service originally relied on MongoDB, which offered great flexibility and low latency. However, during rapid, unpredictable traffic surges, some workflows would bypass the cache and hit the database directly. This sometimes led to “connection storms” that could overwhelm the cluster and cause brief unavailability.
To solve this, we moved to a federated, or polyglot, persistence model. We analyzed our access patterns and moved our most frequently accessed, key-value (K-V) lookup datasets to DynamoDB. As a stateless database that doesn’t require active connections, DynamoDB is purpose-built to handle unpredictable traffic spikes with consistent performance.
One challenge with DynamoDB is its lack of native unique constraints. We solved this by developing a framework that uses an auxiliary table and multi-item transactions to enforce atomicity when unique constraints are required on one or more columns.
This federated setup, using both MongoDB and DynamoDB, allows us to use the right tool for the right job, managing diverse workloads while maintaining flexibility.
Fig 2: Database topology
The Consistency Model
We use two complementary mechanisms to balance massive scale with data correctness.
Optimistic Concurrency Control (OCC)
While our service is read-heavy, multiple systems may try to write to the same user record concurrently (e.g., toggling admin flags). To prevent lost updates without using slow, global locks, we implement an OCC protocol:
Each user resource carries a version field.
When clients write, they read the current version and include it in their write request (using If-Match semantics).
The service performs a compare-and-swap on the version during the write. If it mismatches, it returns a retryable conflict error.
Clients retry with exponential backoff.
This approach avoids global locks, maintains correctness, and scales with the number of writers.
“Freshness Tokens” for Read-After-Write
Caching is great for latency, but it can weaken immediate consistency. To provide strong read-after-write guarantees for critical flows (like a user updating their password and then immediately trying to log in), we use Freshness Tokens.
A Freshness Token is an opaque identifier bound to a specific resource version. When a client performs a successful write, it receives a Freshness Token. If it needs to read that data back immediately, it includes the token in its read request. The service can then guarantee it returns data at least as fresh as that token, bypassing the cache if necessary.
In practice, a Freshness Token is simply a compact proxy for a point on the resource’s version timeline, giving us on-demand, surgical consistency without penalizing the performance of the hot path.
Fig 3: Freshness tokens return the “last known version” data
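A minimal sketch of the token flow, assuming the token simply encodes a minimum resource version (the production token is opaque and its encoding is not described here): a write returns a token, and a tokened read bypasses the cache whenever the cache lags behind that version.

```go
package main

import "fmt"

// FreshnessToken is a compact proxy for a point on the resource's version
// timeline; here it is just the version number.
type FreshnessToken struct{ MinVersion int }

type Versioned struct {
	Version int
	Value   string
}

type Service struct {
	db    Versioned // source of truth
	cache Versioned // may lag behind the database
}

// Write updates the database and returns a token bound to the new version.
func (s *Service) Write(v string) FreshnessToken {
	s.db = Versioned{Version: s.db.Version + 1, Value: v}
	return FreshnessToken{MinVersion: s.db.Version}
}

// Read serves from cache unless the token demands fresher data, in which
// case it bypasses the cache, reads through, and refills it.
func (s *Service) Read(t *FreshnessToken) string {
	if t == nil || s.cache.Version >= t.MinVersion {
		return s.cache.Value // hot path: cached read
	}
	s.cache = s.db // cache is stale relative to the token: read through
	return s.cache.Value
}

func main() {
	s := &Service{}
	tok := s.Write("new-password-hash")
	fmt.Println(s.Read(nil))  // untokened read may see the stale cache
	fmt.Println(s.Read(&tok)) // tokened read is at least as fresh as the write
}
```

Only the flows that need read-after-write pay the cost of a cache bypass; every other read stays on the cached hot path.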
Dealing with Overload
One of the most complex scaling challenges for the Users Service was reliably handling persistent overload, which frequently led to a MongoDB “death loop.” Under overload, the client would attempt to establish connections and issue requests to MongoDB. On success, it proceeded normally. On failure, it reacted dynamically by opening more connections (if under a certain threshold), retrying requests, or dropping connections to shed load. Crucially, every failure path, whether the connection itself had succeeded or failed, put additional load on MongoDB, which caused more failures, which added still more load.
This failure pattern was most likely to occur during peak traffic, precisely when service reliability is most critical, and once triggered, it would bring the entire service down.
Load Tests
To get ahead of this failure mode, we scheduled recurring load tests that pushed the service past expected organic peaks. We intentionally drove the system to its breaking point to validate its behavior under stress and verify that it degrades gracefully rather than collapsing.
Load Shedding
To prevent a potential “death loop” under heavy load, we introduced a dedicated load-shedding mechanism. This system uses a worker that continuously monitors the health of the Users Service. When the service detects an overload, the worker dynamically adjusts the Application Load Balancer (ALB) to divert a portion of the incoming traffic to a dummy target group.
This mechanism makes load testing safer for our service and enables Users Service to degrade gracefully.
Fig 4: Load shedding safeguards against service overload
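The worker's core decision can be sketched as a simple policy function. The health signal, threshold, and shed cap below are illustrative assumptions; the real worker reads service health metrics and applies the result by adjusting ALB weights toward a dummy target group.

```go
package main

import "fmt"

// Health is a hypothetical signal the worker polls (e.g. error rate or
// connection saturation), normalized to 0..1, where 1 means saturated.
type Health struct{ Load float64 }

// ShedFraction decides what share of traffic the ALB should divert to the
// dummy target group: nothing below the overload threshold, then
// proportionally more as overload deepens, capped so real traffic
// always continues to flow.
func ShedFraction(h Health) float64 {
	const threshold, maxShed = 0.8, 0.5
	if h.Load <= threshold {
		return 0
	}
	f := (h.Load - threshold) / (1 - threshold) * maxShed
	if f > maxShed {
		return maxShed
	}
	return f
}

func main() {
	for _, load := range []float64{0.5, 0.9, 1.0} {
		fmt.Printf("load=%.2f shed=%.2f\n", load, ShedFraction(Health{Load: load}))
	}
}
```

Because shedding ramps up smoothly instead of flipping on at a cliff, the service degrades a slice of traffic at a time rather than collapsing into the death loop described above.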
The Results: Scale and Stability
By moving from a monolithic service to a federated system of Fragment APIs, polyglot storage, and intelligent consistency controls, the Users service now sustains over 1.5 million reads per second during market surges.
Fig 5: Over 1.5M RPS at Peak during a Central-Load-Test
This architecture gives us the scalability to handle the crypto market’s massive, unpredictable peaks and the resilience to be a service that Coinbase can build on with trust.