Brilliant KV at scale. Painful for most business queries.
I shipped a SaaS on DynamoDB and ran it from launch to scale over many years. Using DynamoDB as the primary store was one of my worst engineering calls. It feels great in week one: low cost, serverless, fast, safe, replicated, a console out of the box. Then reality hits. Most business apps need flexible queries and evolving schemas. DynamoDB punishes both.
Why it’s so tempting (and why that’s a trap)
DynamoDB shines on slides:
- No servers to babysit
- Auto-scaling throughput
- Rock-solid durability, low-latency key lookups
- Backups, TTL, Streams, global tables
And yes, AWS pushes it hard. It shows up everywhere in Amplify tutorials, keynote demos, “serverless by default” narratives, and—let’s be blunt—RDS pricing optics make DynamoDB look cheap at first glance. Don’t fall for it. Total cost of ownership for a real app with changing requirements tilts the other way.
Two core flaws that sink product teams
1) Weak querying for real business needs
Business apps rarely stop at “get by id.” They grow into multi-filter lists, admin dashboards, reports, exports, and “can we sort by X then Y?” asks.
With DynamoDB:
- You sort only within a partition.
- Filters happen after item selection.
- Cross-attribute predicates need GSIs, denormalized views, or both.
- Every new dimension risks a backfill, a new GSI, or bespoke glue.
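To make that last point concrete: adding one new query dimension usually means creating a GSI, which DynamoDB then backfills asynchronously and bills per projected item. A minimal sketch of what the `UpdateTable` call looks like, with table, index, and attribute names that are illustrative, not from a real schema:

```javascript
// Sketch: adding a GSI so we can query orders by country + created_at.
// Table/index/attribute names are illustrative.
function buildAddGsiParams(tableName) {
  return {
    TableName: tableName,
    AttributeDefinitions: [
      { AttributeName: 'country', AttributeType: 'S' },
      { AttributeName: 'created_at', AttributeType: 'N' }
    ],
    GlobalSecondaryIndexUpdates: [{
      Create: {
        IndexName: 'country-created_at',
        KeySchema: [
          { AttributeName: 'country', KeyType: 'HASH' },
          { AttributeName: 'created_at', KeyType: 'RANGE' }
        ],
        Projection: { ProjectionType: 'ALL' } // copies every item into the index
      }
    }]
  };
}

// const ddb = new AWS.DynamoDB();
// await ddb.updateTable(buildAddGsiParams('orders')).promise();
// DynamoDB backfills the index in the background; you pay writes per item.
```

Unlike a `CREATE INDEX` in an RDBMS, a GSI is a billed, projected copy of your data, and its backfill throughput competes with your table's.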
What product asks for
```sql
-- Flexible list with three filters and stable multi-column sort
SELECT order_id, user_id, status, country, total, created_at
FROM orders
WHERE status IN ('paid', 'shipped')
  AND country = 'FR'
  AND created_at BETWEEN NOW() - INTERVAL '30 days' AND NOW()
ORDER BY created_at DESC, total DESC, order_id ASC
LIMIT 50 OFFSET 0;
```
Add or remove a filter? Change the sort priority? Still trivial in SQL.
DynamoDB reality
You’ll end up with a GSI on (status, created_at) (maybe per tenant), another index or a composite key to slice by country, and you still can’t do a global sort by created_at, total, order_id across partitions. You fake it by:
- Querying multiple indexes
- Merging results in memory
- Resorting client-side
- Re-paginating manually
- Handling holes/dupes across pages
```javascript
// Sketch: 3 filters (status, country, time window) + multi-sort emulation
const statuses = ['paid', 'shipped'];
const since = Date.now() - 30 * 24 * 60 * 60 * 1000;

const queries = statuses.map(status => ddb.query({
  TableName: 'orders',
  IndexName: 'status-created_at', // PK=status, SK=created_at
  KeyConditionExpression: '#s = :status AND #t BETWEEN :since AND :now',
  FilterExpression: '#c = :country', // post-filter, costs reads
  ExpressionAttributeNames: { '#s': 'status', '#t': 'created_at', '#c': 'country' },
  ExpressionAttributeValues: {
    ':status': { S: status },
    ':since': { N: String(since) },
    ':now': { N: String(Date.now()) },
    ':country': { S: 'FR' }
  },
  ScanIndexForward: false, // created_at DESC
  Limit: 200 // overfetch to emulate secondary sort
}).promise());

const merged = (await Promise.all(queries)).flatMap(r => r.Items);

// Emulate ORDER BY created_at DESC, total DESC, order_id ASC
merged.sort((a, b) =>
  (Number(b.created_at.N) - Number(a.created_at.N)) ||
  (Number(b.total.N) - Number(a.total.N)) ||
  a.order_id.S.localeCompare(b.order_id.S)
);

const page = merged.slice(0, 50); // manual pagination
```
This is a nice workaround once. It becomes a maintenance swamp the moment someone says “also filter by payment_provider” or “sort by margin”.
2) Query vs Scan forces premature modeling and long-term rigidity
To state it bluntly: use Scan in production only with great care, and prefer Query on hot paths; it matters for both cost and latency.
DynamoDB makes you pick partition/sort keys and access patterns upfront. But real products don’t freeze their questions on day one. You end up:
- Over-engineering single-table designs before you have traction
- Backfilling GSIs when requirements change
- Fighting hot partitions and throughput tuning
- Paying in complexity every time you add a filter
In an RDBMS, you add an index and move on. In DynamoDB, you plan a migration, tweak streams, write backfills, and hope you didn’t miss a denormalized projection.
About AWS “workarounds” and their costs
You’ll hear: “Keep DynamoDB for writes, then sync to something query-friendly.”
- OpenSearch sync: $200-$1000 monthly cluster cost, index pipelines, mapping drift, cluster sizing, reindex pain, a new skillset to learn. Also another thing to break.
- RDS/Postgres sync: At that point, why not just use Postgres first? Dual-write or stream-ingest adds failure modes and ops overhead.
- Athena/Glue/S3 sync: Fine for batch analytics, not product queries. Latency, freshness, partitioning strategy, and scan-based pricing complicate everything.
Honestly, yes, these can work. They also eat the operational savings DynamoDB promised.
The single-table design mess
Along the way I ran into the community's favorite nested pattern: single-table design, intended to simplify the query model and give you more cost control.
The core idea is simple: put many entity types in one table and prefix the partition key with User#123, Product#456, and so on. It's pitched as the way to "model relationships in DynamoDB," although even its main proponents concede that multi-table can be the better fit for many apps; don't cargo-cult single-table.
I would not recommend it, and thankfully it never reached my prod. Here's what gets glossed over:
- Streams: Mixed firehose. Every consumer re-implements routing + type logic.
- Monitoring: Metrics blur across entity types. Hot keys and throttles are harder to triage.
- PITR/Backups/Restore: You can’t easily restore “just Orders for tenant X.” It’s all intertwined.
- Safety: One bad writer can pollute unrelated entities. Blast radius grows.
- Your model evolves: hard-linking concepts to optimize today's merge operations doesn't sound like a long-term bet to my ears.
Advantages? Fewer tables and slightly simpler IAM. Not worth the production pain.
Why DynamoDB Streams can go wrong
Streams look like a clean event bus to fan out data (CDC) to internal/external sinks and even trigger side-effects (emails, indexing).
In practice, they may bite:
- Hidden, unpredictable control flow
- Non-deterministic replays/restores
- Risky coupling of storage and compute
On table restores, item order shifts. Your sinks won’t see the same sequence—hello divergence in OpenSearch or a relational mirror. If the stream fires business side-effects (e.g., email on order creation), restores can wreak havoc: duplicates, email storms, false alerts.
And yes, some stream deliveries fail. Now you own desync reconciliation, backfills, and a new incident surface.
Bottom line: use streams for CDC and replayable side-effects only. Keep critical workflows in explicit services/queues you can control and re-run safely.
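Even used "correctly," a stream consumer ends up re-implementing routing and dedupe by hand. A minimal sketch of what that looks like; the record shape follows the DynamoDB Streams event format, while the sink map and PK prefix convention are hypothetical:

```javascript
// Sketch: route DynamoDB Stream records by entity-type prefix and
// deduplicate on eventID, which is stable across Lambda retries.
// In production 'seen' would be a durable store, not process memory.
const seen = new Set();

function routeRecord(record, sinks) {
  if (seen.has(record.eventID)) return null; // duplicate delivery: drop
  seen.add(record.eventID);

  const keys = record.dynamodb.Keys;
  const pk = keys.PK ? keys.PK.S : '';
  const entityType = pk.split('#')[0]; // e.g. 'Order' from 'Order#456'

  const sink = sinks[entityType];
  if (!sink) return null; // unknown entity: drop or dead-letter
  sink(record);           // the side-effect must itself be idempotent
  return entityType;
}
```

Every consumer of a shared stream repeats some variant of this routing and idempotency logic, which is exactly the hidden coupling the bullets above warn about.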
Skills, hiring, and the people cost
Be honest about ramp-up and recruiting:
- DynamoDB modeling is non-standard and hard.
- Deep experience is rare and expensive.
- Onboarding takes longer; design reviews are slower.
- Postgres skills are ubiquitous, cheaper, and transferable.
Your org pays this tax every time the team changes.
Reality check: errors, reworks, and painful backfills
Here’s what “small change” really means in DynamoDB.
“We need to move/rename one attribute” (12 million items)
Common scenario: you mis-modeled a field, or product changes semantics. In SQL, you run a migration and maybe a background job. In DynamoDB, you’re looking at a full backfill.
What it takes
- Design: choose between dual-writing, on-read fix-ups, or hard backfill. Plan stream consumers, idempotency, and cutover. Expect days of design/review.
- Code: writer updates, reader compatibility layer, backfill Lambda/Batch job, metrics + alarms, retry semantics.
- Run: backfill across 12M items. Reads + writes per item. Throttle management. Monitoring. Potential hot partitions. Expect hours to days.
- Clean-up: remove old paths, drop temporary GSIs, tidy configs. More PRs, more risk.
- Cost: 12M PutItem operations run roughly $15-$30 in write requests alone, which is a lot relative to the product's low-cost promise
- Time: Expect days to weeks, since each item has to be put individually (BatchWriteItem is just a sugar wrapper over per-item PutItem writes, sorry)
Why teams punt instead
Because the math is ugly. A backfill touches every item. You pay request units, you risk throttles, and you burn engineering time. Pragmatic teams often:
- Patch at read time (“if newAttr missing, compute from oldAttr”)
- Dual-store for a while
- Leave both fields forever “just in case”
But that’s debt. It leaks into code paths, tests, analytics, exports, and docs. Six months later nobody remembers which field is truthy.
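The read-time patch looks innocent, which is exactly why it lingers. A hedged sketch, with field names that are purely illustrative ('totalCents' as the new attribute, 'total' as the legacy one):

```javascript
// Sketch: read-time fix-up for a migrated attribute.
// 'totalCents' (new) and 'total' (legacy float) are illustrative names.
function normalizeOrder(item) {
  if (item.totalCents !== undefined) return item; // already migrated
  return {
    ...item,
    totalCents: Math.round((item.total || 0) * 100) // derive new from old
  };
}
```

The trap: every reader path (API, exports, analytics, tests) must call this shim, forever, until a real backfill lands.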
What a realistic plan looks like (order of magnitude)
- Batch size 500–1,000 items
- Adaptive throttling with exponential backoff
- Idempotent upserts
- Progress checkpoints per partition to allow restarts
- Dual-read in the app during the migration window
- Feature flag for cutover + shadow reads to verify parity
You’ll spend more time building the migration machinery than the feature that prompted it.
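Most of that machinery is generic plumbing. Two of its pure pieces, capped exponential backoff and restartable checkpoints, can be sketched as follows; the checkpoint shape assumes progress is persisted externally (e.g. a small control table), and all names are hypothetical:

```javascript
// Sketch: pure helpers for a restartable backfill job. Persistence omitted.

// Capped exponential backoff, in milliseconds
function backoffMs(attempt, baseMs = 100, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// One checkpoint per parallel Scan segment, so a crashed job can resume
// instead of trusting a LastEvaluatedKey held only in memory
function makeCheckpoint(segment, lastEvaluatedKey, processed) {
  return {
    segment,                           // parallel Scan segment number
    lastKey: lastEvaluatedKey || null, // null => segment finished
    processed,                         // items handled so far, for metrics
    updatedAt: Date.now()
  };
}
```

The actual scan/write loop wraps these: read a page, write the batch idempotently, persist the checkpoint, sleep `backoffMs(attempt)` on throttles.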
“This migration will run for 9 hours—how do we stay up?”
Long backfills are normal. Operating during them is the real work:
- Capacity planning: provision enough throughput to avoid starvation, but don’t starve the app. If on-demand, you still need guardrails to avoid request spikes.
- Write throttling: use token buckets per partition to prevent hot keys; respect adaptive capacity but don’t assume it saves you.
- Read/write isolation: do read-your-writes checks inside the job to avoid flapping values; consider conditional writes to prevent races.
- Idempotency: every batch must be replayable safely.
- Partial failures: checkpoint aggressively; never rely on “last evaluated key” alone; persist progress externally.
- Cutover: run shadow reads comparing old vs new for a sample; flip behind a flag; keep the rollback path warm.
- Observability: emit per-partition metrics, success/error counts, throttle counts, and job liveness. Alert on stall, not just failure.
This is platform engineering work you wouldn’t need with an RDBMS for a simple attribute move.
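The per-partition write throttling mentioned above is a plain token bucket; a sketch under illustrative rates, with an injectable clock so it can be tested deterministically:

```javascript
// Sketch: token bucket to cap writes per partition during a backfill.
// Refill rate and capacity are illustrative numbers; clock is injectable.
class TokenBucket {
  constructor(ratePerSec, capacity, now = Date.now) {
    this.rate = ratePerSec;
    this.capacity = capacity;
    this.tokens = capacity;
    this.now = now;
    this.last = now();
  }
  tryTake(n = 1) {
    const t = this.now();
    // Refill proportionally to elapsed time, capped at capacity
    this.tokens = Math.min(this.capacity, this.tokens + ((t - this.last) / 1000) * this.rate);
    this.last = t;
    if (this.tokens < n) return false; // caller should back off
    this.tokens -= n;
    return true;
  }
}
```

One bucket per hot partition key keeps the backfill from starving live traffic, independent of whatever adaptive capacity does.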
When DynamoDB is the right call
Use it confidently when the workload is truly key-value at scale:
- IoT ingestion keyed by device/time
- Caching and session stores
- Feature flags and configuration
- Idempotency/dedupe tables
- Offloading cold, simple entities from your RDBMS for cost
In those lanes, DynamoDB is best-in-class. The trap is being misled by demos and marketing into using it for the wrong need.
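One of those lanes, idempotency/dedupe, really is a one-liner in DynamoDB: a conditional put that fails if the key was already claimed. A sketch, with an illustrative table name:

```javascript
// Sketch: dedupe via conditional put. First caller wins; replays fail with
// ConditionalCheckFailedException. Table name is illustrative.
function buildClaimParams(idempotencyKey, ttlEpochSec) {
  return {
    TableName: 'idempotency_keys',
    Item: {
      pk: { S: idempotencyKey },
      ttl: { N: String(ttlEpochSec) } // expire old keys via DynamoDB TTL
    },
    ConditionExpression: 'attribute_not_exists(pk)' // reject replays
  };
}

// const claimed = await ddb.putItem(buildClaimParams('req-123', nowSec + 86400))
//   .promise()
//   .then(() => true)
//   .catch(e => {
//     if (e.code === 'ConditionalCheckFailedException') return false;
//     throw e;
//   });
```

This is the workload DynamoDB was built for: one key, one conditional write, no cross-item queries.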
A saner default for most apps
Start with Postgres on managed RDS.
It might look pricier than DynamoDB in month one, but your team already speaks SQL, every framework is happy, and the ops overhead is predictable.
Over a year, that familiarity and flexibility usually make it cheaper—and you’ll ship faster, adapt safely, and answer ad-hoc questions without gymnastics.
Once the product settles, peel off true key-value hotspots to DynamoDB if it actually helps.
A word on DSQL
Aurora DSQL, in preview this year, aims to blend DynamoDB’s strengths (cost, operational ease, scale, backups, security) with Postgres ergonomics (relational data, proper sorting, filtering, joins to a degree). On paper, it’s the best of both worlds.
Reality check: as of today (11/2025) it comes with sharp edges—no foreign keys, no JSON columns, no auto-increment sequences, and other early-stage gaps. I’m optimistic about the direction, but cautious: until these limits close, DSQL could become another trap for production apps that need full SQL guarantees.