A few months ago, I found myself facing one of those architectural decisions you can’t take lightly: Which database should power our application?
On paper, several options looked great. Azure SQL had strong consistency and a familiar relational model. Google Spanner promised global scale and near-infinite horizontal growth. But the same problem kept bothering me:
None of the available benchmarks reflected my application’s workload.
Every benchmark I found was synthetic. Every article was based on someone else’s traffic patterns. Every vendor claim assumed a workload that looked nothing like ours.
And that’s when it clicked.
If I wanted real answers, I needed to simulate our actual read/write behavior, our transaction mix, our concurrency pattern, our connection pooling.
I needed a test setup that behaved like our Java application, not a generic stress tester. This article is the story of how I modeled the workloads before I even touched Azure SQL or Google Spanner.
In Part 2, I’ll share what happened when I finally put those databases under load.
Why I Couldn’t Trust Generic Database Benchmarks
At first, I thought I’d simply compare p95 latencies or look up TPS numbers for each database. But it didn’t take long to realize the flaws:
- These numbers came from engineered environments.
- They assumed ideal network conditions.
- And they definitely didn’t match our schema or access patterns.
Most importantly?
Our system was not a synthetic benchmark.
It had:
- specific read/write ratios
- certain frequently accessed tables
- particular update patterns
- a real connection pool
- and lock-sensitive flows
So I stopped searching for benchmarks and started designing tests that mirrored our workload.
Step 1: Defining What “Good Performance” Actually Meant
Before modeling anything in JMeter, I forced myself to answer a simple question:
What does good database performance look like for this application?
This gave me a baseline to measure against, and below are the performance criteria I defined:
- p95 and p99 response time targets
- Expected throughput (operations per second)
- Resource utilization boundaries (CPU, memory, IOPS)
- Acceptable error rate
- Scalability expectations under increased concurrency
These weren’t arbitrary numbers—they were tied to real SLAs and user expectations. With this foundation, I could work backwards and ensure the workloads I modeled aligned with the actual business needs.
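To keep targets like these from living only in a document, I find it helps to express them somewhere the test harness can check automatically. Here is a minimal Java sketch of that idea; every number below is a placeholder, not our actual SLA.

```java
// A tiny holder for the targets so the load-test harness can assert against
// them instead of eyeballing reports. Every number here is a placeholder,
// not our real SLA.
public final class PerformanceCriteria {
    static final long P95_LATENCY_MS = 120;          // p95 response time target
    static final long P99_LATENCY_MS = 250;          // p99 response time target
    static final int TARGET_THROUGHPUT_OPS = 1_500;  // expected operations per second
    static final double MAX_ERROR_RATE = 0.001;      // 0.1% acceptable error rate
    static final int MAX_CPU_PERCENT = 70;           // resource utilization boundary

    static boolean meetsLatencyTargets(long p95Ms, long p99Ms) {
        return p95Ms <= P95_LATENCY_MS && p99Ms <= P99_LATENCY_MS;
    }

    private PerformanceCriteria() {}
}
```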
Step 2: Understanding the Application’s Actual Database Behaviour
Next, I analyzed our application’s schema and data-access patterns.
I listed every operation the application performed frequently:
- Reads (SELECT)
- Writes (INSERT)
- Updates (UPDATE)
- Deletes (DELETE)
But I didn’t stop there. I also identified:
- which tables were read-heavy
- which operations were latency-sensitive
- which queries created row-level locks
- which flows needed strict consistency
This step was crucial because performance testing is not just about “running queries fast.” It’s about understanding how those queries behave under real concurrency.
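One practical way to get the read/write ratios is to tally operation types from whatever query logging you already have. The sketch below is hypothetical: it assumes a plain-text log with one statement per line, starting with the SQL verb, and an invented file name. Adapt it to your own logging setup.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: count CRUD operation types from an exported query log,
// assuming one statement per line that starts with the SQL verb. The file
// name and log format are assumptions, not a real export format.
public class QueryLogTally {
    public static void main(String[] args) throws Exception {
        Map<String, Long> counts;
        try (Stream<String> lines = Files.lines(Path.of("query-log.txt"))) {
            counts = lines
                .map(l -> l.trim().split("\\s+")[0].toUpperCase()) // first keyword: SELECT, INSERT, ...
                .filter(k -> Set.of("SELECT", "INSERT", "UPDATE", "DELETE").contains(k))
                .collect(Collectors.groupingBy(k -> k, Collectors.counting()));
        }
        counts.forEach((op, n) -> System.out.printf("%s -> %d%n", op, n));
    }
}
```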
Step 3: Classifying Operations by Frequency and Importance
One of the biggest mistakes in performance testing is treating all operations equally. Real applications never have a perfectly even CRUD distribution.
Some flows run thousands of times per minute and others just a few. So, I categorized operations based on:
- frequency
- business criticality
- concurrency sensitivity
This gave me clarity on how to weight each operation in the workload.
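To make the weighting concrete, I captured the classification as data rather than prose, so the test plan could be derived from it. A rough sketch follows; the operation names, weights, and criticality levels are illustrative, not our real catalogue.

```java
import java.util.List;

// Minimal sketch of the classification as data. Operation names, weights and
// criticality levels are illustrative, not the real catalogue.
public class OperationCatalog {

    enum Criticality { LOW, MEDIUM, HIGH }

    // One entry per frequent operation: how often it runs relative to the rest,
    // how important it is to the business, and whether it is sensitive to
    // concurrent access (locking, contention).
    record OperationProfile(String name, double weight,
                            Criticality criticality, boolean concurrencySensitive) {}

    static final List<OperationProfile> PROFILES = List.of(
        new OperationProfile("readOrderById",    0.45, Criticality.HIGH,   false),
        new OperationProfile("listOpenOrders",   0.25, Criticality.HIGH,   false),
        new OperationProfile("insertOrder",      0.20, Criticality.HIGH,   true),
        new OperationProfile("updateOrderState", 0.10, Criticality.MEDIUM, true)
    );

    public static void main(String[] args) {
        PROFILES.forEach(System.out::println);
    }
}
```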
Step 4: Designing Realistic Workload Scenarios
Now came the core of the modeling process.
I created workload mixes that reflected how our application behaves in production.
Scenario 1: Read-Heavy Workload
Most user-facing apps are read-dominant. For ours, this looked like:
70% reads
20% writes
10% updates
Scenario 2: Balanced Workload
For internal workflows and batch processes:
50% reads
30% writes
10% updates
10% deletes
Each scenario represented a realistic slice of our system’s behaviour, not a synthetic stress test.
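For readers who want to see a mix as code, here is the read-heavy split expressed as a weighted draw per iteration. In JMeter itself this is usually achieved with a Throughput Controller or by sizing the thread groups; the plain-Java version below only illustrates the idea.

```java
import java.util.concurrent.ThreadLocalRandom;

// The 70/20/10 read-heavy mix expressed as a weighted draw per iteration.
// JMeter normally does this with a Throughput Controller or thread-group
// sizing; this plain-Java version only illustrates the idea.
public class WorkloadMix {

    enum Operation { READ, WRITE, UPDATE }

    static Operation nextOperation() {
        int roll = ThreadLocalRandom.current().nextInt(100);
        if (roll < 70) return Operation.READ;   // 70% reads
        if (roll < 90) return Operation.WRITE;  // 20% writes
        return Operation.UPDATE;                // 10% updates
    }

    public static void main(String[] args) {
        int[] counts = new int[Operation.values().length];
        for (int i = 0; i < 10_000; i++) {
            counts[nextOperation().ordinal()]++;
        }
        for (Operation op : Operation.values()) {
            System.out.printf("%s: %d%n", op, counts[op.ordinal()]);
        }
    }
}
```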
Step 5: Designing the JMeter Setup to Behave Like a Java Application
This part was non-negotiable for me. I didn’t just want JMeter to hit the database; I wanted it to behave like our Java service.
So I built the test plan with care:
Separate thread groups for each CRUD operation
This allowed me to configure:
- individual concurrency control
- precise latency measurement
- different ramp-up patterns
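Outside JMeter, the same idea can be pictured as one small executor per operation type, each with its own user count and a staggered start for ramp-up. This is only a conceptual sketch; the user counts, ramp-up times, and printed statements are placeholders.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Conceptual sketch of "one thread group per operation type" outside JMeter:
// each group gets its own number of virtual users and its own ramp-up, so
// reads, writes and updates can be controlled and measured independently.
// User counts, ramp-up times and the printed statements are placeholders.
public class ThreadGroups {

    static void startGroup(String name, int users, long rampUpMillis, Runnable task) {
        System.out.printf("starting %s: %d users over %d ms%n", name, users, rampUpMillis);
        ExecutorService pool = Executors.newFixedThreadPool(users);
        long delayPerUser = rampUpMillis / users;
        for (int i = 0; i < users; i++) {
            final long delay = i * delayPerUser;   // staggered start = ramp-up
            pool.submit(() -> {
                try {
                    Thread.sleep(delay);
                    task.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
    }

    public static void main(String[] args) {
        startGroup("reads",   50, 30_000, () -> System.out.println("SELECT ..."));
        startGroup("writes",  15, 30_000, () -> System.out.println("INSERT ..."));
        startGroup("updates",  5, 30_000, () -> System.out.println("UPDATE ..."));
    }
}
```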
Parameterized SQL queries
Using a CSV Data Set Config, every iteration pulled dynamic values, which avoided caching effects and mimicked real traffic variation.
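At the JDBC level, this boils down to a prepared statement whose bind value comes from the CSV feed on every iteration. The sketch below is hypothetical: the table, column, and file names are invented, and it assumes a suitable JDBC driver is on the classpath with JDBC_URL pointing at the target database. In the actual test, JMeter’s CSV Data Set Config and JDBC Request sampler do this for you.

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical sketch: a parameterized read whose bind value changes on every
// iteration, fed from a one-column CSV of ids. Table, column and file names
// are invented; a suitable JDBC driver must be on the classpath and JDBC_URL
// must point at the target database.
public class ParameterizedReads {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(System.getenv("JDBC_URL"));
             BufferedReader csv = Files.newBufferedReader(Path.of("customer-ids.csv"));
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, status, total FROM orders WHERE customer_id = ?")) {

            String line;
            while ((line = csv.readLine()) != null) {
                ps.setLong(1, Long.parseLong(line.trim())); // fresh value each iteration, no cache-friendly repeats
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* consume the row as the app would */ }
                }
            }
        }
    }
}
```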
Shared JDBC connection pool (1:1 ratio)
All thread groups used the same JDBC pool, just like production. This created:
- connection contention
- realistic queueing
- lock waits
Together, these produced true concurrency behaviour.
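For anyone reproducing this outside JMeter, the equivalent is a single shared connection pool sized like production. The sketch below uses HikariCP purely as an example pool; the size and timeout are placeholders, not our real settings. In JMeter, the same effect comes from one shared JDBC Connection Configuration element used by every thread group.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Example of a single shared pool sized like production. HikariCP is used
// here only as an illustration; the pool size and timeout are placeholders.
public final class SharedPool {
    public static HikariDataSource create() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl(System.getenv("JDBC_URL"));
        cfg.setUsername(System.getenv("DB_USER"));
        cfg.setPassword(System.getenv("DB_PASSWORD"));
        cfg.setMaximumPoolSize(20);       // match the production pool, not "as many as possible"
        cfg.setConnectionTimeout(3_000);  // makes queueing visible when the pool saturates
        return new HikariDataSource(cfg);
    }

    private SharedPool() {}
}
```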
At this point, JMeter wasn’t just “sending traffic.” It was simulating the exact way our application interacted with the database.
The Workload Modeling Was Complete — Now the Real Testing Could Begin
After several iterations, reviews, and dry runs, I finally had a performance test suite that felt authentic. It matched:
- our query patterns
- our connection pool settings
- our concurrency behaviour
- our read/write ratios
And most importantly, it reflected the pressure our application would put on any database—whether Azure SQL, Google Spanner, or something else entirely.
Now I was ready for the real showdown.
Coming Up Next: Azure SQL vs Google Spanner — The Actual Results
In Part 2, I’ll share how both databases performed under the exact workloads modeled in this article. And trust me, the results were nothing like the vendor benchmarks.
I’ll cover:
- p95/p99 latencies
- read/write throughput
- locking and contention behaviour
- resource consumption
- and which database ultimately won for our use case
Stay tuned for Part 2.