On Target’s Data Platform team, we tune our job-orchestration platform to stay fast and reliable through peak seasonal sale load.
This article focuses on practical ways to boost database performance, excluding partitioning and sharding topics for brevity.
Foundational Database Decisions
Choosing the right database is a decision with long-term consequences. It is easy to swap out or migrate away from a database during the development phase, but difficult in later stages.
There is no single answer that fits every application, but every database (PostgreSQL, MongoDB, Redis, Cassandra, etc.) is designed around **core competencies**: the problems it is good at and aims to solve:
- PostgreSQL: Strong ACID guarantees, a good fit for relational data.
- MongoDB: Flexible schema, good for semi-structured data.
- ClickHouse: Complex OLAP workloads, high read throughput.
Quick Database Comparison
Key Aspects to Consider When Choosing a Database:
- Storage engine used: To understand how the data would be stored.
- Popularity: A widely used database benefits from active community support, robust tooling and frequent smooth upgrades.
- Benchmarking: Benchmark results help define realistic Service Level Agreements.
- Workload Type: Is your workload OLTP or OLAP?
- Client & Query Characteristics: Number of clients, average record size and read/write throughput expectations.
Using database features for what they are known for:
Exploiting a database's native strengths pays off in the long term. A minimal example follows the list below.
- PostgreSQL: Full-text search, JSONB indexing, etc
- ClickHouse: MergeTree engine, Materialized views, etc
- MongoDB: Aggregation pipelines, TTL indexes, etc
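As one illustration, here is a minimal sketch (not our production code) of leaning on PostgreSQL's JSONB support: a GIN index lets containment queries avoid sequential scans. The jobs table, payload column, and connection string are hypothetical.

```python
import json
import psycopg2

conn = psycopg2.connect("dbname=jobs_db user=app")  # assumed connection string
with conn, conn.cursor() as cur:
    # A GIN index speeds up @> (containment) lookups on the JSONB payload column.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_jobs_payload ON jobs USING GIN (payload);")

    # Find jobs whose JSONB payload marks them as high priority.
    cur.execute(
        "SELECT id, payload FROM jobs WHERE payload @> %s::jsonb;",
        (json.dumps({"priority": "high"}),),
    )
    for job_id, payload in cur.fetchall():
        print(job_id, payload)
```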
Choosing the relevant indexes & analyzing slow queries
Using PostgreSQL’s EXPLAIN ANALYZE and the pg_stat_statements view, or MongoDB’s explain('executionStats'), can help surface slow queries.
Creating indexes on columns that frequently appear in WHERE, JOIN, or ORDER BY clauses helps reduce the number of sequential scans. Indexes improve read performance by avoiding full table scans, which means less I/O and lower CPU time, but they degrade write performance and increase storage consumption.
Helpful PostgreSQL Queries:
```sql
-- PgSQL query to track statistics about slow queries executed in the database
SELECT query,                                                              -- normalized query
       Round(total_exec_time / 1000.0, 3) AS total_sec,                    -- total time (s)
       calls,                                                              -- executions
       Round((total_exec_time / NULLIF(calls, 0)) / 1000.0, 3) AS avg_sec, -- avg per call (s)
       rows                                                                -- rows returned
FROM   pg_stat_statements
WHERE  (total_exec_time / NULLIF(calls, 0)) > 100                          -- avg > 100 ms
ORDER  BY calls DESC;

-- PgSQL query to track the number of full table scans executed in the database
SELECT relname       AS table_name,              -- table name (from pg_stat_user_tables)
       seq_scan      AS full_table_scans,        -- # of sequential scans started on this table
       seq_tup_read  AS rows_read_sequentially,  -- # of rows read via sequential scans
       idx_scan      AS index_scans,             -- # of index scans started on this table
       idx_tup_fetch AS rows_read_using_index    -- # of table rows fetched by index scans
FROM   pg_stat_user_tables
ORDER  BY seq_scan DESC;                         -- show tables with the most sequential scans first
```
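To complement these, a small sketch of verifying from application code that a new index is actually picked up, using EXPLAIN ANALYZE before and after index creation. The job_runs table, status column, and connection string are illustrative assumptions.

```python
import psycopg2

QUERY = "SELECT id FROM job_runs WHERE status = %s"

def plan_for(cur, status):
    # EXPLAIN ANALYZE runs the query and returns the real execution plan with timings.
    cur.execute("EXPLAIN ANALYZE " + QUERY, (status,))
    return "\n".join(row[0] for row in cur.fetchall())

conn = psycopg2.connect("dbname=jobs_db user=app")  # assumed connection string
with conn, conn.cursor() as cur:
    print("Before index:\n", plan_for(cur, "FAILED"))   # typically a Seq Scan

    # Index the column that appears in most WHERE clauses.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_job_runs_status ON job_runs (status);")

    print("After index:\n", plan_for(cur, "FAILED"))    # typically an Index / Bitmap Heap Scan
```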
Application Level Performance Optimizations
Implementing Caching Patterns
Caching helps increase throughput for read-heavy workloads; a minimal sketch of the read-through pattern follows the list below.
- Read-Through: App reads from cache, pulls from DB on miss.
- Write-Through: Writes propagate to both the cache and the DB.
- Write-Aside: App writes to DB, then updates the cache manually.
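A minimal sketch of the read-through idea, assuming Redis in front of PostgreSQL; the key format, TTL, jobs table, and connection details are illustrative, not taken from our platform.

```python
import json
import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379, db=0)   # assumed cache instance
conn = psycopg2.connect("dbname=jobs_db user=app")        # assumed connection string
CACHE_TTL_SECONDS = 60

def get_job(job_id: int):
    key = f"job:{job_id}"
    cached = cache.get(key)
    if cached is not None:                                # cache hit
        return json.loads(cached)

    with conn.cursor() as cur:                            # cache miss: read from the DB
        cur.execute("SELECT id, name, state FROM jobs WHERE id = %s", (job_id,))
        row = cur.fetchone()
    if row is None:
        return None

    job = {"id": row[0], "name": row[1], "state": row[2]}
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(job))  # populate cache for later reads
    return job
```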
Handling Tail Latency
**Distributing** a heavy query into sub-queries that run in parallel over disjoint data sets within the same cluster cuts down on overall latency.
Distributed query execution takes this idea further: a coordinator scatters the sub-queries to the appropriate shards, each shard runs its part in parallel on separate nodes, and the coordinator gathers the partial results and passes them on to the application layer. The application layer then filters and merges the data in memory and passes the final result to the client.
This shrinks response times and makes performance more consistent for users.
Distributed query execution
**Trade-offs:** Extra merge logic at the application layer and a slightly higher risk of errors, because you’re coordinating many smaller queries. A minimal scatter-gather sketch follows below.
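Here is a minimal scatter-gather sketch at the application layer, assuming each shard is a separate PostgreSQL instance reachable through its own DSN; the shard names, job_runs table, and the merge step are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import psycopg2

SHARD_DSNS = [
    "dbname=jobs_shard0 host=shard0",   # assumed shard connection strings
    "dbname=jobs_shard1 host=shard1",
    "dbname=jobs_shard2 host=shard2",
]

SUB_QUERY = "SELECT id, runtime_ms FROM job_runs WHERE run_date = %s"

def query_shard(dsn: str, run_date: str):
    # Each sub-query touches a disjoint slice of the data on its own node.
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(SUB_QUERY, (run_date,))
            return cur.fetchall()
    finally:
        conn.close()

def scatter_gather(run_date: str):
    # Scatter: run all sub-queries in parallel; gather: merge partial results in memory.
    with ThreadPoolExecutor(max_workers=len(SHARD_DSNS)) as pool:
        partials = pool.map(lambda dsn: query_shard(dsn, run_date), SHARD_DSNS)
    merged = [row for partial in partials for row in partial]
    # Final application-level step: e.g. sort by runtime to surface the slowest runs.
    return sorted(merged, key=lambda row: row[1], reverse=True)
```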
Batching & Cursor Tuning
Tune query batch size to find the right balance (see the sketch after this list):
- Too large: May cause memory pressure, timeouts, retries, network saturation, and a higher chance of packet drops.
- Too small: May lead to excessive round trips, increasing the overall latency of the application.
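A small sketch of batch-size tuning with a PostgreSQL server-side (named) cursor via psycopg2; the table, batch size, and process handler are placeholders to tune against your own workload.

```python
import psycopg2

BATCH_SIZE = 1_000  # too large risks memory pressure; too small means excessive round trips

def process(row: tuple) -> None:
    print(row)  # placeholder for real per-row work

conn = psycopg2.connect("dbname=jobs_db user=app")  # assumed connection string
with conn:
    # A named (server-side) cursor keeps the result set on the server
    # and streams it to the client in batches of `itersize` rows.
    with conn.cursor(name="job_runs_stream") as cur:
        cur.itersize = BATCH_SIZE
        cur.execute("SELECT id, state FROM job_runs WHERE run_date = %s", ("2024-11-29",))
        for row in cur:  # each network round trip fetches up to BATCH_SIZE rows
            process(row)
```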
We deferred sharding because targeted database optimizations met our goals and scaled more simply for thousands of workloads.