Architecture and Cost Trade-offs
How we designed a scalable, reliable microservices platform on Azure Kubernetes Service while significantly reducing infrastructure costs, without sacrificing observability or security.
Introduction
When building cloud-native systems, teams often face a familiar dilemma:
Do we optimize for reliability or for cost?
Fully managed cloud services promise simplicity, strong SLAs, and fewer operational headaches. The trade-off is cost, which is often far higher than expected once a system grows beyond a handful of services. On the other hand, self-hosting everything can reduce spend but increases operational complexity and risk.
This post shares how we approached this problem while building a production-grade microservices platform on Azure Kubernetes Service (AKS). Instead of choosing one extreme, we adopted a hybrid architecture: using managed services where they truly matter, and self-hosting components where the risk-to-cost trade-off made sense.
The result was a platform that is:
- Cost-efficient (70–90% lower than a fully managed stack)
- Reliable for critical workloads
- Observable without per-GB ingestion costs
- Portable and fully defined as code
This article focuses on architecture and decision-making, not step-by-step implementation. A follow-up post will cover the Terraform and Kubernetes details.
The Core Problem: Cost, Reliability, and Complexity
Modern microservices platforms are more than application code. Even small systems require:
- Databases and message queues
- Caching layers
- Metrics, logs, and dashboards
- Secure networking and access control
Each component introduces the same question:
Should this be a managed service or something we run ourselves?
The Cost Reality
Managed services are excellent-but their pricing compounds quickly.
A typical microservices setup can look like this:
| Component  | Managed Service       | Typical Monthly Cost |
| ---------- | --------------------- | -------------------- |
| PostgreSQL | Azure PostgreSQL      | $65–200              |
| Redis      | Azure Cache for Redis | $50–150              |
| RabbitMQ   | Managed broker        | $100–300             |
| Logging    | Azure Monitor         | $2.50 per GB         |
Individually, these costs are reasonable. Together, they add up fast, especially for early-stage or cost-sensitive workloads.
The Key Insight: Not All State Is Equal
The most important architectural decision we made was to classify state.
Two Types of State
- Irreplaceable state: data that cannot be rebuilt if lost, such as the database of record.
- Regenerable state: data that can be repopulated or replayed, such as caches and message queues.
Once you make this distinction, the managed vs self-hosted decision becomes much clearer.
Our Hybrid Architecture Strategy
We deliberately mixed managed and self-hosted components.
What We Ran as Managed Services
- PostgreSQL (database of record), chosen for durability, point-in-time recovery, backups, and SLA guarantees.
What We Ran Inside Kubernetes
- Redis (cache)
- RabbitMQ (message broker)
- Observability stack (metrics and logs)
These components are important, but failure is recoverable. Kubernetes handles orchestration, restarts, and rescheduling, making this a reasonable trade-off.
This single decision accounted for most of the cost reduction, without increasing the blast radius of real failures.
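To make the self-hosted side concrete, here is a minimal sketch of what running the cache in-cluster can look like. It is illustrative rather than our exact manifests: the `platform` namespace, image tag, memory limits, and eviction policy are assumptions, and RabbitMQ and the observability stack follow the same pattern.
```yaml
# Minimal sketch (not the exact production manifest) of a self-hosted cache.
# Regenerable state only: if this pod is rescheduled, the cache simply refills.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache          # illustrative name
  namespace: platform        # assumed namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          # Cap memory and evict least-recently-used keys; losing entries is fine for a cache.
          args: ["--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cache
  namespace: platform
spec:
  selector:
    app: redis-cache
  ports:
    - port: 6379
      targetPort: 6379
```
Kubernetes restarts or reschedules the pod on failure; because nothing irreplaceable lives here, that is good enough.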
Kubernetes-First, but Not Kubernetes-Everything
Kubernetes was chosen as the orchestration layer, not as a place to host everything indiscriminately.
Why Kubernetes Works Well Here
- Consistent deployment model
- Horizontal scaling built-in
- Strong ecosystem
- Portability across clouds
Where Kubernetes Is Not Ideal
- Primary databases
- Highly stateful systems requiring strong consistency guarantees
Using Kubernetes for compute and managed services for critical state gives the best of both worlds.
Scaling Without Paying for Idle Capacity
One of the easiest ways to waste money in the cloud is provisioning for peak load.
The Approach
At idle, the platform runs on a small footprint. During traffic spikes, it scales automatically, then scales back down.
This approach dramatically reduced monthly costs without affecting availability.
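Part 2 covers the autoscaling details; as a rough sketch, a Horizontal Pod Autoscaler keeps the idle footprint small while the cluster autoscaler adds and removes nodes behind it. The deployment name and thresholds below are assumptions, not our exact values.
```yaml
# Sketch: keep a small idle footprint and scale out on load.
# "api" and the CPU threshold are illustrative values.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2        # small baseline at idle
  maxReplicas: 10       # ceiling during traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
On AKS, the cluster autoscaler then grows or shrinks the node pool to match, so idle capacity is not billed for long.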
Using Spot Instances Safely
Not all workloads need guaranteed uptime.
Background workers, batch jobs, and asynchronous processing can tolerate interruptions. These workloads ran on spot instances, trading availability guarantees for steep discounts.
When Spot Instances Make Sense
- Background processing
- Data pipelines
- Non-user-facing jobs
When They Don’t
- APIs
- Databases
- Stateful services
By isolating these workloads, we reduced compute costs significantly without impacting user experience.
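On AKS, spot node pools are tainted with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so only workloads that explicitly tolerate interruption land on them. A sketch of an interruption-tolerant worker follows; the name and image are placeholders.
```yaml
# Sketch: pin a background worker to the spot node pool.
# Only this toleration + affinity pair sends it to spot capacity;
# APIs and stateful services stay on regular nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: background-worker   # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: background-worker
  template:
    metadata:
      labels:
        app: background-worker
    spec:
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.azure.com/scalesetpriority
                    operator: In
                    values: ["spot"]
      containers:
        - name: worker
          image: example.azurecr.io/worker:latest   # placeholder image
```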
Observability Without Per-GB Pricing
Observability is not optional, but managed logging platforms charge heavily at scale.
Instead of paying per-GB ingestion fees, we used a self-hosted stack:
- Metrics collected and stored in-cluster
- Logs stored in object storage using low-cost tiers
- Dashboards built on top of open-source tooling
Why This Works
- Logs are queried infrequently
- Storage is cheap
- Ingestion costs dominate managed observability pricing
This approach reduced observability costs by orders of magnitude while preserving full visibility into the system.
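The post deliberately does not prescribe tooling, but a common shape for this stack is Prometheus for in-cluster metrics, Grafana for dashboards, and Loki shipping log chunks to blob storage. Assuming Loki as the log backend, the storage side is roughly the fragment below; the storage account and container names are placeholders.
```yaml
# Fragment of a Loki configuration (assumed tooling): index kept via TSDB,
# log chunks in Azure Blob Storage on a low-cost tier. Not a complete config.
storage_config:
  azure:
    account_name: examplelogs        # placeholder storage account
    container_name: loki-chunks      # placeholder container
    use_managed_identity: true       # no stored account keys
schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: azure
      schema: v13
      index:
        prefix: index_
        period: 24h
```
Because ingestion is just writes to object storage, cost scales with retention rather than with per-GB ingest fees.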
Security and Access: Simple, Auditable, and Cheap
Security was designed around a few principles:
Private by Default
- Databases are accessible only within the virtual network
- No public endpoints for internal services
Identity Over Secrets
- Workloads authenticate using cloud identity
- No long-lived credentials stored in Kubernetes
Least Privilege
- Day-to-day operations require minimal permissions
- Elevated access is limited to initial setup
This keeps the system secure without introducing VPNs, bastion hosts, or unnecessary operational overhead.
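"Identity over secrets" maps to AKS Workload Identity, which Part 2 covers in detail. As a minimal sketch: annotate a service account with a managed identity's client ID and label the pod so it receives a projected token instead of a stored credential. The names, namespace, image, and client ID below are placeholders.
```yaml
# Sketch: the workload authenticates to Azure via a federated managed identity,
# so no connection strings or long-lived secrets live in the cluster.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api                                   # illustrative name
  namespace: platform                         # assumed namespace
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: platform
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        azure.workload.identity/use: "true"   # opts the pod into token projection
    spec:
      serviceAccountName: api
      containers:
        - name: api
          image: example.azurecr.io/api:latest   # placeholder image
```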
Cost Snapshot (Production)
A representative production setup looked roughly like this:
| Component              | Monthly Cost |
| ---------------------- | ------------ |
| AKS compute (baseline) | $80–100      |
| Managed PostgreSQL     | $65          |
| Storage and networking | $30          |
| Persistent volumes     | $10          |
| **Total**              | **~$190**    |
Comparable fully managed setups often landed in the $500–$1,500 per month range.
Lessons Learned
1. Don’t Optimize the Wrong Layer
Saving money on your database of record is rarely worth the risk.
2. Spot Instances Are High-Leverage
Used correctly, they offer some of the highest cost savings available.
3. Observability Is a Requirement
Skipping it to save money always costs more later.
4. Infrastructure as Code Pays Off Early
Teams that automate early spend less time firefighting later.
5. Kubernetes Is an Enabler, Not the Goal
Use it where it adds leverage, not as a default for everything.
Conclusion
Production-grade systems don’t require premium managed services at every layer.
By:
- Classifying state correctly
- Using managed services selectively
- Scaling dynamically
- Leveraging open-source observability
it’s possible to build platforms that are cost-efficient, reliable, secure, and portable, even with small teams.
In Part 2, we’ll dive into the actual implementation: Terraform modules, AKS autoscaling, spot node pools, Workload Identity, and Kubernetes deployment patterns.