This guide explains how to use microservices architecture for machine learning (ML) applications. We start by looking at the basics of software design, comparing traditional monolithic systems with modern microservices. Next, we explore why microservices are a good fit for ML, highlight the key services that make up a solid ML setup, and look at how these services communicate with each other. Finally, we put everything into practice with a hands-on lab, where you’ll build and test a simple microservices system made of two connected services that show the main ideas in action.
What We Will Learn
- Understand the fundamental differences between monolithic and microservices architectures
- Explore why microservices are particularly beneficial for machine learning applications
- Discover various communication patterns and protocols used between microservices
- Differentiate between stateless and stateful services and their implications
- Gain practical experience by building a simple two-service ML system
- See how Docker integrates with microservices architecture
Part 1 — Microservices Concepts
1) Monolithic vs Microservices Architecture
Monolithic Architecture
Monolithic architecture represents the traditional approach to building applications where all components are tightly integrated into a single, unified codebase. This architecture is characterized by its simplicity in development, testing, and deployment processes in the early stages of an application’s lifecycle.
In a monolithic architecture, all functional components — such as the user interface, business logic, and data access layers — are interconnected and interdependent. When developers need to make changes to any part of the application, they must update and redeploy the entire codebase, even if the change affects only a small portion of functionality.
While this architecture provides advantages in terms of initial development speed and simplicity in debugging (as everything runs in a single process), it presents significant challenges as applications grow in complexity. As the codebase expands, development teams may struggle with:
- Codebase Complexity: As more features are added, the codebase becomes increasingly difficult to understand, maintain, and extend.
- Scaling Limitations: When different components have varying resource requirements, the entire application must be scaled together, leading to inefficient resource utilization.
- Technology Lock-in: The entire application typically uses a single technology stack, making it difficult to adopt new technologies for specific components.
- Deployment Risk: Each deployment involves the entire application, increasing the risk of system-wide failures from localized issues.
- Team Coordination Challenges: Multiple teams working on different features must carefully coordinate their efforts to avoid conflicting changes.
For machine learning applications specifically, monolithic architectures can be particularly problematic: ML workflows span data processing, model training, and inference, and each stage has its own resource requirements and technology preferences.
Architecture Comparison
Figure: Microservice architecture
Figure: Traditional monolithic architecture
Monolithic Architecture:
- Tightly coupled components
- Single codebase
- Single deployment unit
Microservices Architecture:
- Loosely coupled services
- Independent codebases
- Separate deployment units
- Communication via APIs
Monolithic Architecture — Pros and Cons
Pros:
- Simple at the beginning
- Easy local debugging
- Single deployment pipeline
Cons (especially for ML systems):
- Hard to scale specific parts (inference vs training)
- Technology lock-in (one stack for everything)
- Risky deployments (small change redeploys everything)
- Slower teams (coordination & merge conflicts)
- ML workflows are diverse (data ingestion, preprocessing, training, inference all need different resources)
Microservices Architecture
Microservices architecture represents a paradigm shift in application design, decomposing what would traditionally be a monolithic application into a collection of smaller, independent services that work together. Each microservice is responsible for a specific business capability and operates as a separate entity with its own codebase, database (if needed), and deployment pipeline.
This architectural approach is built on several key principles:
- Single Responsibility: Each service focuses on doing one thing well, aligning with specific business domains or capabilities.
- Autonomy: Services can be developed, deployed, and scaled independently without affecting other parts of the system.
- Resilience: Failures in one service are contained and don’t cascade throughout the entire system.
- Technological Diversity: Different services can use different programming languages, frameworks, and databases based on what best suits their specific requirements.
- Decentralized Data Management: Each service can manage its own data storage, using the most appropriate database technology for its needs.
The boundaries between microservices are defined by APIs that enable communication while maintaining loose coupling. This separation creates clear contracts between services and prevents unnecessary dependencies.
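For example, such a contract between an inference client and the Model API service can be expressed as a pair of request/response schemas. The sketch below uses Pydantic and mirrors the PredictionRequest/PredictionResponse names used in the REST example later in this guide; the exact fields are illustrative.

```python
from datetime import datetime
from typing import List
from pydantic import BaseModel

class PredictionRequest(BaseModel):
    """What callers must send to the Model API service."""
    model_id: str
    features: List[float]

class PredictionResponse(BaseModel):
    """What the Model API service promises to return."""
    prediction: float
    model_id: str
    model_version: str
    timestamp: datetime
```

As long as both sides honor the schema, either service can change its internals (framework, language, model) without breaking the other.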
Why microservices fit ML:
For machine learning systems, microservices provide particular advantages:
- Specialized Resources: Different phases of ML pipelines (data processing, feature engineering, model training, inference) can receive exactly the resources they need.
- Independent Scaling: Inference services that handle user requests can scale based on traffic patterns, while training services can scale based on model complexity and data volume.
- Technology Optimization: Each service can use the optimal technology stack, for example TensorFlow for one model, PyTorch for another, or specialized hardware for specific algorithms.
- Continuous Deployment: New features or models can be deployed without disrupting the entire system, enabling more rapid experimentation and improvement.
- Observability: Monitoring can be ML-aware, covering latency alongside data drift and model performance.
2) Core Services in ML Microservices Architecture
A robust ML microservices platform consists of multiple specialized services, each handling a specific aspect of the machine learning lifecycle. Understanding these core services helps in designing scalable and maintainable ML systems.
Data Ingestion Service
The data ingestion service serves as the entry point for all data flowing into the ML system. It must handle diverse data sources, formats, and volumes while ensuring data integrity and reliability. A robust implementation should process both batch and streaming data with appropriate error handling and retry mechanisms.
Key responsibilities:
- Connecting to external sources (databases, APIs, streams)
- Validating incoming data against schemas
- Publishing events to trigger downstream processing
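A minimal sketch of such an ingestion endpoint is shown below, assuming FastAPI and Pydantic; the publish_event helper is a stand-in for a real broker client (for example, the Kafka producer shown later), and the field names are illustrative.

```python
from datetime import datetime, timezone
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class IngestRecord(BaseModel):
    """Schema that incoming data must satisfy."""
    dataset_id: str
    payload: dict

def publish_event(topic: str, event: dict) -> None:
    # Placeholder: a real service would publish to Kafka/RabbitMQ here
    print(f"[event] {topic}: {event}")

@app.post("/ingest")
async def ingest(record: IngestRecord):
    # FastAPI has already validated the request body against IngestRecord
    publish_event("data-events", {
        "event_type": "data_ingestion_completed",
        "dataset_id": record.dataset_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return {"status": "accepted", "dataset_id": record.dataset_id}
```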
Preprocessing Service
Raw data rarely arrives in a form suitable for machine learning. The preprocessing service transforms this raw data into a clean, structured format ready for feature extraction and model training. It maintains versioned transformation pipelines to prevent training-serving skew.
Key responsibilities:
- Handling missing values and outliers
- Normalizing and standardizing data
- Applying domain-specific transformations
Feature Store Service
The feature store has emerged as a critical component in modern ML architectures, serving as a centralized repository for features used across multiple models. By centralizing feature computation and storage, it eliminates redundant processing and ensures consistent definitions.
Key responsibilities:
- Storing computed features with metadata
- Ensuring consistency between training and serving
- Enabling feature sharing across teams and models
Model Training Service
This service orchestrates the resource-intensive process of training ML models, from simple regression models to complex neural networks. It integrates with experiment tracking tools to maintain comprehensive records of training runs.
Key responsibilities:
- Executing training jobs on appropriate hardware
- Performing hyperparameter optimization
- Evaluating and registering trained models
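As a rough illustration, a training job inside this service might look like the sketch below, using scikit-learn and joblib. The register_model function is a placeholder for a real model-registry client, and the dataset and hyperparameters are only for demonstration.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def register_model(path: str, metrics: dict) -> None:
    # Placeholder: a real implementation would call a model registry API
    print(f"registered {path} with metrics {metrics}")

def run_training_job(C: float = 1.0) -> None:
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    joblib.dump(model, "model.pkl")  # persist the trained artifact
    register_model("model.pkl", {"accuracy": accuracy, "C": C})

if __name__ == "__main__":
    run_training_job()
```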
Model API Service (Inference / Model API)
The model API service exposes trained models through well-defined interfaces, handling the critical transition from development to production. As the public face of the ML system, it requires special attention to reliability, scalability, and security.
Key responsibilities:
- Implementing prediction endpoints with appropriate interfaces
- Validating inputs and transforming outputs
- Monitoring performance metrics like latency and throughput
Monitoring Service
Continuous monitoring is essential to detect issues early and maintain performance over time. This service tracks both technical metrics and ML-specific concerns like data drift and concept drift.
Key responsibilities:
- Detecting data and concept drift
- Alerting on performance degradation
- Visualizing metrics through dashboards
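One simple way to detect data drift is a two-sample statistical test comparing live feature values against a reference sample from training time. The sketch below uses SciPy's Kolmogorov-Smirnov test; the threshold and sample sizes are illustrative.

```python
import numpy as np
from scipy import stats

def detect_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> dict:
    """Flag drift if the two samples are unlikely to come from the same distribution."""
    statistic, p_value = stats.ks_2samp(reference, current)
    return {
        "ks_statistic": float(statistic),
        "p_value": float(p_value),
        "drift_detected": bool(p_value < alpha),
    }

# Example: a feature whose mean has shifted in production
reference = np.random.normal(loc=0.0, scale=1.0, size=5000)
current = np.random.normal(loc=0.5, scale=1.0, size=1000)
print(detect_drift(reference, current))
```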
Additional Services
There are many other services that can enhance an ML microservices platform:
- Metadata Service: Manages metadata about datasets, models, and experiments
- Experiment Tracking Service: Logs and tracks ML experiments
- Model Registry Service: Versioned model artifacts + metadata + approvals
- Data Versioning Service: Tracks different versions of datasets
- A/B Testing Service: Enables experimentation with different model versions
- Feature Engineering Service: Automated feature creation and selection
- Workflow Orchestration Service: Workflow engines (Airflow, Argo Workflows, etc.)
- Model Deployment Service: Handles model deployment and rollback
- Authentication & Authorization Service: Security and access control
- Configuration Management Service: Centralized configuration
- Data Lineage Service: Tracks data flow and transformations
- Model Governance Service: Ensures compliance and model quality
- Notification Service: Alerts and notifications
- Batch Processing Service: Handles large-scale batch jobs
- Continuous Integration/Continuous Deployment (CI/CD) Service: Pipelines for code + model promotion
This Lab’s Focus
This lab focuses on a minimal slice to demonstrate core concepts:
- Service A = gateway/logging/orchestration (represents Data Ingestion + Gateway patterns)
- Service B = inference service (mock model) (represents Model API Service)
graph TB
  subgraph External["<b>External Data Sources</b>"]
    DB[("<b>Databases</b>")]
    API["<b>APIs</b>"]
    Stream["<b>Data Streams</b>"]
    Files["<b>File Storage</b>"]
  end
  subgraph Ingestion["<b>Data Ingestion Service</b>"]
    DI["<b>Ingestion API</b>"]
    DI ==>|Validate Schema| DI
    DI ==>|Publish Events| Events
  end
  subgraph Processing["<b>Data Processing Layer</b>"]
    Preproc["<b>Preprocessing Service</b>"]
    FE["<b>Feature Engineering Service</b>"]
    Preproc ==>|Clean & Transform| Preproc
    FE ==>|Create Features| FS
  end
  subgraph Storage["<b>Storage & Registry Layer</b>"]
    FS[("<b>Feature Store</b>")]
    MR["<b>Model Registry Service</b>"]
    DV["<b>Data Versioning Service</b>"]
    Meta["<b>Metadata Service</b>"]
  end
  subgraph Training["<b>Model Development Layer</b>"]
    MT["<b>Model Training Service</b>"]
    ET["<b>Experiment Tracking Service</b>"]
    MT ==>|Register Models| MR
    MT ==>|Log Experiments| ET
    ET ==>|Track Metrics| Meta
  end
  subgraph Serving["<b>Model Serving Layer</b>"]
    Gateway["<b>API Gateway</b>"]
    MA["<b>Model API Service</b>"]
    MD["<b>Model Deployment Service</b>"]
    MA ==>|Load Models| MR
    MD ==>|Deploy| MA
  end
  subgraph Monitoring["<b>Observability Layer</b>"]
    Monitor["<b>Monitoring Service</b>"]
    Notify["<b>Notification Service</b>"]
    Monitor ==>|Alerts| Notify
  end
  subgraph Support["<b>Supporting Services</b>"]
    Auth["<b>Auth & Authorization</b>"]
    Config["<b>Configuration Management</b>"]
    DL["<b>Data Lineage Service</b>"]
    MG["<b>Model Governance Service</b>"]
    AB["<b>A/B Testing Service</b>"]
    BP["<b>Batch Processing Service</b>"]
  end
  subgraph Orchestration["<b>Orchestration Layer</b>"]
    WF["<b>Workflow Orchestration</b><br/><b>Airflow/Argo</b>"]
    CICD["<b>CI/CD Service</b>"]
  end

  %% Data Flow - Thick arrows
  External ==> DI
  DI ==> Preproc
  Preproc ==> FS
  FE ==> FS
  FS ==> MT
  MT ==> MR
  MR ==> MA
  Gateway ==> MA

  %% Monitoring connections - Thick arrows
  MA ==>|Metrics| Monitor
  MT ==>|Metrics| Monitor
  Preproc ==>|Metrics| Monitor

  %% Supporting connections - Thick arrows
  Auth ==>|Secure| Gateway
  Auth ==>|Secure| MT
  Config ==>|Configure| MA
  Config ==>|Configure| MT
  DL ==>|Track| Preproc
  DL ==>|Track| FE
  MG ==>|Govern| MR
  AB ==>|Test| MA
  BP ==>|Process| Preproc

  %% Orchestration - Thick arrows
  WF ==>|Orchestrate| MT
  WF ==>|Orchestrate| Preproc
  CICD ==>|Deploy| MD

  %% Metadata connections - Thick arrows
  Meta ==>|Store Metadata| FS
  Meta ==>|Store Metadata| MR
  Meta ==>|Store Metadata| DV

  style Ingestion fill:#e1f5ff,stroke:#01579b,stroke-width:3px
  style Processing fill:#f3e5f5,stroke:#4a148c,stroke-width:3px
  style Storage fill:#fff3e0,stroke:#e65100,stroke-width:3px
  style Training fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px
  style Serving fill:#fce4ec,stroke:#880e4f,stroke-width:3px
  style Monitoring fill:#fff9c4,stroke:#f57f17,stroke-width:3px
  style Support fill:#f1f8e9,stroke:#33691e,stroke-width:3px
  style Orchestration fill:#e0f2f1,stroke:#004d40,stroke-width:3px
Figure: Basic ML system
3) Communication Patterns Between Microservices
In a microservices architecture, the way services communicate is as important as the services themselves. Different communication patterns serve different needs, and selecting the appropriate approach impacts system performance, reliability, and maintainability.
Communication Patterns Comparison
The sections below compare three common options: REST APIs, gRPC, and message queues.
REST (Representational State Transfer)
REST is a widely adopted architectural style for building web services, leveraging standard HTTP methods for communication between services. It is characterized by its stateless nature, where each request from client to server must contain all information needed to understand and process the request.
Implementation Details
REST communications typically use JSON or XML as data formats and rely on standard HTTP methods:
- GET: Retrieve resources without side effects
- POST: Create new resources
- PUT: Update existing resources
- DELETE: Remove resources
Services expose well-defined endpoints that represent resources or actions, following a consistent URL structure.
Example REST API endpoint using FastAPI:
@app.post("/models/{model_id}/predictions")async def create_prediction(model_id: str, data: PredictionRequest): # Retrieve model from registry model = model_registry.get_model(model_id) # Make prediction prediction = model.predict(data.features) # Return prediction result return PredictionResponse( prediction=prediction, model_id=model_id, model_version=model.version, timestamp=datetime.now() )
REST API Key Features:
- Synchronous request-response
- Standard HTTP methods
- Human-readable (JSON/XML)
- Stateless communication
When to Use REST
REST is particularly well-suited for:
- Public-facing APIs that need to be accessible to diverse clients
- Simple request-response interactions where the overhead of HTTP is acceptable
- Services that benefit from HTTP features like caching, content negotiation, and authentication
- Scenarios where human readability and self-documentation are important
REST’s widespread adoption means excellent tooling support, including automatic API documentation (Swagger/OpenAPI), client generation, and testing frameworks.
REST is Best For:
- Public APIs and web services
- Simple CRUD operations
- Human-readable debugging
- Caching and standard HTTP features
Limitations
While REST is versatile, it’s not optimal for all communication patterns:
- Performance overhead from HTTP headers and connection establishment
- Limited support for bi-directional communication
- Can be verbose for complex data structures
- Not ideal for high-frequency, low-latency requirements
This lab uses REST (simplest to learn and test).
gRPC
gRPC is a high-performance RPC framework designed for efficient service-to-service communication. It uses Protocol Buffers (protobuf) as its Interface Definition Language (IDL) and data serialization format, providing strongly typed contracts between services.
Implementation Details
gRPC services are defined using Protocol Buffer files that specify methods and message types:
// Model service definition
service ModelService {
  // Get prediction from model
  rpc Predict(PredictionRequest) returns (PredictionResponse) {}

  // Stream predictions for multiple inputs
  rpc PredictStream(stream PredictionRequest) returns (stream PredictionResponse) {}
}

// Request message
message PredictionRequest {
  string model_id = 1;
  repeated float features = 2;
}

// Response message
message PredictionResponse {
  float prediction = 1;
  float confidence = 2;
  string model_version = 3;
}
From these definitions, gRPC generates client and server code in various languages, handling serialization, deserialization, and network communication.
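For example, a Python client generated from a proto file like the one above might be used as follows. The module names (model_service_pb2, model_service_pb2_grpc) depend on the proto filename and are assumed here for illustration; the address, model ID, and timeout are also illustrative.

```python
import grpc

# Generated by protoc from model_service.proto (names assumed for illustration)
import model_service_pb2
import model_service_pb2_grpc

channel = grpc.insecure_channel("model-service:50051")
stub = model_service_pb2_grpc.ModelServiceStub(channel)

request = model_service_pb2.PredictionRequest(
    model_id="churn-model",
    features=[0.2, 1.7, 3.1],
)

# The timeout acts as a deadline, so the call fails fast instead of hanging
response = stub.Predict(request, timeout=0.5)
print(response.prediction, response.confidence, response.model_version)
```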
gRPC Key Features:
- High Performance Binary Protocol
- Bi-directional Streaming
- Strong typing (Protocol Buffers)
- Multi-language support
When to Use gRPC
gRPC excels in scenarios requiring:
- High-performance, low-latency service-to-service communication
- Strong typing and contract enforcement between services
- Polyglot environments with services in multiple programming languages
- Support for streaming data in either direction
- Efficient binary serialization for reduced network overhead
gRPC is particularly valuable for internal service communication in ML systems where performance is critical.
gRPC is Best For:
- High performance internal APIs
- Real-time Streaming data
- Microservices Communication
- Low-Latency requirements
Advanced Features
gRPC offers several advanced capabilities beyond basic RPC:
- Bi-directional streaming for real-time communication
- Built-in load balancing and service discovery integration
- Deadline propagation for timeout management
- Interceptors for cross-cutting concerns like logging and authentication
- Backward compatibility mechanisms for evolving APIs
Message Queues (Kafka, RabbitMQ, Pulsar)
Message-based communication uses intermediate brokers to decouple services, enabling asynchronous interactions where senders and receivers operate independently.
Implementation Details
In a message-based architecture:
- Publishers send messages to topics or queues without knowledge of consumers
- The message broker reliably stores messages and handles delivery
- Consumers process messages at their own pace, with no direct connection to publishers
Kafka producer example — publishing a data ingestion event:
def publish_data_arrival_event(dataset_id, records_count, timestamp):
    event = {
        "event_type": "data_ingestion_completed",
        "dataset_id": dataset_id,
        "records_count": records_count,
        "timestamp": timestamp.isoformat(),
        "source_system": "web_analytics"
    }

    # Serialize and publish the event
    producer.send(
        topic="data-events",
        key=dataset_id,
        value=json.dumps(event).encode('utf-8')
    )
    producer.flush()
Kafka consumer example — preprocessing service:
for message in consumer:
    event = json.loads(message.value.decode('utf-8'))

    if event["event_type"] == "data_ingestion_completed":
        # Trigger preprocessing pipeline
        preprocessing_pipeline.process_dataset(
            dataset_id=event["dataset_id"],
            source=event["source_system"]
        )
Message Queue Key Features:
- Asynchronous messaging
- Reliable message delivery
- Decoupled services
- Load Balancing and buffering
When to Use Message Queues
Message-based communication is ideal for:
- Decoupling services to enhance resilience and independent scaling
- Handling workload spikes through buffering
- Implementing event-driven architectures where actions are triggered by system events
- Ensuring reliable delivery of messages even when downstream services are temporarily unavailable
- Broadcasting events to multiple consumers simultaneously
In ML systems, message queues often facilitate the flow of data through processing pipelines, with each stage publishing events that trigger subsequent processing.
Message Queues are Best For:
- Event-driven architecture
- Background job processing
- Handling traffic spikes
- Loose coupling services
Types of Message Patterns
Different messaging systems support various communication patterns:
- Publish-Subscribe: Messages are broadcast to all subscribed consumers (e.g., Kafka topics)
- Point-to-Point: Messages are delivered to exactly one consumer from a pool (e.g., RabbitMQ queues)
- Request-Reply: Asynchronous request-response interactions through temporary reply queues
- Dead Letter Queues: Special queues for messages that cannot be processed, enabling retry strategies (see the sketch after this list)
Each pattern serves different use cases within a microservices architecture.
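As a sketch of the dead-letter pattern, the consumer from the earlier Kafka example can route messages it cannot process to a separate topic. kafka-python is assumed, and the topic names, bootstrap server, and process_dataset helper are illustrative.

```python
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("data-events", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

def process_dataset(event: dict) -> None:
    # Placeholder for the real preprocessing step
    ...

for message in consumer:
    event = json.loads(message.value.decode("utf-8"))
    try:
        process_dataset(event)
    except Exception as exc:
        # Park the failed message on a dead-letter topic for inspection and retry
        event["error"] = str(exc)
        producer.send("data-events.dlq", value=json.dumps(event).encode("utf-8"))
        producer.flush()
```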
4) Stateless vs Stateful Services
The distinction between stateless and stateful services is fundamental to microservices architecture, affecting how services are designed, deployed, and scaled.
Stateless Services
Stateless services do not store client state between requests. Each request contains all the information needed to process it, without relying on server-side session data.
Characteristics of Stateless Services
Independence from Previous Interactions: Each request is processed without knowledge of previous requests. This means that any service instance can handle any request, enabling simple horizontal scaling.
Simplified Recovery: If a stateless service instance fails, requests can be immediately redirected to another instance without data loss or inconsistency.
Deployment Flexibility: New versions can be deployed using strategies like blue-green deployment or rolling updates without complex state migration.
Resource Efficiency: Instances can be added or removed based on demand without concerns about state transfer.
Architecture Comparison
Figure: Stateful service vs. stateless service
Stateless Benefits:
- Easy horizontal scaling — add/remove instances anytime
- No data loss on failure — any instance can handle any request
- Simple deployment — rolling updates without state migration
Stateful Challenges:
- Complex scaling — state must be migrated or replicated
- Recovery complexity — state must be restored after failures
- Consistency challenges — multiple instances need synchronized state
Scaling Comparison
Examples in ML Systems
Stateless Services:
- Model Inference Services: Services that load a model and make predictions based solely on input data
- Data Transformation Services: Services that apply defined transformations to incoming data
- Validation Services: Services that check data against schemas or rules
- Other common stateless workloads: REST APIs, ML inference endpoints, image processing, authentication
Implementation Considerations
To maintain statelessness while still providing personalized experiences, stateless services often:
- Store state externally in databases or caches
- Pass state information in request parameters or headers
- Use token-based authentication instead of session cookies
- Employ idempotent operations that can be safely retried
Stateless model inference service example:
@app.post("/predict")async def predict(request: PredictionRequest): # Load model based on request parameter model = model_registry.get_model( model_id=request.model_id, version=request.model_version ) # Process request using only the provided data prediction = model.predict(request.features) # Return response without storing any client state return { "prediction": prediction, "model_id": request.model_id, "request_id": generate_uuid(), "timestamp": datetime.now().isoformat() }
Stateful Service
Stateful services maintain client state between requests, remembering information from previous interactions or maintaining internal state critical to their operation.
Characteristics of Stateful Services
Persistent State: These services maintain data that persists beyond individual requests, either in memory or persistent storage directly tied to the service.
Complex Scaling: Adding or removing instances requires careful state management, often involving data replication, sharding, or migration.
State Consistency Challenges: When multiple instances exist, ensuring all instances have a consistent view of the state becomes critical.
Recovery Complexity: After failures, the service must recover its state before resuming normal operation.
Examples in ML Systems
Stateful Services:
- Feature Stores: Maintain feature values and metadata across requests
- Model Registry Services: Track model versions, artifacts, and deployment status
- Session-Based Recommendation Services: Maintain user session context to provide contextual recommendations
- Online Learning Services: Update model parameters based on streaming data
- Other common stateful examples: databases, session storage, shopping carts, real-time gaming
Implementation Considerations
Stateful services require special attention to:
- State persistence and durability
- Replication strategies for high availability
- Consistency models and potential trade-offs
- Backup and recovery procedures
- State migration during upgrades
Stateful feature store service example:
class FeatureStoreService:
    def __init__(self, storage_engine):
        self.storage = storage_engine
        self.cache = LRUCache(max_size=10000)

    def get_feature_vector(self, entity_id, feature_names, timestamp=None):
        # Check cache first
        cache_key = f"{entity_id}:{','.join(feature_names)}:{timestamp}"
        if cache_key in self.cache:
            return self.cache[cache_key]

        # Retrieve from persistent storage
        features = self.storage.get_features(
            entity_id=entity_id,
            feature_names=feature_names,
            timestamp=timestamp
        )

        # Update cache for future requests
        self.cache[cache_key] = features
        return features

    def update_feature(self, entity_id, feature_name, value, timestamp):
        # Update persistent storage
        self.storage.store_feature(
            entity_id=entity_id,
            feature_name=feature_name,
            value=value,
            timestamp=timestamp
        )

        # Invalidate relevant cache entries
        self.cache.invalidate_pattern(f"{entity_id}:*")
Hybrid Approaches
Many modern ML systems adopt hybrid approaches:
- Core business logic in stateless services for scalability
- State externalized to specialized stateful services
- Caching layers to improve performance while maintaining scalability
- Event sourcing patterns to reconstruct state when needed
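A common instance of "state externalized to specialized stateful services" is caching computed values in Redis so that the serving service itself stays stateless. The sketch below assumes a Redis instance is reachable at the given host and port; compute_features stands in for whatever recomputation path the service actually has.

```python
import json
import redis

cache = redis.Redis(host="redis", port=6379, decode_responses=True)

def compute_features(user_id: str) -> dict:
    # Placeholder: recompute features or fetch them from the feature store
    return {"user_id": user_id, "clicks_7d": 12}

def get_user_features(user_id: str) -> dict:
    key = f"features:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                # served from the external cache
    features = compute_features(user_id)
    cache.setex(key, 300, json.dumps(features))  # cache for 5 minutes
    return features
```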
This lab’s services are stateless (aside from logs / in-memory model object).
5) Docker and Microservices
Docker containers provide an ideal deployment mechanism for microservices, encapsulating each service with its dependencies in a lightweight, portable format. This containerization approach offers numerous benefits for ML microservices specifically:
- Consistent Environments: Eliminates “it works on my machine” problems by packaging all dependencies
- Resource Isolation: Prevents conflicts between services with different dependency requirements
- Efficient Resource Usage: Allows multiple containers to share the same host OS kernel
- Rapid Deployment: Enables quick startup and shutdown of services
- Portability: Runs consistently across development, testing, and production environments
In production, microservices are commonly shipped as containers:
- each service → its own image
- dependencies isolated
- portable across dev/stage/prod
- health checks + resource limits
Dockerfile Example
A typical Dockerfile for an ML microservice might include:
- Base image with appropriate ML frameworks
- System dependencies for numerical processing
- Python packages for the specific service
- Service code and configuration
- Health check endpoints
- Environment-specific settings via environment variables
Example Dockerfile for an ML inference service:
FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and service code
COPY models/ ./models/
COPY service/ ./service/

# Set environment variables
ENV MODEL_PATH=/app/models/xgboost_v3.pkl
ENV LOG_LEVEL=INFO
ENV MAX_WORKERS=4

# Expose the service port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s \
    CMD curl -f http://localhost:8080/health || exit 1

# Run the service
CMD ["python", "service/main.py"]
This lab supports both approaches:
- Docker (recommended): Full containerization with docker-compose
- venv (local dev): Virtual environments for local development
Part 2 — The Actual Lab Implementation (2 Services)
This repository contains a complete, runnable two-microservice lab for a simple machine-learning-style system:
- Service A (Input Logger / Gateway): Receives client requests, logs inputs, and optionally forwards the request to the ML service.
- Service B (ML Predictor): A dedicated service that returns a mock “prediction” (random class + confidence) for demo purposes; a minimal sketch of what it might look like appears below.
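A plausible sketch of Service B's main.py is shown here (the actual lab code may differ); FastAPI and uvicorn are assumed, and the response shape matches the expected responses in Part 4.

```python
import random
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    input: str

@app.get("/health")
async def health():
    return {"status": "ok"}

@app.post("/predict")
async def predict(request: PredictRequest):
    # Mock "model": pick a random class and confidence
    predicted_class = random.choice(["cat", "dog", "bird"])
    confidence = round(random.uniform(0.7, 0.99), 2)
    return {
        "prediction": {
            "class": predicted_class,
            "confidence": confidence,
            "input_length": len(request.input),
        },
        "message": f"Predicted class: {predicted_class} with {confidence * 100:.1f}% confidence",
    }

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8001)
```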
The project is designed to teach both:
- Microservices theory (why/how), and
- The practical workflow of running two independent services that communicate over HTTP.
Repo structure
ml_microservices_lab/
├── service_a/
│   ├── main.py
│   ├── requirements.txt
│   └── Dockerfile
├── service_b/
│   ├── main.py
│   ├── requirements.txt
│   └── Dockerfile
├── docker-compose.yml
├── .dockerignore
├── .gitignore
└── README.md
High-level flow
1. Client sends a request to Service A:
   - POST http://localhost:8000/process
   - Body: { "data": "...", "forward_to_model": true/false }
2. Service A logs the input and:
   - If forward_to_model=false: returns the logging status only
   - If forward_to_model=true: calls Service B at POST http://localhost:8001/predict
3. Service B returns a mock prediction (random class + confidence)
4. Service A returns a combined response.
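A plausible sketch of Service A's gateway and forwarding logic follows (the actual lab code may differ); FastAPI, httpx, and uvicorn are assumed. The Service B URL is read from an environment variable so the same code works locally (http://localhost:8001) and inside Docker (http://service_b:8001).

```python
import os
import httpx
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
MODEL_SERVICE_URL = os.getenv("MODEL_SERVICE_URL", "http://localhost:8001/predict")

class ProcessRequest(BaseModel):
    data: str
    forward_to_model: bool = False

@app.get("/health")
async def health():
    return {"status": "ok"}

@app.post("/process")
async def process(request: ProcessRequest):
    print(f"[service_a] received input: {request.data}")  # log the input
    response = {"status": "Input logged successfully"}

    if request.forward_to_model:
        # Forward the request to Service B over HTTP
        async with httpx.AsyncClient(timeout=5.0) as client:
            b_response = await client.post(MODEL_SERVICE_URL, json={"input": request.data})
            b_response.raise_for_status()
            response["model_prediction"] = b_response.json()

    return response

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```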
Communication Flow Diagram
Part 3 — Setup & Run
Option A: Docker Setup (Recommended)
Docker provides containerized, isolated environments for both services. This is the recommended approach for consistency across different operating systems.
Prerequisites
- Docker installed
- Docker Compose installed
Quick Start with Docker
1. Build and start both services:
docker-compose up --build
This will:
- Build Docker images for both services
- Start Service B on port 8001
- Start Service A on port 8000 (after Service B is healthy)
- Create a Docker network for service communication
2. Run in detached mode (background):
docker-compose up -d --build
3. View logs:
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f service_a
docker-compose logs -f service_b
4. Stop services:
docker-compose down
5. Rebuild after code changes:
docker-compose up --build
Docker Health Checks
Both services include health checks. Service A waits for Service B to be healthy before starting.
Check service status:
docker-compose ps
Docker Networking
Services communicate via Docker’s internal network:
- Service A connects to Service B using: http://service_b:8001/predict
- From your host machine, use: http://localhost:8000 and http://localhost:8001
Docker Troubleshooting
View container logs:
docker-compose logs service_a
docker-compose logs service_b
Option B: Local Setup (Ubuntu/Debian safe way)
If you are on Ubuntu/Debian with Python 3.12+, you may see: error: externally-managed-environment (PEP 668). The fix is to use a virtual environment (venv). Do not install packages system-wide.
0) Prerequisites
sudo apt update
sudo apt install -y python3-full python3-venv
1) Run Service B (Terminal 1)
cd service_b
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Service B will run on: http://localhost:8001
Health check:
curl http://localhost:8001/health
2) Run Service A (Terminal 2)
cd service_a
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Service A will run on: http://localhost:8000
Health check:
curl http://localhost:8000/health
Part 4 — Testing the Services
Method 1: Using curl (Command Line)
1) Service A without forwarding (only logging)
curl -X POST -H "Content-Type: application/json" \
  -d '{"data":"sample input","forward_to_model":false}' \
  http://localhost:8000/process
Expected response:
{"status":"Input logged successfully"}
2) Service A with forwarding (A → B)
curl -X POST -H "Content-Type: application/json" \
  -d '{"data":"cat image data","forward_to_model":true}' \
  http://localhost:8000/process
Expected response:
{ "status": "Input logged successfully", "model_prediction": { "prediction": { "class": "dog", "confidence": 0.92, "input_length": 13 }, "message": "Predicted class: dog with 92.0% confidence" }}
3) Call Service B directly
curl -X POST -H "Content-Type: application/json" \
  -d '{"input":"test input"}' \
  http://localhost:8001/predict
Expected response:
{ "prediction": { "class": "bird", "confidence": 0.87, "input_length": 10 }, "message": "Predicted class: bird with 87.0% confidence"}
4) Health check endpoints
# Service A health check
curl http://localhost:8000/health

# Service B health check
curl http://localhost:8001/health

# Service A root endpoint
curl http://localhost:8000/

# Service B root endpoint
curl http://localhost:8001/
Method 2: Using Postman
Setup
- Install Postman from postman.com
- Create a new Collection named “ML Microservices Lab”
Test Cases
Test 1: Service A — Process without forwarding
- Method: POST
- URL: http://localhost:8000/process
- Headers: Content-Type: application/json
- Body (raw JSON): { "data": "sample input", "forward_to_model": false }
- Expected Status: 200 OK
- Expected Response: { "status": "Input logged successfully" }
Test 2: Service A — Process with forwarding
- Method: POST
- URL: http://localhost:8000/process
- Headers: Content-Type: application/json
- Body (raw JSON): { "data": "cat image data", "forward_to_model": true }
- Expected Status: 200 OK
- Expected Response: contains status and model_prediction fields
Test 3: Service B — Direct prediction
- Method: POST
- URL: http://localhost:8001/predict
- Headers: Content-Type: application/json
- Body (raw JSON): { "input": "test input" }
- Expected Status: 200 OK
- Expected Response: contains prediction and message fields
Test 4: Health Checks
- Service A Health:
  - Method: GET
  - URL: http://localhost:8000/health
- Service B Health:
  - Method: GET
  - URL: http://localhost:8001/health
Part 5 — What to improve next (real production direction)
This lab intentionally uses simple choices. In real systems, we will typically add:
- Service discovery: DNS/Consul/etcd instead of hard-coded URLs
- Resilience patterns: retries, timeouts, circuit breakers, fallbacks
- Async messaging: Kafka/RabbitMQ/Pulsar to decouple pipelines
- Observability: OpenTelemetry traces, metrics, structured logs
- Security: authn/authz, mTLS, rate limiting
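As a small example of the resilience point above, the sketch below wraps the Service A → Service B call with a timeout and simple exponential-backoff retries, assuming the requests library; a production system would more likely use a library such as tenacity or a proper circuit breaker.

```python
import time
import requests

def post_with_retries(url: str, payload: dict, attempts: int = 3, timeout: float = 2.0) -> dict:
    last_error = None
    for attempt in range(attempts):
        try:
            response = requests.post(url, json=payload, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as exc:
            last_error = exc
            time.sleep(0.1 * (2 ** attempt))  # backoff: 0.1s, 0.2s, 0.4s
    raise RuntimeError(f"all {attempts} attempts to reach {url} failed") from last_error

# Example usage against Service B:
# post_with_retries("http://localhost:8001/predict", {"input": "test input"})
```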
Git link: https://github.com/faizulkhan56/ml_microservices_lab
Conclusion
In this lab, we’ve examined the key differences between traditional monolithic architectures and modern microservices approaches, understanding how the latter provides benefits like independent scaling, technology flexibility, fault isolation, and team specialization. We’ve learned about the core services that comprise a robust ML system, including data ingestion, preprocessing, feature store, model training, model API, and monitoring services, along with many other specialized services that can enhance an ML pipeline. We’ve also investigated various communication patterns between microservices, including REST APIs, gRPC, and message queues, and understood the critical distinction between stateless and stateful services. Additionally, we’ve seen how Docker containers provide an ideal deployment mechanism for microservices by encapsulating each service with its dependencies in a lightweight, portable format. This foundation of knowledge prepares us to implement a practical microservices system that demonstrates these principles in action.